[Pandas] pandas01 - World Food Facts Solutions | 참신러닝 (Fresh-Learning)


Step 1. Go to https://www.kaggle.com/openfoodfacts/world-food-facts/data

Step 2. Download the dataset to your computer and unzip it.

Step 3. Use the tsv file and assign it to a dataframe called food

-> import pandas as pd
import numpy as np

food = pd.read_csv('en.openfoodfacts.org.products.tsv', sep='\t')  # the file is tab-separated
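The file is tab-separated, and `sep='\t'` is what tells `read_csv` to split fields on tabs instead of commas. Below is a minimal sketch of the same call on a tiny in-memory TSV. The rows are made up, and `io.StringIO` stands in for the real file.

```python
import io

import pandas as pd

# A tiny made-up TSV standing in for en.openfoodfacts.org.products.tsv
tsv_text = (
    "code\tproduct_name\tenergy_100g\n"
    "0001\tRice\t1500\n"
    "0002\tCocoa\t\n"
)

# sep='\t' makes read_csv split each line on tab characters
food = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(food)
```

With the real file, pandas may emit a `DtypeWarning` about columns with mixed types. Passing `low_memory=False` to `read_csv` parses the file in one pass and avoids that warning, at the cost of more memory.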

Step 4. See the first 5 entries

-> food.head(5)

A : a DataFrame containing the first 5 rows and all 163 columns
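`head(n)` returns the first `n` rows as a new DataFrame. Because `n` defaults to 5, `food.head()` gives the same result as `food.head(5)`. Here is a small sketch on a made-up 8-row frame standing in for `food`:

```python
import pandas as pd

# A made-up 8-row frame standing in for food
df = pd.DataFrame({
    "product_name": [f"item{i}" for i in range(8)],
    "energy_100g": range(8),
})

first5 = df.head()   # n defaults to 5, so this equals df.head(5)
print(first5)
```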

Step 5. What is the number of observations in the dataset?

-> food.shape[0]

A : 356027

Explanation : shape[0] gives the number of rows, and shape[1] gives the number of columns.
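`shape` is a `(rows, columns)` tuple, so you can unpack both counts at once, and `len(df)` returns the same row count. Note that `shape[0]` counts every row, including rows with missing values, whereas `count()` reports only the non-missing values in a column. The sketch below uses a made-up frame:

```python
import numpy as np
import pandas as pd

# A made-up frame with one missing value in column "b"
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, np.nan, 6.0]})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)       # 3 2
print(len(df))              # 3, same as df.shape[0]
print(df["b"].count())      # 2: count() skips the missing value
```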

Step 6. What is the number of columns in the dataset?

-> food.shape[1]

A : 163

Explanation : shape[1] gives the number of columns (shape[0], used in Step 5, gives the number of rows).
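The column count can also be read directly from the `columns` Index. The three expressions below agree. This is a sketch on a made-up frame:

```python
import pandas as pd

# A made-up frame with three columns
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Three equivalent ways to count columns
print(df.shape[1])       # 3
print(len(df.columns))   # 3
print(df.columns.size)   # 3
```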

Step 7. Print the name of all the columns.

-> food.columns

A : Index(['code', 'url', 'creator', 'created_t', 'created_datetime',
       'last_modified_t', 'last_modified_datetime', 'product_name',
       'generic_name', 'quantity',
       ...
       'fruits-vegetables-nuts_100g', 'fruits-vegetables-nuts-estimate_100g',
       'collagen-meat-protein-ratio_100g', 'cocoa_100g', 'chlorophyl_100g',
       'carbon-footprint_100g', 'nutrition-score-fr_100g',
       'nutrition-score-uk_100g', 'glycemic-index_100g',
       'water-hardness_100g'],
      dtype='object', length=163)
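The `...` in the output above comes from pandas itself, not from the dataset. pandas abbreviates an Index repr when it holds more items than the `display.max_seq_items` option allows (100 by default), and this dataset has 163 columns. To really print every name, convert the Index to a list or lift the limit temporarily. The sketch below uses a made-up frame with 120 columns:

```python
import pandas as pd

# A made-up frame with 120 columns, more than the default display.max_seq_items (100)
df = pd.DataFrame(columns=[f"col_{i}" for i in range(120)])

# A plain Python list is never abbreviated
names = df.columns.tolist()
for name in names:
    print(name)

# Or lift the limit only inside this block so the Index repr itself is complete
with pd.option_context("display.max_seq_items", None):
    print(df.columns)
```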

Step 8. What is the name of 105th column?

-> food.columns[104]

A : '-glucose_100g'

Explanation : It is the 105th column, so we use [104] (Python indexing starts at 0).
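Positional access on `columns` follows ordinary 0-based Python indexing, so negative positions count from the end. `iloc` uses the same positions to fetch the column's data rather than just its name. This is a sketch on a made-up three-column frame:

```python
import pandas as pd

# A made-up frame with three columns
df = pd.DataFrame({"code": [1], "product_name": ["Rice"], "energy_100g": [1500.0]})

print(df.columns[1])    # product_name: the 2nd column sits at position 1
print(df.columns[-1])   # energy_100g: negative positions count from the end

# iloc selects the column's values by the same position
second = df.iloc[:, 1]
print(second.name)      # product_name
```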

Step 9. What is the type of the observations of the 105th column?

-> food.dtypes['-glucose_100g']

A : dtype('float64')
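The dtype can be looked up by column name, as in Step 9, or by position, which avoids typing the name. A column holding missing values cannot stay an integer column in the default dtypes: pandas stores missing values as `NaN`, which is a float. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data: the "glucose" column contains a missing value
df = pd.DataFrame({"code": [1, 2], "glucose": [0.5, np.nan]})

print(df.dtypes["glucose"])            # float64, lookup by name
print(df.iloc[:, 1].dtype)             # float64, lookup by position (2nd column)
print(pd.Series([1, 2, None]).dtype)   # float64: a missing value turns integers into floats
```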

Step 10. How is the dataset indexed?

-> food.index

A : RangeIndex(start=0, stop=356027, step=1)
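When no index column is specified, pandas assigns a default `RangeIndex`. It stores only `start`, `stop` and `step` rather than every label, so it stays cheap even for hundreds of thousands of rows. A sketch on a made-up frame:

```python
import pandas as pd

# A made-up frame created without an explicit index
df = pd.DataFrame({"product_name": ["a", "b", "c"]})

idx = df.index
print(idx)                              # RangeIndex(start=0, stop=3, step=1)
print(isinstance(idx, pd.RangeIndex))   # True
print(idx.start, idx.stop, idx.step)    # 0 3 1
```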

Step 11. What is the product name of the 19th observation?

-> food.product_name[18]

A : 'Lotus Organic Brown Jasmine Rice'

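Explanation : We want the 19th value of product_name. Python counts from 0, so [18] retrieves the 19th value. (Strictly speaking, food.product_name[18] looks up the index label 18. It matches the 19th row only because the index is the default RangeIndex.)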
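To make the intent explicit, `.iloc` selects by position and `.loc` selects by index label. The two agree under a default `RangeIndex` but diverge once the rows are reordered, because each label stays attached to its row. This is a sketch on a made-up frame:

```python
import pandas as pd

# A made-up frame with the default RangeIndex (labels 0..3)
df = pd.DataFrame({"product_name": ["Rice", "Cocoa", "Tea", "Milk"]})

# With the default RangeIndex, position 2 and label 2 are the same row
print(df["product_name"].iloc[2])   # Tea
print(df["product_name"].loc[2])    # Tea

# After sorting, the labels move with their rows, so the lookups differ
s = df.sort_values("product_name")["product_name"]
print(s.iloc[2])   # Rice: 3rd row in sorted order (Cocoa, Milk, Rice, Tea)
print(s.loc[2])    # Tea: still the row labelled 2
```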