Step 1. Go to https://www.kaggle.com/openfoodfacts/world-food-facts/data
Step 2. Download the dataset to your computer and unzip it.
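If you prefer to unzip from Python instead of a file manager, the standard-library `zipfile` module can do it. This is a minimal sketch. The helper name `unzip_dataset` and the archive/folder names in the usage line are hypothetical, since the actual name of the Kaggle download may differ.

```python
import zipfile
from pathlib import Path


def unzip_dataset(archive_path, dest_dir):
    """Extract every member of a .zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Usage (hypothetical file names): `unzip_dataset("world-food-facts.zip", "data")`.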
Step 3. Use the tsv file and assign it to a dataframe called food
-> import pandas as pd
import numpy as np
food = pd.read_csv('en.openfoodfacts.org.products.tsv', sep='\t')
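The `sep='\t'` argument is what makes this work: `read_csv` assumes commas by default. The sketch below uses a tiny made-up TSV string (not real Open Food Facts rows) to show the difference.

```python
import io

import pandas as pd

# Made-up tab-separated sample standing in for the real .tsv file
sample_tsv = (
    "code\tproduct_name\tenergy_100g\n"
    "1\tSample rice\t1500.0\n"
    "2\tSample chips\t\n"
)

# sep="\t": fields are split on tabs, giving three proper columns
food = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Default sep=",": no commas in the text, so each line stays one single column
wrong = pd.read_csv(io.StringIO(sample_tsv))
```

On the full file, pandas may also emit a `DtypeWarning` about columns with mixed types; passing `low_memory=False` to `read_csv` is one common way to avoid it.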
Step 4. See the first 5 entries
-> food.head(5)
A :
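`head(n)` returns the first `n` rows, and `n` defaults to 5, so `food.head()` gives the same result. A small sketch on a hypothetical 8-row frame:

```python
import pandas as pd

# Hypothetical 8-row frame
df = pd.DataFrame({"product_name": [f"item {i}" for i in range(8)]})

first_five = df.head(5)   # rows 0-4
default = df.head()       # n defaults to 5
first_three = df.head(3)  # any n works
```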
Step 5. What is the number of observations in the dataset?
-> food.shape[0]
A : 356027
Explanation: shape[0] gives the number of rows and shape[1] gives the number of columns.
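`shape` is a `(rows, columns)` tuple, so `shape[0]` is the row count; `len()` on a DataFrame returns the same number. Illustrated on a made-up 3-row frame:

```python
import pandas as pd

# Made-up 3-row, 2-column frame standing in for food
df = pd.DataFrame({"code": [1, 2, 3], "energy_100g": [100.0, 200.0, 300.0]})

n_rows = df.shape[0]   # first element of the (rows, columns) tuple
same_rows = len(df)    # len() of a DataFrame counts rows
```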
Step 6. What is the number of columns in the dataset?
-> food.shape[1]
A : 163
Explanation: shape[1] gives the number of columns (shape[0] gives the number of rows).
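Equivalent ways to count columns, shown on a hypothetical frame: `len(df.columns)` matches `shape[1]`, and `df.size` is rows times columns.

```python
import pandas as pd

# Hypothetical frame: 4 rows, 3 columns
df = pd.DataFrame({"a": range(4), "b": range(4), "c": range(4)})

n_cols = df.shape[1]         # second element of the (rows, columns) tuple
same_cols = len(df.columns)  # length of the column Index
total_cells = df.size        # rows * columns
```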
Step 7. Print the name of all the columns.
-> food.columns
A : Index(['code', 'url', 'creator', 'created_t', 'created_datetime',
'last_modified_t', 'last_modified_datetime', 'product_name',
'generic_name', 'quantity',
...
'fruits-vegetables-nuts_100g', 'fruits-vegetables-nuts-estimate_100g',
'collagen-meat-protein-ratio_100g', 'cocoa_100g', 'chlorophyl_100g',
'carbon-footprint_100g', 'nutrition-score-fr_100g',
'nutrition-score-uk_100g', 'glycemic-index_100g',
'water-hardness_100g'],
dtype='object', length=163)
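The `...` in the output above is display truncation, not missing data. Converting the Index to a plain list prints every name. The 163 column names below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical wide frame with 163 columns, like the food dataset
df = pd.DataFrame(columns=[f"col_{i}" for i in range(163)])

names = df.columns.tolist()  # a plain Python list: nothing is elided
for name in names[:3]:
    print(name)
```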
Step 8. What is the name of the 105th column?
-> food.columns[104]
A : '-glucose_100g'
Explanation: Because it is the 105th column, we access it with [104] (Python indexing starts at 0).
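A sketch of the off-by-one conversion in both directions. The helper `nth_column` and the `col_1 ... col_163` names are hypothetical; `Index.get_loc` is the pandas method that maps a name back to its 0-based position.

```python
import pandas as pd

# Hypothetical column names numbered from 1, as a person would count them
cols = pd.Index([f"col_{i}" for i in range(1, 164)])


def nth_column(columns, n):
    """Return the name of the n-th column, counting from 1."""
    return columns[n - 1]  # Python positions start at 0


name = nth_column(cols, 105)      # 1-based count -> name
position = cols.get_loc(name)     # name -> 0-based position
```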
Step 9. What is the type of the observations of the 105th column?
-> food.dtypes['-glucose_100g']
A : dtype('float64')
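`food.dtypes` is a Series of dtypes indexed by column name, so it can be read by name or by position, or you can ask the column directly. Nutrient columns with missing values come out as `float64` because NaN is a float. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame: a text column plus a nutrient column with a missing value
df = pd.DataFrame({"product_name": ["x", "y"], "-glucose_100g": [1.5, np.nan]})

by_name = df.dtypes["-glucose_100g"]    # look up by column name
by_position = df.dtypes.iloc[1]         # same entry by 0-based position
from_column = df["-glucose_100g"].dtype # ask the column itself

# Integers plus a missing value are upcast to float64
upcast = pd.Series([1, None]).dtype
```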
Step 10. How is the dataset indexed?
-> food.index
A : RangeIndex(start=0, stop=356027, step=1)
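`read_csv` assigns a default `RangeIndex` (0, 1, 2, ...) unless you pass `index_col`. A sketch with a made-up TSV string:

```python
import io

import pandas as pd

TSV = "a\tb\n1\t2\n3\t4\n5\t6\n"  # made-up sample data

# No index_col: rows are numbered 0, 1, 2 with a RangeIndex
default_df = pd.read_csv(io.StringIO(TSV), sep="\t")

# index_col="a": the values of column "a" become the row labels instead
labelled_df = pd.read_csv(io.StringIO(TSV), sep="\t", index_col="a")
```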
Step 11. What is the product name of the 19th observation?
-> food.product_name[18]
A : 'Lotus Organic Brown Jasmine Rice'
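Explanation: We want the 19th value of product_name. Because Python counts from 0, we use [18] to get the 19th value. Strictly speaking, food.product_name[18] looks up the index label 18; it matches position 18 here only because the index is the default RangeIndex starting at 0 (see Step 10).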
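To make the intent explicit, `.iloc` selects by position and `.loc` selects by label; with a RangeIndex they agree, otherwise they may not. The Series values and labels below are made up.

```python
import pandas as pd

# Made-up Series with non-default labels
names = pd.Series(["rice", "oats", "tea"], index=[10, 20, 30])

by_position = names.iloc[1]  # 2nd value, counting from 0
by_label = names.loc[20]     # value whose label is 20

# With a default RangeIndex, label 1 and position 1 point at the same row
default = pd.Series(["rice", "oats", "tea"])
```

Applied to the dataset, `food["product_name"].iloc[18]` or `food.loc[18, "product_name"]` states which kind of lookup is meant.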