[Pandas] Pandas02 - Chipotle Solutions - 참신러닝 (Fresh-Learning)


Step 1. Import the necessary libraries

-> import pandas as pd

import numpy as np

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.

-> url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"

chipo = pd.read_csv(url,sep='\t')

Step 4. How many products cost more than $10.00?

-> prices = [float(value[1 : -1]) for value in chipo.item_price]

# reassign the column with the cleaned prices

chipo.item_price = prices

# delete the duplicates in item_name and quantity

chipo_filtered = chipo.drop_duplicates(['item_name','quantity'])

# select only the products with quantity equal to 1

chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]

chipo_one_prod[chipo_one_prod['item_price']>10].item_name.nunique()

A : 12

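Explanation: First, convert the item_price values in chipo to floats with the $ removed. Then drop rows that repeat the same (item_name, quantity) pair and keep only the rows where quantity == 1. Finally, select the rows priced above 10 and use nunique to count the distinct item names.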
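The same pipeline can be tried on a few hand-made rows. This is only a sketch: the `sample` frame and its values are made up for illustration, and the string methods used here are an alternative to the `value[1 : -1]` slice above, which appears to rely on each raw price having exactly one leading `$` and one trailing character.

```python
import pandas as pd

# Hypothetical sample rows; the real chipotle.tsv has more columns and rows.
sample = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Chips", "Steak Burrito"],
    "quantity": [1, 1, 1, 2],
    "item_price": ["$10.98 ", "$10.98 ", "$2.15 ", "$23.50 "],
})

# Strip surrounding whitespace and the leading "$", then convert to float.
sample["item_price"] = (
    sample["item_price"].str.strip().str.lstrip("$").astype(float)
)

# Keep one row per (item_name, quantity) pair, then only single-unit rows.
single = sample.drop_duplicates(["item_name", "quantity"])
single = single[single["quantity"] == 1]

# Count distinct item names priced strictly above 10.
count = single.loc[single["item_price"] > 10, "item_name"].nunique()
print(count)
```

In this sample only Chicken Bowl qualifies, because Steak Burrito appears only with quantity 2 and is filtered out before the price check.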

Step 5. What is the price of each item?

print a data frame with only two columns item_name and item_price

-> chipo.reindex(columns=['item_name','item_price'])

A :

Explanation: This differs from the reference answer, but reindex can also extract just the two columns. Passing the desired column names produces a new DataFrame containing only those columns.
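As a rough comparison, bracket selection with a list of column names gives the same result as reindex when every name exists. The sketch below uses a made-up `sample` frame; the misspelled column name `item_prices` is deliberate, to show one way the two approaches differ.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "item_name": ["Chips", "Izze"],
    "quantity": [1, 2],
    "item_price": [2.15, 6.78],
})

# Two ways to keep only item_name and item_price.
by_brackets = sample[["item_name", "item_price"]]
by_reindex = sample.reindex(columns=["item_name", "item_price"])
same = by_brackets.equals(by_reindex)

# reindex does not raise on an unknown label; it adds a column of NaN instead.
typo = sample.reindex(columns=["item_name", "item_prices"])
all_missing = bool(typo["item_prices"].isna().all())

print(same, all_missing)
```

Bracket selection raises a KeyError for a misspelled name, so it may be the safer choice when the goal is only to pick existing columns.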

Step 6. Sort by the name of the item

-> chipo.item_name.sort_values()

#or

chipo.sort_values(by='item_name')

A :

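Explanation: sort_values() does the sorting. The first form returns only the sorted item_name column as a Series, while the second sorts the whole DataFrame by item_name.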
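The difference between the two calls is easier to see on a tiny frame. This is a sketch with made-up rows, not the real dataset.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "item_name": ["Izze", "Chips", "Bowl"],
    "item_price": [3.39, 2.15, 8.75],
})

# Sorting the Series returns just the names, in alphabetical order.
names_only = sample["item_name"].sort_values()

# Sorting the DataFrame reorders every column along with the names.
whole = sample.sort_values(by="item_name")

print(list(names_only))
print(list(whole["item_price"]))
```

Both results keep the original index labels, so the index reads 2, 1, 0 here rather than 0, 1, 2.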

Step 7. What was the quantity of the most expensive item ordered?

-> chipo.sort_values(by='item_price',ascending=False).head(1)

A :

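Explanation: sort_values() orders the rows by item_price in descending order, and head(1) takes the first row. The quantity column of that row is the answer. This relies on item_price already having been converted to float in Step 4.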
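To pull out just the quantity rather than the whole row, one option is to index the sorted result, or to look up the row with idxmax. The following is a minimal sketch on made-up rows.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "item_name": ["Chips", "Chips and Salsa", "Steak Bowl"],
    "quantity": [1, 15, 2],
    "item_price": [2.15, 44.25, 17.50],
})

# Sort by price, highest first, and read quantity from the top row.
top = sample.sort_values(by="item_price", ascending=False).head(1)
qty_sorted = int(top["quantity"].iloc[0])

# Alternative: find the index label of the maximum price directly.
qty_idxmax = int(sample.loc[sample["item_price"].idxmax(), "quantity"])

print(qty_sorted, qty_idxmax)
```

The idxmax version avoids sorting the whole frame, which may matter on larger data.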

Step 8. How many times was a Veggie Salad Bowl ordered?

-> salad = chipo[chipo.item_name == 'Veggie Salad Bowl']

len(salad)

A : 18

Explanation: Select only the rows where chipo.item_name is Veggie Salad Bowl and assign them to a variable named salad. The length of salad then gives the number of times it was ordered.
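Note that len counts rows (order lines), not units. If the question were read as total bowls ordered, summing quantity would be the alternative. A small sketch on made-up rows shows the two numbers diverging.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "item_name": ["Veggie Salad Bowl", "Chips", "Veggie Salad Bowl"],
    "quantity": [1, 1, 2],
})

salad = sample[sample["item_name"] == "Veggie Salad Bowl"]

# Number of order lines that contain the item.
n_lines = len(salad)

# Total units of the item across those lines.
n_units = int(salad["quantity"].sum())

print(n_lines, n_units)
```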

Step 9. How many times did people order more than one Canned Soda?

-> soda = chipo[(chipo.item_name== 'Canned Soda') & (chipo.quantity > 1)]

len(soda)

A : 20

Explanation: Filter chipo on two conditions. Rows whose name is Canned Soda and whose quantity is greater than 1 are assigned to soda. Then, as in Step 8, len gives the count.
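The two conditions are combined with `&`, and each comparison needs its own parentheses because `&` binds more tightly than `==` and `>`. The sketch below, on made-up rows, names each mask separately to make that explicit.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Chips", "Canned Soda"],
    "quantity": [1, 2, 3, 4],
})

# Build each boolean mask on its own, then combine element-wise with &.
is_soda = sample["item_name"] == "Canned Soda"
is_multi = sample["quantity"] > 1
soda = sample[is_soda & is_multi]

# len of the filtered frame equals the number of True values in the mask.
n_rows = len(soda)
n_true = int((is_soda & is_multi).sum())

print(n_rows, n_true)
```

Using Python's `and` in place of `&` raises a ValueError here, since a Series has no single truth value.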