[Pandas] Pandas01 - Chipotle Solutions - 참신러닝 (Fresh-Learning)


Step 1. Import the necessary libraries

-> import pandas as pd
import numpy as np

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.

-> url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url,sep='\t')

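'\t' is the tab character, which separates the fields in a .tsv file.
pd.read_csv reads the file at the URL directly into a DataFrame.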
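To see what sep='\t' does without a network connection, here is a minimal sketch that parses a small tab-separated string. The three sample rows are made up for illustration and are not taken from chipotle.tsv.

```python
import io
import pandas as pd

# Hypothetical sample in a tab-separated layout (not the real data)
sample_tsv = (
    "order_id\tquantity\titem_name\n"
    "1\t1\tChips\n"
    "1\t2\tIzze\n"
    "2\t1\tChicken Bowl\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of commas
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample.shape)          # (rows, columns)
print(list(sample.columns))
```

read_csv accepts a URL, a file path, or a file-like object, so the same call works for the remote file used in this exercise.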

Step 4. See the first 10 entries

-> chipo.head(10)

A :

Step 5. What is the number of observations in the dataset?

-> # Solution 1
chipo.shape[0]

shape[0] selects axis 0, which gives the number of rows (observations).

# Solution 2
chipo.info()

A: 4622

Step 6. What is the number of columns in the dataset?

-> chipo.shape[1]

A : 5

Explanation: shape[1] selects axis 1, which gives the number of columns.
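As a small illustration of how shape works, the sketch below uses a made-up two-column frame (not the Chipotle data). shape is a tuple attribute, so it is indexed with square brackets rather than called like a function.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame used only for illustration
df = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 2, 1]})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Equivalent ways to get the same two counts
print(len(df), len(df.columns))
```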

Step 7. Print the name of all the columns

-> chipo.columns

A : Index(['order_id', 'quantity', 'item_name', 'choice_description',
'item_price'],
dtype='object')

Explanation: The .columns attribute returns the column names of the DataFrame.

Step 8. How is the dataset indexed?

-> chipo.index

A : RangeIndex(start=0, stop=4622, step=1)

Step 9. Which was the most-ordered item?

-> c = chipo.groupby('item_name')
   c = c.sum()
   c = c.sort_values(['quantity'], ascending=False)
   c.head(1)

A:

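Explanation: Group the rows by 'item_name' and store the result in c, then sum each group. sort_values orders the result by quantity in descending order (ascending=False), and head(1) returns the first row, which is the most-ordered item.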
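Summing every column also adds up order_id, which has no meaning as a total. A possible variant, sketched here on a made-up four-row frame, selects only the quantity column before aggregating. The sample rows and the names orders, totals, and top are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows, not the real dataset
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "item_name": ["Chips", "Chicken Bowl", "Chicken Bowl", "Chips"],
    "quantity": [1, 2, 1, 1],
})

# Sum only the quantity column within each item_name group
totals = orders.groupby("item_name")["quantity"].sum()

# Same sort-and-take-first approach as in the exercise
top = totals.sort_values(ascending=False).head(1)
print(top)

# idxmax/max give the item name and its total without sorting
print(totals.idxmax(), totals.max())
```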

Step 10. For the most-ordered item, how many items were ordered?

-> c = chipo.groupby('item_name')
   c = c.sum()
   c = c.sort_values(['quantity'],ascending=False)
   c.head(1)

A :

Explanation: The same code as in Step 9 applies; the quantity value in the resulting row is the number of items ordered.

Step 11. What was the most ordered item in the choice_description column?

-> c = chipo.groupby('choice_description').sum()
c = c.sort_values(['quantity'],ascending=False)
c.head(1)

A :

Explanation: To aggregate by choice_description, group on that column with groupby. Then sort by quantity to find the most-ordered choice.

Step 12. How many items were ordered in total?

-> chipo.quantity.sum()

A : 4972

Explanation: Calling .sum() on the quantity column gives the total number of items ordered.

Step 13. Turn the item price into a float

Step 13.a. Check the item price type

-> chipo.item_price.dtype

A : dtype('O')

Explanation: 'O' stands for object, the dtype the price strings are stored as here.

Step 13.b. Create a lambda function and change the type of item price

-> dollarizer = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(dollarizer)

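Explanation: The lambda takes one price string, slices it with x[1:-1] to drop the first character (the "$") and the last character (a trailing space in this file), and converts the rest to float. .apply runs this function on every value in the column.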
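The slice is easy to misread as a row range, so here is a minimal sketch of what it does to individual strings. The three price strings are made up, and they assume the "$2.39 " format with a trailing space. The second approach, using pandas string methods, is an alternative and not part of the original solution.

```python
import pandas as pd

# Hypothetical prices formatted like "$2.39 " (dollar sign, trailing space)
prices = pd.Series(["$2.39 ", "$10.98 ", "$1.25 "])

# Same idea as the lambda: x[1:-1] drops the first and last character
by_slice = prices.apply(lambda x: float(x[1:-1]))

# Alternative that does not depend on the last character being a space
by_strip = prices.str.strip().str.lstrip("$").astype(float)

print(by_slice.tolist())
print(by_strip.dtype)
```

If a price had no trailing space, x[1:-1] would silently cut off its last digit, which is why stripping whitespace and the "$" explicitly can be the safer choice.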

Step 13.c. Check the item price type

-> chipo.item_price.dtype

A : dtype('float64')

Step 14. How much was the revenue for the period in the dataset?

-> revenue = (chipo['quantity'] * chipo['item_price']).sum()
print("revenue was : $" + str(np.round(revenue,2)))

A : revenue was : $39237.02

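Explanation: To compute revenue, multiply the quantity and item_price columns of chipo element by element and sum the result. str() converts an object to its string representation so that it can be concatenated with the label. Passing 2 to np.round rounds the value to two decimal places.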
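The same calculation can be sketched on a tiny made-up frame. The values below are assumptions for illustration. The f-string is an alternative to str(np.round(...)) that always prints two decimals.

```python
import pandas as pd

# Hypothetical rows; item_price is already a float here
sample = pd.DataFrame({
    "quantity": [1, 2, 1],
    "item_price": [2.39, 1.25, 8.75],
})

# Row-wise product, then a single total
revenue = (sample["quantity"] * sample["item_price"]).sum()

# :.2f formats with exactly two decimal places
print(f"revenue was : ${revenue:.2f}")
```

np.round returns a number, so a value such as 13.60 would print as 13.6; the format specifier keeps the trailing zero.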

Step 15. How many orders were made in the period?

-> c = chipo.order_id.value_counts().count()
print(c)

A : 1834

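Explanation: value_counts() returns one entry per distinct order_id with the number of rows for that order. .count() then counts the non-null entries of that result, so c is the number of distinct orders.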
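To make the two steps concrete, the sketch below runs them on a made-up Series of six order ids and compares the result with nunique(), which counts distinct values directly. The sample values and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical order ids: three distinct orders across six rows
order_id = pd.Series([1, 1, 2, 3, 3, 3], name="order_id")

# One entry per distinct id, holding how many rows it has
counts = order_id.value_counts()
print(counts)

# Counting those entries gives the number of distinct orders
n_orders_a = counts.count()

# nunique() gives the same number in one call
n_orders_b = order_id.nunique()
print(n_orders_a, n_orders_b)
```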

Step 16. What is the average revenue amount per order?

-> # Solution 1
   avg = revenue / c
   print("average was : $" + str(np.round(avg,2)))

A : average was : $21.39
Explanation: Divide revenue (total revenue) by c (the number of orders) from the previous steps to get the average.

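-> # Solution 2
   chipo['revenue'] = chipo['item_price'] * chipo['quantity']
   order_grouped = chipo.groupby(by=['order_id'])['revenue'].sum()
   order_grouped.mean()

Explanation: Ignoring the earlier variables, compute a per-row revenue column, group by order_id, and sum the revenue within each order. .mean() then averages those per-order totals. Selecting the 'revenue' column before aggregating avoids averaging the text columns, which can raise a TypeError in recent pandas versions.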

-> # Solution 3
chipo.groupby(by=['order_id'])['revenue'].sum().mean()
This can also be done in a single line, given a per-row revenue column (item_price * quantity).
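The two routes to the average (total revenue divided by the number of orders, and the mean of per-order totals) should agree. Here is a minimal sketch on made-up rows that checks this; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows: order 1 has two line items, order 2 has one
sample = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 1.5, 9.0],
})
sample["revenue"] = sample["quantity"] * sample["item_price"]

# Route A: total revenue per order, then the mean of those totals
per_order = sample.groupby("order_id")["revenue"].sum()
avg_a = per_order.mean()

# Route B: overall revenue divided by the number of distinct orders
avg_b = sample["revenue"].sum() / sample["order_id"].nunique()

print(avg_a, avg_b)
```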

Step 17. How many different items are sold

-> chipo.item_name.value_counts().count()

A : 50
Explanation: To find how many different items were sold, count the distinct values of item_name.
