[Pandas] Pandas01 - Occupation Solution | 참신러닝 (Fresh-Learning)


Step 1. Import the necessary libraries

-> import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called users and use the 'user_id' as index

-> url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user'

data = pd.read_csv(url, sep='|', index_col='user_id')

data.head()


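Explanation: Opening the URL shows that the fields are separated by '|', so sep='|' is passed. index_col='user_id' makes the user_id column the index. Note that this solution names the DataFrame data rather than users; all later steps use data.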
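To check the parsing without network access, here is a minimal sketch that feeds pd.read_csv a few made-up rows in the same pipe-separated layout (the rows are hypothetical, not taken from u.user) through io.StringIO. It passes sep by keyword because pandas 2.0 made every read_csv argument after the file path keyword-only.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same pipe-separated layout as u.user
raw = """user_id|age|gender|occupation|zip_code
1|25|M|engineer|10001
2|40|F|educator|94110
3|31|M|student|30301
"""

# sep='|' splits each line on pipes; index_col='user_id' makes that column the row index
users = pd.read_csv(io.StringIO(raw), sep='|', index_col='user_id')

print(users.head())
print(users.index.name)        # user_id
print(users.columns.tolist())  # ['age', 'gender', 'occupation', 'zip_code']
```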

Step 4. See the first 25 entries

-> data.head(25)

Step 5. See the last 10 entries

-> data.tail(10)

Step 6. What is the number of observations in the dataset?

-> data.shape[0]

#or

data.info()

A : 943

Step 7. What is the number of columns in the dataset?

-> data.shape[1]

A : 4

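Explanation: shape[1] is the length along axis 1, i.e. the number of columns, which is 4. user_id is the index, so it is not counted as a column.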
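A small sketch on a hypothetical three-user DataFrame (values invented for illustration) showing that shape is a (rows, columns) tuple and that the index is not counted as a column:

```python
import pandas as pd

# Hypothetical three-user frame with user_id as the index, like data above
df = pd.DataFrame(
    {
        'age': [25, 40, 31],
        'gender': ['M', 'F', 'M'],
        'occupation': ['engineer', 'educator', 'student'],
        'zip_code': ['10001', '94110', '30301'],
    },
    index=pd.Index([1, 2, 3], name='user_id'),
)

rows, cols = df.shape            # shape is a (rows, columns) tuple
print(rows, cols)                # 3 4
print(len(df) == rows)           # True: len() also counts rows
print(len(df.columns) == cols)   # True: the index is not one of the columns
```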

Step 8. Print the name of all the columns.

-> data.columns

A : Index(['age', 'gender', 'occupation', 'zip_code'], dtype='object')

Step 9. How is the dataset indexed?

-> data.index

A :
Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            934, 935, 936, 937, 938, 939, 940, 941, 942, 943],
           dtype='int64', name='user_id', length=943)
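The index can also be inspected directly. This sketch uses a hypothetical three-row frame indexed by user_id; note that pandas 2.0 removed Int64Index, so newer versions print the index as Index([...], dtype='int64', ...) instead.

```python
import pandas as pd

# Hypothetical frame indexed by user_id
df = pd.DataFrame(
    {'age': [25, 40, 31], 'occupation': ['engineer', 'educator', 'student']},
    index=pd.Index([1, 2, 3], name='user_id'),
)

print(df.index.name)             # user_id
print(df.index.is_unique)        # True: each user_id appears once
print(df.loc[2, 'occupation'])   # educator: .loc looks rows up by index label
```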

Step 10. What is the data type of each column?

-> data.dtypes

A :
age            int64
gender        object
occupation    object
zip_code      object
dtype: object

Step 11. Print only the occupation column

-> data['occupation']

Step 12. How many different occupations are there in this dataset?

-> data['occupation'].value_counts().count()

#or

data.occupation.nunique()

A : 21

Explanation: nunique() returns the number of distinct elements, counting each value only once.
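A sketch on a hypothetical occupation Series showing that both approaches from this step agree; both ignore missing values by default.

```python
import pandas as pd

# Hypothetical occupation column with repeated values
occupation = pd.Series(
    ['student', 'other', 'student', 'educator', 'other'], name='occupation'
)

n1 = occupation.value_counts().count()  # one row per distinct value, then count the rows
n2 = occupation.nunique()               # count distinct values directly
print(n1, n2)                           # 3 3
```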

Step 13. What is the most frequent occupation?

-> ocu = data['occupation'].value_counts()

ocu.head()

A :
student          196
other            105
educator          95
administrator     79
engineer          67
Name: occupation, dtype: int64

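Explanation: Select only the occupation column and call value_counts() to get each value's frequency. The counts are sorted in descending order, so the first entry, student (196), is the most frequent occupation.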
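value_counts().head() lists the top few values. To get just the single most frequent label, a sketch on hypothetical data could use idxmax() on the counts, or Series.mode():

```python
import pandas as pd

# Hypothetical occupation column
occupation = pd.Series(
    ['student', 'other', 'student', 'educator', 'student'], name='occupation'
)

counts = occupation.value_counts()  # sorted from most to least frequent
top = counts.idxmax()               # label with the largest count
print(top, counts.max())            # student 3
print(occupation.mode()[0])         # student: mode() returns every tied value as a Series
```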

Step 14. Summarize the DataFrame.

-> data.describe()


Explanation: By default, describe() summarizes only the numeric columns, so only age appears.
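A minimal sketch on a hypothetical frame with one numeric and two text columns, showing which columns and statistics describe() returns by default:

```python
import pandas as pd

# Hypothetical frame: one numeric column, two text (object) columns
df = pd.DataFrame({
    'age': [25, 40, 31],
    'gender': ['M', 'F', 'M'],
    'occupation': ['engineer', 'educator', 'student'],
})

summary = df.describe()          # default: numeric columns only
print(summary.columns.tolist())  # ['age']
print(summary.index.tolist())    # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```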

Step 15. Summarize all the columns

-> data.describe(include='all')


Explanation: include='all' also summarizes the non-numeric columns, adding rows such as unique, top, and freq.
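The same kind of hypothetical frame with include='all': the text columns now appear too, with unique, top, and freq filled in, and NaN wherever a statistic does not apply to a column's type.

```python
import pandas as pd

# Hypothetical frame: one numeric column, two text (object) columns
df = pd.DataFrame({
    'age': [25, 40, 31],
    'gender': ['M', 'F', 'M'],
    'occupation': ['engineer', 'educator', 'student'],
})

summary_all = df.describe(include='all')  # every column, whatever its dtype
print(summary_all)
print(summary_all.loc['top', 'gender'], summary_all.loc['freq', 'gender'])  # M 2
```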

Step 16. Summarize only the occupation column

-> data.occupation.describe()

A :
count         943
unique         21
top       student
freq          196
Name: occupation, dtype: object

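Explanation: Selecting just the occupation column and calling describe() gives its count, the number of unique values, the most frequent value (top), and that value's frequency (freq).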

Step 17. What is the mean age of users?

-> data.mean(numeric_only=True)['age']

#or

round(data.age.mean())

A : 34

Explanation: round() drops the decimals by rounding to the nearest integer. The mean can be computed in several ways.
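A sketch on hypothetical ages. In pandas 2.0 and later, DataFrame.mean() no longer silently skips text columns, so calling it on a whole DataFrame that has object columns typically raises a TypeError unless numeric_only=True is passed; selecting the column first avoids the problem. Python's built-in round() also rounds exact halves to the nearest even integer.

```python
import pandas as pd

# Hypothetical ages alongside a text column
df = pd.DataFrame({'age': [24, 35, 44], 'occupation': ['student', 'other', 'writer']})

mean_age = df['age'].mean()                   # mean of a single column
mean_all = df.mean(numeric_only=True)['age']  # DataFrame-wide mean, skipping text columns
print(mean_age, mean_all)
print(round(mean_age))                        # 34 (103 / 3 = 34.33...)

# Python's round() sends exact halves to the nearest even integer
print(round(34.5), round(35.5))               # 34 36
```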

Step 18. What is the age with the least occurrences?

-> data.age.value_counts().tail(5)

A :
11    1
10    1
73    1
66    1
7     1
Name: age, dtype: int64

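Explanation: value_counts() gives each age's frequency, sorted from most to least frequent, so tail(5) returns the five least frequent ages. Each of these ages occurs only once.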
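tail(5) shows only five ages, and the order among ages tied at the same count should not be relied on, so other ages with a single occurrence may be hidden. A sketch on hypothetical ages that keeps every age tied for the lowest count:

```python
import pandas as pd

# Hypothetical ages: 20 appears 3 times, 33 twice, 7/11/73 once each
ages = pd.Series([20, 20, 20, 33, 33, 7, 73, 11], name='age')

counts = ages.value_counts()             # most frequent first
least = counts[counts == counts.min()]   # every age tied for the smallest count
print(sorted(least.index.tolist()))      # [7, 11, 73]
```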