
[Pandas] Pandas04 - Student Alcohol Consumption Solution


Student Alcohol Consumption

Introduction:

This time you will download a dataset from the UCI Machine Learning Repository.

Step 1. Import the necessary libraries

In [12]:
import pandas as pd
import numpy as np

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called df.

In [3]:
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv'
df = pd.read_csv(url,sep=',')
df.head()
Out[3]:
   school  sex  age  address  famsize  Pstatus  Medu  Fedu     Mjob      Fjob  ...  famrel  freetime  goout  Dalc  Walc  health  absences  G1  G2  G3
0      GP    F   18        U      GT3        A     4     4  at_home   teacher  ...       4         3      4     1     1       3         6   5   6   6
1      GP    F   17        U      GT3        T     1     1  at_home     other  ...       5         3      3     1     1       3         4   5   5   6
2      GP    F   15        U      LE3        T     1     1  at_home     other  ...       4         3      2     2     3       3        10   7   8  10
3      GP    F   15        U      GT3        T     4     2   health  services  ...       3         2      2     1     1       5         2  15  14  15
4      GP    F   16        U      GT3        T     3     3    other     other  ...       4         3      2     1     2       5         4   6  10  10

5 rows × 33 columns

Step 4. For the purpose of this exercise, slice the dataframe from the 'school' column through the 'guardian' column.

In [7]:
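# Use loc to slice by column name (label slices include both endpoints).
# .copy() makes df2 independent of df, so later column assignments do not trigger SettingWithCopyWarning.
df2 = df.loc[:, 'school':'guardian'].copy()
df2[:5]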
Out[7]:
   school  sex  age  address  famsize  Pstatus  Medu  Fedu     Mjob      Fjob  reason  guardian
0      GP    F   18        U      GT3        A     4     4  at_home   teacher  course    mother
1      GP    F   17        U      GT3        T     1     1  at_home     other  course    father
2      GP    F   15        U      LE3        T     1     1  at_home     other   other    mother
3      GP    F   15        U      GT3        T     4     2   health  services    home    mother
4      GP    F   16        U      GT3        T     3     3    other     other    home    father
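
Label-based slicing with `loc` includes both endpoints, which is why the 'guardian' column is part of the result. A minimal sketch on a small invented frame (`toy` and its values are hypothetical, not taken from the dataset) compares it with positional slicing:

```python
import pandas as pd

# Hypothetical toy frame; column names mimic the dataset, values are invented.
toy = pd.DataFrame({
    "school": ["GP", "MS"],
    "sex": ["F", "M"],
    "age": [18, 17],
    "guardian": ["mother", "father"],
    "G3": [6, 10],
})

# Label slice: 'school' through 'guardian', both endpoints included.
by_label = toy.loc[:, "school":"guardian"]

# Positional slice: columns 0 to 3; the stop position 4 is excluded.
by_position = toy.iloc[:, 0:4]

print(list(by_label.columns))
print(by_label.equals(by_position))
```

With `iloc`, the stop position has to be one past the last wanted column, whereas `loc` names the last column directly.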

Step 5. Create a lambda function that capitalizes strings.

In [14]:
Cap = lambda x: x.capitalize()
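
`str.capitalize` upper-cases the first character and lower-cases the rest, so the lambda behaves the same as passing `str.capitalize` directly. pandas also offers the vectorized `Series.str.capitalize()` as an alternative. The `jobs` Series below is made up for illustration:

```python
import pandas as pd

jobs = pd.Series(["at_home", "teacher", "other"])  # hypothetical sample values

Cap = lambda x: x.capitalize()

via_lambda = jobs.apply(Cap)                 # the lambda from this step
via_method = jobs.apply(str.capitalize)      # same function, no lambda needed
via_str_accessor = jobs.str.capitalize()     # vectorized string method

print(via_lambda.tolist())  # ['At_home', 'Teacher', 'Other']
```

All three produce the same values; which one to use is mostly a matter of taste for a column this small.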

Step 6. Capitalize both Mjob and Fjob

In [16]:
df2["Mjob"].apply(Cap)
df2['Fjob'].apply(Cap)
Out[16]:
0       Teacher
1         Other
2         Other
3      Services
4         Other
5         Other
6         Other
7       Teacher
8         Other
9         Other
10       Health
11        Other
12     Services
13        Other
14        Other
15        Other
16     Services
17        Other
18     Services
19        Other
20        Other
21       Health
22        Other
23        Other
24       Health
25     Services
26        Other
27     Services
28        Other
29      Teacher
         ...   
365       Other
366    Services
367    Services
368    Services
369     Teacher
370    Services
371    Services
372     At_home
373       Other
374       Other
375       Other
376       Other
377    Services
378       Other
379       Other
380     Teacher
381       Other
382    Services
383    Services
384       Other
385       Other
386     At_home
387       Other
388    Services
389       Other
390    Services
391    Services
392       Other
393       Other
394     At_home
Name: Fjob, Length: 395, dtype: object

Step 7. Print the last elements of the data set.

In [22]:
df2.tail()
Out[22]:
     school  sex  age  address  famsize  Pstatus  Medu  Fedu      Mjob      Fjob  reason  guardian
390      MS    M   20        U      LE3        A     2     2  services  services  course     other
391      MS    M   17        U      LE3        T     3     1  services  services  course    mother
392      MS    M   21        R      GT3        T     1     1     other     other  course     other
393      MS    M   18        R      LE3        T     3     2  services     other  course    mother
394      MS    M   19        U      LE3        T     1     1     other   at_home  course    father

Step 8. Did you notice the original dataframe is still lowercase? Why is that? Fix it and capitalize Mjob and Fjob.

In [24]:
df2["Mjob"] = df2["Mjob"].apply(Cap)
df2['Fjob'] = df2['Fjob'].apply(Cap)
df2.tail()
Out[24]:
     school  sex  age  address  famsize  Pstatus  Medu  Fedu      Mjob      Fjob  reason  guardian
390      MS    M   20        U      LE3        A     2     2  Services  Services  course     other
391      MS    M   17        U      LE3        T     3     1  Services  Services  course    mother
392      MS    M   21        R      GT3        T     1     1     Other     Other  course     other
393      MS    M   18        R      LE3        T     3     2  Services     Other  course    mother
394      MS    M   19        U      LE3        T     1     1     Other   At_home  course    father
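
The frame stayed lowercase after Step 6 because `apply` returns a new Series and leaves the original untouched; the change only persists once the result is assigned back to the column. A small sketch with an invented frame (`demo` is a hypothetical name):

```python
import pandas as pd

demo = pd.DataFrame({"Mjob": ["services", "other"]})  # invented values

# apply builds and returns a new Series; demo itself is not modified.
result = demo["Mjob"].apply(str.capitalize)
before = demo["Mjob"].tolist()
print(before)            # ['services', 'other']
print(result.tolist())   # ['Services', 'Other']

# Assigning the result back is what changes the column.
demo["Mjob"] = result
after = demo["Mjob"].tolist()
print(after)             # ['Services', 'Other']
```
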
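
Step 9. Create a function called majority that returns a boolean value to a new column called legal_drinker (consider majority as older than 17 years old)

In [25]:
def majority(x):
    # True when the student is older than 17, otherwise False
    return x > 17
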
In [27]:
df2['legal_drinker'] = df2['age'].apply(majority)
df2[:5]
Out[27]:
   school  sex  age  address  famsize  Pstatus  Medu  Fedu     Mjob      Fjob  reason  guardian  legal_drinker
0      GP    F   18        U      GT3        A     4     4  At_home   Teacher  course    mother           True
1      GP    F   17        U      GT3        T     1     1  At_home     Other  course    father          False
2      GP    F   15        U      LE3        T     1     1  At_home     Other   other    mother          False
3      GP    F   15        U      GT3        T     4     2   Health  Services    home    mother          False
4      GP    F   16        U      GT3        T     3     3    Other     Other    home    father          False
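
Because comparisons on a Series are already elementwise, the same boolean column could also be produced without `apply`. A sketch using the ages of the five rows shown above (the standalone `ages` frame is a hypothetical stand-in for `df2`):

```python
import pandas as pd

# Ages of the first five rows displayed above.
ages = pd.DataFrame({"age": [18, 17, 15, 15, 16]})

def majority(x):
    return x > 17

with_apply = ages["age"].apply(majority)   # one Python call per row
vectorized = ages["age"] > 17              # one comparison on the whole column

print(vectorized.tolist())  # [True, False, False, False, False]
```

Both give identical results; the vectorized form simply skips the per-row function call.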

Step 10. Multiply every number of the dataset by 10.

I know this makes no sense; don't forget it is just an exercise.
In [33]:
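def multiple10(x):
    # Multiply integers by 10; leave strings and booleans unchanged.
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(x, (int, np.integer)) and not isinstance(x, bool):
        return x * 10
    return x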
In [36]:
# applymap applies a function to a DataFrame elementwise,
# so it is used here to process each value individually.
# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map.
df2.applymap(multiple10).head()
Out[36]:
   school  sex  age  address  famsize  Pstatus  Medu  Fedu     Mjob      Fjob  reason  guardian  legal_drinker
0      GP    F  180        U      GT3        A    40    40  At_home   Teacher  course    mother           True
1      GP    F  170        U      GT3        T    10    10  At_home     Other  course    father          False
2      GP    F  150        U      LE3        T    10    10  At_home     Other   other    mother          False
3      GP    F  150        U      GT3        T    40    20   Health  Services    home    mother          False
4      GP    F  160        U      GT3        T    30    30    Other     Other    home    father          False
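
As an alternative to checking each element's type, the numeric columns can be selected up front and multiplied as a block. This is only a sketch, assuming booleans should stay untouched; the `sample` frame and its values are invented:

```python
import pandas as pd

sample = pd.DataFrame({
    "school": ["GP", "GP"],
    "age": [18, 17],
    "Medu": [4, 1],
    "legal_drinker": [True, False],
})

# include="number" selects numeric columns; bool and string columns are left out.
num_cols = sample.select_dtypes(include="number").columns

scaled = sample.copy()
scaled[num_cols] = scaled[num_cols] * 10

print(scaled)
```

Unlike the elementwise version, this would also scale float columns, which matches "every number" in the step description more literally.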

