🐍Python/Pandas

[Pandas] Pandas05 - Fictitious Names 풀이

728x90
반응형

Fictitious Names

Introduction:

This time you will create a data again

Special thanks to Chris Albon for sharing the dataset and materials. All the credits to this exercise belongs to him.

In order to understand about it go here.

Step 1. Import the necessary libraries

In [1]:
import pandas as pd

Step 2. Create the 3 DataFrames based on the followin raw data

In [2]:
raw_data_1 = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}

raw_data_2 = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}

raw_data_3 = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}

Step 3. Assign each to a variable called data1, data2, data3

In [5]:
data1 = pd.DataFrame(raw_data_1,columns=raw_data_1.keys())
data2 = pd.DataFrame(raw_data_2,columns=raw_data_2.keys())
data3 = pd.DataFrame(raw_data_3,columns=raw_data_3.keys())
In [7]:
data1.head()
Out[7]:
subject_idfirst_namelast_name
01AlexAnderson
12AmyAckerman
23AllenAli
34AliceAoni
45AyoungAtiches
In [8]:
data2.head()
Out[8]:
subject_idfirst_namelast_name
04BillyBonder
15BrianBlack
26BranBalwner
37BryceBrice
48BettyBtisan
In [9]:
data3.head()
Out[9]:
subject_idtest_id
0151
1215
2315
3461
4516

Step 4. Join the two dataframes along rows and assign all_data

In [17]:
all_data = pd.concat([data1,data2],axis=0)
all_data
Out[17]:
subject_idfirst_namelast_name
01AlexAnderson
12AmyAckerman
23AllenAli
34AliceAoni
45AyoungAtiches
04BillyBonder
15BrianBlack
26BranBalwner
37BryceBrice
48BettyBtisan

Step 5. Join the two dataframes along columns and assing to all_data_col

In [18]:
all_data_col = pd.concat([data1,data2],axis=1)
all_data_col
Out[18]:
subject_idfirst_namelast_namesubject_idfirst_namelast_name
01AlexAnderson4BillyBonder
12AmyAckerman5BrianBlack
23AllenAli6BranBalwner
34AliceAoni7BryceBrice
45AyoungAtiches8BettyBtisan

Step 6. Print data3

In [19]:
data3
Out[19]:
subject_idtest_id
0151
1215
2315
3461
4516
5714
6815
791
81061
91116

Step 7. Merge all_data and data3 along the subject_id value

In [21]:
pd.merge(all_data,data3,on='subject_id')
Out[21]:
subject_idfirst_namelast_nametest_id
01AlexAnderson51
12AmyAckerman15
23AllenAli15
34AliceAoni61
44BillyBonder61
55AyoungAtiches16
65BrianBlack16
77BryceBrice14
88BettyBtisan15

Step 8. Merge only the data that has the same 'subject_id' on both data1 and data2

In [24]:
pd.merge(data1,data2,how='inner',on='subject_id')
Out[24]:
subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
04AliceAoniBillyBonder
15AyoungAtichesBrianBlack

Step 9. Merge all values in data1 and data2, with matching records from both sides where available.

In [28]:
pd.merge(data1,data2,on='subject_id',how='outer')
Out[28]:
subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
01AlexAndersonNaNNaN
12AmyAckermanNaNNaN
23AllenAliNaNNaN
34AliceAoniBillyBonder
45AyoungAtichesBrianBlack
56NaNNaNBranBalwner
67NaNNaNBryceBrice
78NaNNaNBettyBtisan


728x90
반응형