
[Pandas] Pandas05 - MPG Cars Solution


MPG Cars

Introduction:

The following exercise uses data from the UC Irvine Machine Learning Repository.

Step 1. Import the necessary libraries

In [1]:
import pandas as pd

Step 2. Import the two datasets, cars1 and cars2.

Step 3. Assign each to a variable called cars1 and cars2

In [3]:
url1 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv'
url2 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv'
cars1 = pd.read_csv(url1,sep=',')
cars2 = pd.read_csv(url2,sep=',')

Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1.

In [12]:
cars1 = cars1.loc[:,'mpg':'car']
cars1.head()
Out[12]:
    mpg  cylinders  displacement  horsepower  weight  acceleration  model  origin                        car
0  18.0          8           307         130    3504          12.0     70       1  chevrolet chevelle malibu
1  15.0          8           350         165    3693          11.5     70       1          buick skylark 320
2  18.0          8           318         150    3436          11.0     70       1         plymouth satellite
3  16.0          8           304         150    3433          12.0     70       1              amc rebel sst
4  17.0          8           302         140    3449          10.5     70       1                ford torino
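
The label slice in Step 4 relies on the wanted columns sitting contiguously between 'mpg' and 'car'. As a hedged alternative, the sketch below drops the blank columns either by name or by content. The small `demo` frame and its "Unnamed: 9" / "Unnamed: 10" column names are made up for illustration; they mimic the names pandas gives to header-less CSV columns, not the actual contents of cars1.

```python
import pandas as pd

# Hypothetical miniature of cars1: two real columns plus two blank ones
demo = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
    "Unnamed: 9": [None, None],
    "Unnamed: 10": [None, None],
})

# Option A: keep only the columns whose name does not start with "Unnamed"
by_name = demo.loc[:, ~demo.columns.str.startswith("Unnamed")]

# Option B: drop every column in which all values are missing
by_content = demo.dropna(axis="columns", how="all")

print(by_name.columns.tolist())
print(by_content.columns.tolist())
```

Both options leave only the mpg and car columns here. Option B would also remove a genuine column that happens to be completely empty, so the name-based filter is the narrower choice.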

Step 5. What is the number of observations in each dataset?

In [15]:
print(cars1.shape)
print(cars2.shape)
(198, 9)
(200, 9)

Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [24]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
cars = pd.concat([cars1, cars2])
print(cars.head())
print(cars.shape)
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  
0       1  chevrolet chevelle malibu  
1       1          buick skylark 320  
2       1         plymouth satellite  
3       1              amc rebel sst  
4       1                ford torino  
(398, 9)
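
Joining two frames row-wise keeps each frame's original index labels, so the combined frame in Step 6 contains every label from 0 to 197 twice. A minimal sketch of the difference `ignore_index=True` makes, using two small made-up frames (`a` and `b` are illustrative, not the real datasets):

```python
import pandas as pd

# Two tiny stand-in frames, each with its own default 0..n-1 index
a = pd.DataFrame({"mpg": [18.0, 15.0]})
b = pd.DataFrame({"mpg": [33.0, 20.0, 18.0]})

# Default behaviour: original labels are kept, so labels repeat
kept = pd.concat([a, b])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
fresh = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0, 1, 2]
print(fresh.index.tolist())  # [0, 1, 2, 3, 4]
```

Whether to renumber depends on what comes next: a unique index makes label-based lookups with `.loc` unambiguous, while the repeated labels still show which position each row had in its source file.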

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [27]:
import numpy as np
owners = np.random.randint(15000,73000,cars.shape[0])
owners
Out[27]:
array([23805, 26449, 63463, 31684, 61786, 55975, 33774, 35569, 34714,
       34326, 38761, 39007, 40016, 53900, 36805, 19993, 42411, 64188,
       64041, 57594, 67190, 26414, 61335, 41821, 72970, 15199, 40389,
       69484, 35780, 26558, 65524, 42231, 55656, 23019, 62321, 56314,
       42742, 69615, 58416, 64664, 18957, 17433, 31943, 47821, 21030,
       63236, 51738, 34948, 42430, 62020, 37931, 51004, 46516, 72249,
       22779, 18150, 44212, 53893, 29802, 22388, 15480, 57391, 38611,
       66620, 45451, 52103, 21934, 72585, 22673, 42021, 52011, 22871,
       42979, 22083, 47473, 17001, 43465, 72559, 17748, 44711, 46561,
       41062, 61448, 32022, 26268, 70686, 71284, 38901, 28124, 66441,
       48180, 29272, 18288, 58750, 25189, 53445, 70468, 16142, 69106,
       30508, 60473, 72966, 19197, 21075, 44054, 30928, 23562, 66012,
       21642, 54089, 59792, 59718, 29806, 48669, 42598, 36843, 34687,
       48432, 37899, 54807, 57798, 43546, 59116, 60564, 17245, 43564,
       72512, 43826, 18913, 28083, 30135, 53227, 38063, 47474, 20453,
       33996, 44035, 39598, 69705, 55372, 42683, 29496, 23389, 37616,
       37570, 51595, 49600, 68981, 56480, 25961, 56979, 49942, 66849,
       47327, 21736, 72428, 68801, 28693, 65939, 28267, 65694, 15633,
       69166, 60757, 42798, 18672, 30130, 35433, 45196, 48361, 72353,
       48443, 64474, 20531, 22104, 26222, 65363, 72027, 69713, 48726,
       56723, 35066, 71317, 39982, 15778, 25621, 67077, 71188, 43843,
       67755, 65333, 26312, 48310, 51404, 49023, 15076, 49129, 17958,
       69980, 54072, 62893, 71106, 39439, 48177, 57617, 41966, 29419,
       27290, 37508, 23619, 28850, 20623, 44810, 26295, 58579, 67535,
       46105, 20145, 66031, 34385, 50030, 49913, 21971, 63360, 58127,
       23682, 32792, 41330, 27098, 48138, 56050, 48423, 21013, 53982,
       39661, 70801, 59364, 67717, 45793, 18465, 62252, 67054, 44597,
       69250, 67100, 20512, 66124, 67225, 64670, 66820, 63957, 63587,
       17432, 41576, 58461, 37608, 58709, 53218, 43750, 45045, 72714,
       45119, 24389, 68460, 52920, 72832, 70326, 53569, 37214, 40737,
       42228, 56968, 26560, 40520, 57392, 18199, 30537, 70452, 21999,
       40601, 69183, 68309, 46055, 63966, 61867, 15198, 21870, 40652,
       44014, 29016, 16568, 65967, 50783, 43567, 45608, 65039, 39190,
       61664, 49447, 43372, 49527, 56825, 55697, 27187, 15647, 52713,
       39670, 59890, 50609, 26313, 17525, 38277, 22178, 37923, 17807,
       55102, 28739, 36467, 31201, 58401, 59083, 52204, 31195, 63222,
       50719, 63272, 26391, 19372, 40584, 35380, 49157, 46868, 52123,
       65203, 43083, 25831, 24946, 37386, 24825, 44213, 50167, 48569,
       55045, 36294, 32223, 59465, 40142, 51892, 62434, 21145, 19236,
       55767, 66305, 16727, 20198, 19217, 28835, 38396, 29323, 64440,
       47785, 56356, 29295, 40947, 19935, 58569, 39932, 48147, 59815,
       66734, 60486, 46154, 54975, 48249, 68982, 62051, 37830, 17407,
       34240, 21637, 45263, 27690, 69361, 67404, 52110, 53316, 51312,
       16894, 51185, 57893, 57485, 53175, 37090, 62758, 59182, 36659,
       16757, 18275])
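
The cell above returns a NumPy array rather than a Series, its values change on every run, and `np.random.randint` excludes the upper bound, so 73,000 itself can never appear. A hedged sketch of one way to get a reproducible Series that includes both endpoints follows; the seed value 42 and the name `owners_demo` are arbitrary choices for illustration, and the row count 398 is taken from the shape printed in Step 6.

```python
import numpy as np
import pandas as pd

n_rows = 398  # number of rows in cars, per the shape printed in Step 6

# Seeded generator so the same numbers come back on every run (seed is arbitrary)
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so 73000 is a possible value
owners_demo = pd.Series(
    rng.integers(15000, 73000, size=n_rows, endpoint=True),
    name="owners",
)

print(owners_demo.head())
print(owners_demo.min(), owners_demo.max())
```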

Step 8. Add the column owners to cars

In [29]:
cars['owners'] = owners
cars.head()
Out[29]:
    mpg  cylinders  displacement  horsepower  weight  acceleration  model  origin                        car  owners
0  18.0          8           307         130    3504          12.0     70       1  chevrolet chevelle malibu   23805
1  15.0          8           350         165    3693          11.5     70       1          buick skylark 320   26449
2  18.0          8           318         150    3436          11.0     70       1         plymouth satellite   63463
3  16.0          8           304         150    3433          12.0     70       1              amc rebel sst   31684
4  17.0          8           302         140    3449          10.5     70       1                ford torino   61786
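
Because owners is a NumPy array, the assignment in Step 8 is positional: row i simply receives the i-th value. If the values were wrapped in a Series with a default index, pandas would align on index labels instead, which matters when the frame's labels repeat. The sketch below illustrates the difference on a made-up four-row frame (`demo` and `values` are illustrative names, not part of the exercise):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars: index labels repeat, as they do after the join in Step 6
demo = pd.DataFrame({"mpg": [18.0, 15.0, 33.0, 20.0]}, index=[0, 1, 0, 1])

values = np.array([100, 200, 300, 400])

# A NumPy array is assigned by position: row i receives values[i]
demo["from_array"] = values

# A Series is aligned on labels first: both rows labelled 0 get the value stored under label 0
demo["from_series"] = pd.Series(values)

print(demo)
```

In this sketch the array column holds 100, 200, 300, 400, while the Series column holds 100, 200, 100, 200, so passing the raw array (or resetting the index first) is the safer route when each row should get its own value.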

