🐍Python/Pandas

[Pandas] Pandas04 - US Crime Rates 풀이

728x90
반응형

United States - Crime Rates - 1960 - 2014

Introduction:

This time you will create a data

Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

In [1]:
import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called crime.

In [2]:
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/US_Crime_Rates/US_Crime_Rates_1960_2014.csv'
crime = pd.read_csv(url,sep=',')
crime.head()
Out[2]:
YearPopulationTotalViolentPropertyMurderForcible_RapeRobberyAggravated_assaultBurglaryLarceny_TheftVehicle_Theft
01960179323175338420028846030957009110171901078401543209121001855400328200
11961182992000348800028939031986008740172201066701567609496001913000336000
21962185771000375220030151034507008530175501108601645709943002089600366800
319631884830004109500316970379250086401765011647017421010864002297800408300
419641911410004564600364220420040093602142013039020305012132002514400472800

Step 4. What is the type of the columns?

In [3]:
crime.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
Year                  55 non-null int64
Population            55 non-null int64
Total                 55 non-null int64
Violent               55 non-null int64
Property              55 non-null int64
Murder                55 non-null int64
Forcible_Rape         55 non-null int64
Robbery               55 non-null int64
Aggravated_assault    55 non-null int64
Burglary              55 non-null int64
Larceny_Theft         55 non-null int64
Vehicle_Theft         55 non-null int64
dtypes: int64(12)
memory usage: 5.2 KB
Have you noticed that the type of Year is int64. But pandas has a different type to work with Time Series. Let's see it now.

Step 5. Convert the type of the column Year to datetime64

In [4]:
crime.Year = pd.to_datetime(crime.Year,format='%Y')
crime.Year
Out[4]:
0    1960-01-01
1    1961-01-01
2    1962-01-01
3    1963-01-01
4    1964-01-01
5    1965-01-01
6    1966-01-01
7    1967-01-01
8    1968-01-01
9    1969-01-01
10   1970-01-01
11   1971-01-01
12   1972-01-01
13   1973-01-01
14   1974-01-01
15   1975-01-01
16   1976-01-01
17   1977-01-01
18   1978-01-01
19   1979-01-01
20   1980-01-01
21   1981-01-01
22   1982-01-01
23   1983-01-01
24   1984-01-01
25   1985-01-01
26   1986-01-01
27   1987-01-01
28   1988-01-01
29   1989-01-01
30   1990-01-01
31   1991-01-01
32   1992-01-01
33   1993-01-01
34   1994-01-01
35   1995-01-01
36   1996-01-01
37   1997-01-01
38   1998-01-01
39   1999-01-01
40   2000-01-01
41   2001-01-01
42   2002-01-01
43   2003-01-01
44   2004-01-01
45   2005-01-01
46   2006-01-01
47   2007-01-01
48   2008-01-01
49   2009-01-01
50   2010-01-01
51   2011-01-01
52   2012-01-01
53   2013-01-01
54   2014-01-01
Name: Year, dtype: datetime64[ns]

Step 6. Set the Year column as the index of the dataframe

In [5]:
crime = crime.set_index('Year',drop=True)
crime
Out[5]:
PopulationTotalViolentPropertyMurderForcible_RapeRobberyAggravated_assaultBurglaryLarceny_TheftVehicle_Theft
Year
1960-01-01179323175338420028846030957009110171901078401543209121001855400328200
1961-01-01182992000348800028939031986008740172201066701567609496001913000336000
1962-01-01185771000375220030151034507008530175501108601645709943002089600366800
1963-01-011884830004109500316970379250086401765011647017421010864002297800408300
1964-01-011911410004564600364220420040093602142013039020305012132002514400472800
1965-01-011935260004739400387390435200099602341013869021533012825002572600496900
1966-01-0119557600052235004301804793300110402582015799023533014101002822000561200
1967-01-0119745700059034004999305403500122402762020291025716016321003111600659800
1968-01-0119939900067202005950106125200138003167026284028670018589003482700783600
1969-01-0120138500074109006618706749000147603717029885031109019819003888600878500
1970-01-0120323529880980007388207359200160003799034986033497022050004225800928400
1971-01-0120621200085882008165007771700177804226038770036876023993004424200948200
1972-01-0120823000082488008349007413900186704685037629039309023755004151200887200
1973-01-0120985100087181008759107842200196405140038422042065025655004347900928800
1974-01-01211392000102534009747209278700207105540044240045621030392005262500977100
1975-01-01213124000112924001039710102527002051056090470500492620326530059777001009600
1976-01-0121465900011349700100421010345500187805708042781050053031087006270800966000
1977-01-012163320001098450010295809955000191206350041261053435030715005905700977700
1978-01-01218059000112090001085550101234001956067610426930571460312830059910001004100
1979-01-01220099000122495001208030110415002146076390480700629480332770066010001112800
1980-01-01225349264134083001344520120637002304082990565840672650379520071369001131700
1981-01-01229146000134238001361820120619002252082500592910663900377970071944001087800
1982-01-01231534000129744001322390116520002101078770553130669480344710071425001062400
1983-01-01233981000121086001258090108505001931078920506570653290312990067128001007900
1984-01-01236158000118818001273280106085001869084230485010685350298440065919001032200
1985-01-01238740000124314001328800111026001898088670497870723250307330069264001102900
1986-01-01240132887132118691489169117227002061391459542775834322324141072571531224137
1987-01-01242282918135087001483999120247002009691110517704855088323618474999001288674
1988-01-01245807000139231001566220123569002068092490542970910090321810077059001432900
1989-01-01248239000142514001646040126054002150094500578330951710316820078724001564800
1990-01-0124870987314475600182013012655500234401025606392701054860307390079457001635900
1991-01-0125217700014872900191177012961100247001065906877301092740315720081422001661700
1992-01-0125508200014438200193227012505900237601090606724801126970297990079152001610800
1993-01-0125790800014144800192602012218800245301060106598701135610283480078209001563100
1994-01-0126034100013989500185767012131900233301022206189501113180271280078798001539300
1995-01-012627550001386270017987901206390021610974705805101099210259380079977001472400
1996-01-012652285721349386316885401180530019650962505355901037050250640079047001394200
1997-01-012676370001319457116347701155817518208961534985341023201246052677437601354189
1998-01-01270296000124756341531044109445901691493103446625974402232995073738861240754
1999-01-01272690813116343781426044102083341552289411409371911740210073969555201152075
2000-01-01281421906116080721425486101825861558690178408016911706205099269715901160002
2001-01-01285317559118766691439480104374801603790863423557909023211653170922671228391
2002-01-01287973924118789541423677104552771622995235420806891407215125270573701246646
2003-01-01290690788118265381383676104428621652893883414235859030215483470268021261226
2004-01-01293656842116794741360088103193861614895089401470847381214444669370891237851
2005-01-01296507061115654991390745101747541674094347417438862220215544867834471235859
2006-01-0129939848411401511141804399835681703092757447403860853218374666070131192809
2007-01-0130162115711251828140833798434811692990427445125855856217614065685721095769
2008-01-013043748461116054313926289767915164429047944357484213422284746588046958629
2009-01-013070065501076295613258969337060153998924140874281251422033136338095795652
2010-01-013093302191036387312512489112625147728559336908978184421684576204601739565
2011-01-013115878161025877412060319052743146618417535477275242321851406151095716508
2012-01-013138736851021905912170679001992148668514135505176200921099326168874723186
2013-01-01316497531985044511996848650761143198210934509572657519318356018632700294
2014-01-01318857056947581611979878277829142498404132580274129117298065858496689527

Step 7. Delete the Total column

In [6]:
del crime['Total']
crime.head()
Out[6]:
PopulationViolentPropertyMurderForcible_RapeRobberyAggravated_assaultBurglaryLarceny_TheftVehicle_Theft
Year
1960-01-0117932317528846030957009110171901078401543209121001855400328200
1961-01-0118299200028939031986008740172201066701567609496001913000336000
1962-01-0118577100030151034507008530175501108601645709943002089600366800
1963-01-01188483000316970379250086401765011647017421010864002297800408300
1964-01-01191141000364220420040093602142013039020305012132002514400472800

Step 8. Group the year by decades and sum the values

Pay attention to the Population column number, summing this column is a mistake

In [7]:
# 10년 단위로 묶어줌 
crimes = crime.resample('10AS').sum()

population = crimes['Population'].resample('10AS').max()
print(population)
crimes['Population'] = population
crimes
Year
1960-01-01    1915053175
1970-01-01    2121193298
1980-01-01    2371370069
1990-01-01    2612825258
2000-01-01    2947969117
2010-01-01    1570146307
Freq: 10AS-JAN, Name: Population, dtype: int64
Out[7]:
PopulationViolentPropertyMurderForcible_RapeRobberyAggravated_assaultBurglaryLarceny_TheftVehicle_Theft
Year
1960-01-0119150531754134930451609001061802367201633510215852013321100265477005292100
1970-01-0121211932989607930913838001922305545704159020470212028486000531578009739900
1980-01-0123713700691407432811704890020643986563953831097619130330734947204025311935411
1990-01-01261282525817527048119053499211664998827574893010568963267500157767936614624418
2000-01-0129479691171396805610094436916306892249942303668652124215651766797029111412834
2010-01-011570146307607201744095950728674210591749809376414210125170304016983569080

Step 9. What is the most dangerous decade to live in the US?

In [8]:
# idxmax를 통해 column별 가장 높았던 index값을 return 해준다. 
# 90년대가 각종 범죄가 많아 위험했음을 알 수 있다. 
crimes.idxmax()
Out[8]:
Population           2000-01-01
Violent              1990-01-01
Property             1990-01-01
Murder               1990-01-01
Forcible_Rape        1990-01-01
Robbery              1990-01-01
Aggravated_assault   1990-01-01
Burglary             1980-01-01
Larceny_Theft        1990-01-01
Vehicle_Theft        1990-01-01
dtype: datetime64[ns]


728x90
반응형