
[Pandas] Pandas02 - Fictional Army Solutions


Fictional Army - Filtering and Sorting

Introduction:

This exercise was inspired by this page

Special thanks to: https://github.com/chrisalbon for sharing the dataset and materials.

Step 1. Import the necessary libraries

In [1]:
import pandas as pd

Step 2. This is the data given as a dictionary

In [2]:
# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
            'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
            'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}

Step 3. Create a dataframe and assign it to a variable called army.

Don't forget to include the column names in the order presented in the dictionary ('regiment', 'company', 'deaths'...) so that the column order is consistent with the solutions. In pandas versions before 0.23 (or on Python older than 3.6), omitting them made pandas sort the columns alphabetically; newer versions keep the dictionary's insertion order.

In [6]:
army = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters', 'origin'])
army 
Out[6]:
      regiment  company  deaths  battles  size  veterans  readiness  armored  deserters      origin
0   Nighthawks      1st     523        5  1045         1          1        1          4     Arizona
1   Nighthawks      1st      52       42   957         5          2        0         24  California
2   Nighthawks      2nd      25        2  1099        62          3        1         31       Texas
3   Nighthawks      2nd     616        2  1400        26          3        1          2     Florida
4     Dragoons      1st      43        4  1592        73          2        0          3       Maine
5     Dragoons      1st     234        7  1006        37          1        1          4        Iowa
6     Dragoons      2nd     523        8   987       949          2        0         24      Alaska
7     Dragoons      2nd      62        3   849        48          3        1         31  Washington
8       Scouts      1st      62        4   973        48          2        0          2      Oregon
9       Scouts      1st      73        7  1005       435          1        0          3     Wyoming
10      Scouts      2nd      37        8  1099        63          2        1          2    Louisana
11      Scouts      2nd      35        9  1523       345          3        1          3     Georgia
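The note about alphabetical ordering describes older releases. Below is a minimal check, assuming pandas 0.23 or later on Python 3.6+. It uses a hypothetical three-key dictionary (`data`) shaped like `raw_data` and shows two things. First, the dict's insertion order is kept. Second, `columns=` both selects and reorders keys, and an unknown name becomes an all-NaN column.

```python
import pandas as pd

# Hypothetical small dictionary in the same shape as raw_data
data = {'regiment': ['Nighthawks', 'Dragoons'],
        'company': ['1st', '2nd'],
        'deaths': [523, 62]}

# Without columns=, recent pandas keeps the dict's insertion order
default_order = pd.DataFrame(data)
print(list(default_order.columns))   # ['regiment', 'company', 'deaths']

# columns= selects and orders keys; a name missing from the dict becomes all-NaN
custom = pd.DataFrame(data, columns=['deaths', 'regiment', 'size'])
print(list(custom.columns))          # ['deaths', 'regiment', 'size']
print(custom['size'].isna().all())   # True
```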

Step 4. Set the 'origin' column as the index of the dataframe

In [7]:
army = army.set_index('origin') # use the existing 'origin' column as the index
army
Out[7]:
              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Maine         Dragoons      1st      43        4  1592        73          2        0          3
Iowa          Dragoons      1st     234        7  1006        37          1        1          4
Alaska        Dragoons      2nd     523        8   987       949          2        0         24
Washington    Dragoons      2nd      62        3   849        48          3        1         31
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3
Louisana        Scouts      2nd      37        8  1099        63          2        1          2
Georgia         Scouts      2nd      35        9  1523       345          3        1          3
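`set_index` does not modify `army` in place. It returns a new DataFrame, which is why the result is assigned back to `army`. Here is a small sketch on a hypothetical three-row excerpt (`df`, `indexed`, and `restored` are illustrative names). It also shows `reset_index` turning the index back into a column.

```python
import pandas as pd

# Hypothetical three-row excerpt of the army data
df = pd.DataFrame({'regiment': ['Nighthawks', 'Dragoons', 'Scouts'],
                   'deaths': [523, 43, 62],
                   'origin': ['Arizona', 'Maine', 'Oregon']})

# set_index returns a NEW DataFrame; df itself is unchanged
indexed = df.set_index('origin')
print(indexed.index.name)        # origin
print(list(indexed.columns))     # ['regiment', 'deaths']
print('origin' in df.columns)    # True (df was not modified)

# reset_index moves the index back into an ordinary first column
restored = indexed.reset_index()
print(list(restored.columns))    # ['origin', 'regiment', 'deaths']
```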

Step 5. Print only the column veterans

In [8]:
army['veterans']
Out[8]:
origin
Arizona         1
California      5
Texas          62
Florida        26
Maine          73
Iowa           37
Alaska        949
Washington     48
Oregon         48
Wyoming       435
Louisana       63
Georgia       345
Name: veterans, dtype: int64

Step 6. Print the columns 'veterans' and 'deaths'

In [13]:
army[['veterans','deaths']]
Out[13]:
            veterans  deaths
origin
Arizona            1     523
California         5      52
Texas             62      25
Florida           26     616
Maine             73      43
Iowa              37     234
Alaska           949     523
Washington        48      62
Oregon            48      62
Wyoming          435      73
Louisana          63      37
Georgia          345      35
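Steps 5 and 6 differ only in the brackets: a single label returns a Series, and a list of labels returns a DataFrame whose columns follow the list order. Below is a sketch on a hypothetical three-row excerpt of `army`. It also covers a pitfall of attribute-style access in this dataset.

```python
import pandas as pd

# Hypothetical three-row excerpt of army (origin as the index)
army = pd.DataFrame(
    {'deaths': [523, 52, 25], 'size': [1045, 957, 1099], 'veterans': [1, 5, 62]},
    index=pd.Index(['Arizona', 'California', 'Texas'], name='origin'),
)

one_col = army['veterans']                # single label -> Series
as_frame = army[['veterans']]             # list with one label -> DataFrame
two_cols = army[['veterans', 'deaths']]   # columns come out in list order

print(type(one_col).__name__)     # Series
print(type(as_frame).__name__)    # DataFrame
print(list(two_cols.columns))     # ['veterans', 'deaths']

# Attribute access works only when the name does not clash with a DataFrame attribute
print(army.veterans.equals(one_col))  # True
print(army.size)                      # 9 -> DataFrame.size (element count), not the 'size' column
```

Because `size` is also a DataFrame attribute, `army['size']` is the reliable way to reach that column.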

Step 7. Print the name of all the columns.

In [14]:
army.columns
Out[14]:
Index(['regiment', 'company', 'deaths', 'battles', 'size', 'veterans',
       'readiness', 'armored', 'deserters'],
      dtype='object')
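`army.columns` is an `Index` object. The sketch below uses a hypothetical one-row frame with the same column order. It shows how to convert the `Index` to a list and how to map between names and positions, which makes the later `iloc` steps easier to read.

```python
import pandas as pd

# Hypothetical one-row frame with army's column order (after set_index('origin'))
cols = ['regiment', 'company', 'deaths', 'battles', 'size',
        'veterans', 'readiness', 'armored', 'deserters']
army = pd.DataFrame([['Nighthawks', '1st', 523, 5, 1045, 1, 1, 1, 4]],
                    columns=cols, index=pd.Index(['Arizona'], name='origin'))

names = army.columns.tolist()           # plain Python list of column names
print(names[2])                         # deaths (third column, position 2)
print(army.columns.get_loc('deaths'))   # 2 (name -> position)
print(army.columns[3:7].tolist())       # ['battles', 'size', 'veterans', 'readiness']
```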

Step 8. Select the 'deaths', 'size' and 'deserters' columns from Maine and Alaska

In [16]:
army.loc[['Maine','Alaska'],['deaths','size','deserters']]
# loc[[rows], [columns]]: rows and columns are selected by label
Out[16]:
        deaths  size  deserters
origin
Maine       43  1592          3
Alaska     523   987         24
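A label that is not in the index, such as the misspelling `'Marine'` for `'Maine'`, behaves differently across versions. Older pandas returned a row of NaN together with a FutureWarning about list-likes with missing labels. From pandas 1.0 on, the same `.loc` call raises a `KeyError`. Below is a sketch on a hypothetical two-row excerpt, assuming pandas 1.0 or later, contrasting `.loc` with the `.reindex()` alternative.

```python
import pandas as pd

# Hypothetical two-row excerpt of army with origin as the index
army = pd.DataFrame({'deaths': [43, 523], 'size': [1592, 987]},
                    index=pd.Index(['Maine', 'Alaska'], name='origin'))

# .loc with a list containing an absent label raises KeyError (pandas >= 1.0)
try:
    army.loc[['Marine', 'Alaska'], ['deaths', 'size']]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# .reindex keeps requested labels that are missing and fills them with NaN
filled = army.reindex(['Marine', 'Alaska'])
print(filled.loc['Marine'].isna().all())  # True
print(filled.loc['Alaska', 'deaths'])     # 523.0 (column becomes float to hold NaN)
```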

Step 9. Select the rows 3 to 7 and the columns 3 to 6

In [18]:
army.iloc[3:7,3:6]

# iloc selects by integer position, loc by label; iloc slices exclude the end, so this is rows 3-6 and columns 3-5 (0-based)
Out[18]:
         battles  size  veterans
origin
Florida        2  1400        26
Maine          4  1592        73
Iowa           7  1006        37
Alaska         8   987       949
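`iloc` slices by position and excludes the end. `loc` slices by label and includes both ends. The sketch below uses a hypothetical excerpt holding the first eight origins. It selects the same rows as Step 9 both ways.

```python
import pandas as pd

# Hypothetical excerpt: the first eight origins in the exercise's row order
origins = ['Arizona', 'California', 'Texas', 'Florida',
           'Maine', 'Iowa', 'Alaska', 'Washington']
army = pd.DataFrame({'battles': [5, 42, 2, 2, 4, 7, 8, 3]},
                    index=pd.Index(origins, name='origin'))

by_position = army.iloc[3:7]              # positions 3, 4, 5, 6 (end excluded)
by_label = army.loc['Florida':'Alaska']   # label slice: both ends included

print(by_position.index.tolist())   # ['Florida', 'Maine', 'Iowa', 'Alaska']
print(by_position.equals(by_label))  # True
```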

Step 10. Select every row after the fourth row

In [19]:
army.iloc[4:]
# starts at position 4 (Maine); Florida, the fourth row (position 3), is excluded
Out[19]:
            regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Maine       Dragoons      1st      43        4  1592        73          2        0          3
Iowa        Dragoons      1st     234        7  1006        37          1        1          4
Alaska      Dragoons      2nd     523        8   987       949          2        0         24
Washington  Dragoons      2nd      62        3   849        48          3        1         31
Oregon        Scouts      1st      62        4   973        48          2        0          2
Wyoming       Scouts      1st      73        7  1005       435          1        0          3
Louisana      Scouts      2nd      37        8  1099        63          2        1          2
Georgia       Scouts      2nd      35        9  1523       345          3        1          3

Step 11. Select every row up to the 4th row

In [20]:
army[:4]
Out[20]:
              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
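For Steps 10 and 11, `army[:4]`, `army.iloc[:4]`, and `army.head(4)` all pick the first four rows, and `army.iloc[4:]` is the complement. Here is a sketch on a hypothetical six-row excerpt. With a string index like `origin`, an integer slice inside `[]` is treated as positional.

```python
import pandas as pd

# Hypothetical six-row excerpt of army
army = pd.DataFrame({'deaths': [523, 52, 25, 616, 43, 234]},
                    index=pd.Index(['Arizona', 'California', 'Texas',
                                    'Florida', 'Maine', 'Iowa'], name='origin'))

first_four = army[:4]                      # integer slice with [] is positional here
print(first_four.equals(army.iloc[:4]))    # True
print(first_four.equals(army.head(4)))     # True

rest = army.iloc[4:]                       # everything from position 4 (the fifth row) on
print(rest.index.tolist())                 # ['Maine', 'Iowa']

# The two slices are complementary: concatenating them rebuilds the frame
print(pd.concat([first_four, rest]).equals(army))  # True
```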

Step 12. Select the 3rd column up to the 7th column

In [26]:
army.iloc[:,3:7]
# columns at positions 3 to 6 (end excluded): battles, size, veterans, readiness
Out[26]:
            battles  size  veterans  readiness
origin
Arizona           5  1045         1          1
California       42   957         5          2
Texas             2  1099        62          3
Florida           2  1400        26          3
Maine             4  1592        73          2
Iowa              7  1006        37          1
Alaska            8   987       949          2
Washington        3   849        48          3
Oregon            4   973        48          2
Wyoming           7  1005       435          1
Louisana          8  1099        63          2
Georgia           9  1523       345          3
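The same block of columns can also be selected by name with a label slice, whose end is inclusive. Here is a sketch on a hypothetical one-row frame with `army`'s column order.

```python
import pandas as pd

cols = ['regiment', 'company', 'deaths', 'battles', 'size',
        'veterans', 'readiness', 'armored', 'deserters']
# Hypothetical one-row excerpt with the same column order as army
army = pd.DataFrame([['Nighthawks', '1st', 523, 5, 1045, 1, 1, 1, 4]],
                    columns=cols, index=pd.Index(['Arizona'], name='origin'))

by_position = army.iloc[:, 3:7]               # positions 3..6
by_name = army.loc[:, 'battles':'readiness']  # same columns, end label included

print(by_position.columns.tolist())  # ['battles', 'size', 'veterans', 'readiness']
print(by_position.equals(by_name))   # True
```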

Step 13. Select rows where df.deaths is greater than 50

In [27]:
army[army['deaths'] > 50]
Out[27]:
              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Iowa          Dragoons      1st     234        7  1006        37          1        1          4
Alaska        Dragoons      2nd     523        8   987       949          2        0         24
Washington    Dragoons      2nd      62        3   849        48          3        1         31
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3
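The expression inside the brackets is a boolean Series, and rows where it is `True` are kept. The sketch below uses the `deaths` and `origin` values from `raw_data` (other columns omitted). It also compares the result with `DataFrame.query`.

```python
import pandas as pd

# deaths and origin values copied from raw_data
army = pd.DataFrame(
    {'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35]},
    index=pd.Index(['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa',
                    'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia'],
                   name='origin'))

mask = army['deaths'] > 50    # boolean Series aligned on the index
print(mask.dtype)             # bool
print(int(mask.sum()))        # 8 rows pass the filter

filtered = army[mask]         # same as army[army['deaths'] > 50]
print(filtered.equals(army.query('deaths > 50')))  # True
```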

Step 14. Select rows where df.deaths is greater than 500 or less than 50

In [30]:
army[(army['deaths'] > 500) | (army['deaths'] < 50)]
Out[30]:
            regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona   Nighthawks      1st     523        5  1045         1          1        1          4
Texas     Nighthawks      2nd      25        2  1099        62          3        1         31
Florida   Nighthawks      2nd     616        2  1400        26          3        1          2
Maine       Dragoons      1st      43        4  1592        73          2        0          3
Alaska      Dragoons      2nd     523        8   987       949          2        0         24
Louisana      Scouts      2nd      37        8  1099        63          2        1          2
Georgia       Scouts      2nd      35        9  1523       345          3        1          3
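Each comparison needs its own parentheses because `|` binds more tightly than `>` and `<`. Python's `or` also cannot combine two Series. Below is a sketch on the `deaths` values from `raw_data`. It uses `Series.between` (inclusive on both ends by default) as an equivalent formulation.

```python
import pandas as pd

deaths = pd.Series([523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35], name='deaths')

# Element-wise OR needs | and parentheses, because | binds tighter than > and <
mask = (deaths > 500) | (deaths < 50)

# Equivalent: NOT within [50, 500]
alt = ~deaths.between(50, 500)
print(mask.equals(alt))   # True
print(int(mask.sum()))    # 7

# Python's `or` tries to turn a whole Series into one bool and fails
try:
    (deaths > 500) or (deaths < 50)
except ValueError as err:
    print('ValueError:', err)
```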

Step 15. Select all the regiments not named "Dragoons"

In [36]:
army[army['regiment'] != 'Dragoons']
# army[(army['regiment'] != 'Dragoons')] also works
Out[36]:
              regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Arizona     Nighthawks      1st     523        5  1045         1          1        1          4
California  Nighthawks      1st      52       42   957         5          2        0         24
Texas       Nighthawks      2nd      25        2  1099        62          3        1         31
Florida     Nighthawks      2nd     616        2  1400        26          3        1          2
Oregon          Scouts      1st      62        4   973        48          2        0          2
Wyoming         Scouts      1st      73        7  1005       435          1        0          3
Louisana        Scouts      2nd      37        8  1099        63          2        1          2
Georgia         Scouts      2nd      35        9  1523       345          3        1          3
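`Series.isin` gives the same result as `!=` for a single name and extends naturally to several names. Here is a sketch on the `regiment` values from `raw_data`.

```python
import pandas as pd

# regiment values from raw_data, in the same order
regiment = pd.Series(['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4)

not_dragoons = regiment != 'Dragoons'
via_isin = ~regiment.isin(['Dragoons'])
print(not_dragoons.equals(via_isin))   # True
print(int(not_dragoons.sum()))         # 8

# isin scales to several names, e.g. exclude two regiments to keep only Scouts
only_scouts = ~regiment.isin(['Dragoons', 'Nighthawks'])
print(int(only_scouts.sum()))          # 4
```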

Step 16. Select the rows called Texas and Arizona

In [37]:
army.loc[['Texas','Arizona']]
Out[37]:
           regiment  company  deaths  battles  size  veterans  readiness  armored  deserters
origin
Texas    Nighthawks      2nd      25        2  1099        62          3        1         31
Arizona  Nighthawks      1st     523        5  1045         1          1        1          4

Step 17. Select the third cell in the row named Arizona

In [41]:
# army.columns[2] -> 'deaths' (the third column)
army.loc[['Arizona'],['deaths']]
Out[41]:
         deaths
origin
Arizona     523
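Passing lists to `.loc` always returns a DataFrame, here a 1x1 one. To get the value itself, pass plain labels or use `.at` or `.iloc`. Here is a sketch on a hypothetical two-row excerpt.

```python
import pandas as pd

# Hypothetical two-row excerpt of army with the first three columns
army = pd.DataFrame(
    {'regiment': ['Nighthawks', 'Nighthawks'], 'company': ['1st', '1st'],
     'deaths': [523, 52]},
    index=pd.Index(['Arizona', 'California'], name='origin'))

as_frame = army.loc[['Arizona'], ['deaths']]   # lists -> 1x1 DataFrame
as_scalar = army.loc['Arizona', 'deaths']      # plain labels -> the value itself
fast = army.at['Arizona', 'deaths']            # label-based scalar accessor
by_pos = army.iloc[0, 2]                       # row 0, third column (deaths)

print(as_frame.shape)            # (1, 1)
print(as_scalar, fast, by_pos)   # 523 523 523
```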

Step 18. Select the third cell down in the column named deaths

In [49]:
# army.index[2] -> 'Texas' (the third row)
army.loc[['Texas'],['deaths']]
Out[49]:
        deaths
origin
Texas       25
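The "third cell down" can also be reached without hard-coding the label `'Texas'`. One way is to look up the label at index position 2 first; another is to index the column by position. Here is a sketch on a hypothetical four-row excerpt.

```python
import pandas as pd

# Hypothetical four-row excerpt of army's deaths column
army = pd.DataFrame(
    {'deaths': [523, 52, 25, 616]},
    index=pd.Index(['Arizona', 'California', 'Texas', 'Florida'], name='origin'))

third_label = army.index[2]                          # 'Texas'
print(army.loc[third_label, 'deaths'])               # 25
print(army['deaths'].iloc[2])                        # 25 (positional inside the column)
print(army.iat[2, army.columns.get_loc('deaths')])   # 25
```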

