
Fictional Army - Filtering and Sorting

Introduction:

This exercise was inspired by this page

Special thanks to: https://github.com/chrisalbon for sharing the dataset and materials.

Step 1. Import the necessary libraries

In [1]:

import pandas as pd

Step 2. This is the data given as a dictionary

In [2]:

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
            'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
            'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}

Step 3. Create a dataframe and assign it to a variable called army.

Don't forget to include the column names in the order presented in the dictionary ('regiment', 'company', 'deaths'...) so that the column order is consistent with the solutions. If they are omitted, older pandas versions (before 0.23, or running on Python earlier than 3.6) order the columns alphabetically; newer versions keep the dictionary's insertion order.

In [6]:

army = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters', 'origin'])
army

Out[6]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters	origin
0	Nighthawks	1st	523	5	1045	1	1	1	4	Arizona
1	Nighthawks	1st	52	42	957	5	2	0	24	California
2	Nighthawks	2nd	25	2	1099	62	3	1	31	Texas
3	Nighthawks	2nd	616	2	1400	26	3	1	2	Florida
4	Dragoons	1st	43	4	1592	73	2	0	3	Maine
5	Dragoons	1st	234	7	1006	37	1	1	4	Iowa
6	Dragoons	2nd	523	8	987	949	2	0	24	Alaska
7	Dragoons	2nd	62	3	849	48	3	1	31	Washington
8	Scouts	1st	62	4	973	48	2	0	2	Oregon
9	Scouts	1st	73	7	1005	435	1	0	3	Wyoming
10	Scouts	2nd	37	8	1099	63	2	1	2	Louisana
11	Scouts	2nd	35	9	1523	345	3	1	3	Georgia
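The effect of the `columns` argument can be checked on a smaller frame. The sketch below uses a hypothetical three-key dictionary (`sample`, not the full army data) and assumes a recent pandas on Python 3.6 or later, where dictionary insertion order is preserved.

```python
import pandas as pd

# Hypothetical miniature of raw_data, used only for illustration
sample = {'regiment': ['Nighthawks', 'Dragoons'],
          'company': ['1st', '2nd'],
          'deaths': [523, 43]}

# Without `columns`, recent pandas keeps the dictionary's insertion order
default_order = pd.DataFrame(sample)

# With `columns`, the column order follows the list that is passed in
explicit_order = pd.DataFrame(sample, columns=['deaths', 'regiment', 'company'])

print(list(default_order.columns))
print(list(explicit_order.columns))
```

Passing `columns` explicitly, as the exercise asks, gives the same result regardless of the pandas version in use.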

Step 4. Set the 'origin' column as the index of the dataframe

In [7]:

army = army.set_index('origin') # use the existing 'origin' column as the index
army

Out[7]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Maine	Dragoons	1st	43	4	1592	73	2	0	3
Iowa	Dragoons	1st	234	7	1006	37	1	1	4
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Washington	Dragoons	2nd	62	3	849	48	3	1	31
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3
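Note that `set_index` returns a new DataFrame rather than changing the original, which is why the result is assigned back to `army` above. A minimal sketch on a hypothetical two-row frame (`small` is an illustrative name) shows this, along with `reset_index` as one way to undo the step:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
small = pd.DataFrame({'regiment': ['Nighthawks', 'Dragoons'],
                      'origin': ['Arizona', 'Maine']})

indexed = small.set_index('origin')   # new DataFrame; `small` is unchanged
restored = indexed.reset_index()      # moves 'origin' back into a regular column

print(small.columns.tolist())     # 'origin' is still a column here
print(indexed.index.tolist())     # 'origin' values are now the row labels
print(restored.columns.tolist())  # 'origin' comes back as the first column
```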

Step 5. Print only the column veterans

In [8]:

army['veterans']

Out[8]:

origin
Arizona         1
California      5
Texas          62
Florida        26
Maine          73
Iowa           37
Alaska        949
Washington     48
Oregon         48
Wyoming       435
Louisana       63
Georgia       345
Name: veterans, dtype: int64

Step 6. Print the columns 'veterans' and 'deaths'

In [13]:

army[['veterans','deaths']]

Out[13]:

	veterans	deaths
origin
Arizona	1	523
California	5	52
Texas	62	25
Florida	26	616
Maine	73	43
Iowa	37	234
Alaska	949	523
Washington	48	62
Oregon	48	62
Wyoming	435	73
Louisana	63	37
Georgia	345	35
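Steps 5 and 6 differ in more than the number of columns: single brackets with a string return a Series, while double brackets (a list of names) return a DataFrame, even for one column. A small sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
small = pd.DataFrame({'veterans': [1, 5], 'deaths': [523, 52]},
                     index=pd.Index(['Arizona', 'California'], name='origin'))

as_series = small['veterans']     # string key -> Series
as_frame = small[['veterans']]    # list key -> DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```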

Step 7. Print the name of all the columns.

In [14]:

army.columns

Out[14]:

Index(['regiment', 'company', 'deaths', 'battles', 'size', 'veterans',
       'readiness', 'armored', 'deserters'],
      dtype='object')

Step 8. Select the 'deaths', 'size' and 'deserters' columns from Maine and Alaska

In [16]:

army.loc[['Maine','Alaska'],['deaths','size','deserters']]
# loc[[rows],[columns]]

Out[16]:

	deaths	size	deserters
origin
Maine	43	1592	3
Alaska	523	987	24
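A mistyped label (for example 'Marine' instead of 'Maine') is easy to miss in this step. Older pandas versions only emit a FutureWarning and return a row of NaN for the unknown label, while pandas 1.0 and later raise a KeyError. The sketch below, on a hypothetical two-row frame, checks the labels first and shows `reindex` as the alternative when missing labels are actually expected:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
small = pd.DataFrame({'deaths': [43, 523], 'size': [1592, 987]},
                     index=pd.Index(['Maine', 'Alaska'], name='origin'))

wanted = ['Marine', 'Alaska']  # 'Marine' is a deliberate misspelling

# Find requested labels that are not in the index
missing = [label for label in wanted if label not in small.index]
print(missing)

# reindex tolerates unknown labels and fills their rows with NaN
filled = small.reindex(wanted)
print(filled)

# .loc with an unknown label: KeyError on recent pandas, a warning on old ones
try:
    small.loc[wanted]
    raised = False
except KeyError:
    raised = True
print(raised)
```

Because the NaN row forces the integer columns to float, checking `missing` before selecting is usually the safer habit.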

Step 9. Select the rows 3 to 7 and the columns 3 to 6

In [18]:

army.iloc[3:7,3:6]

# iloc selects by integer position / loc selects by the actual label names

Out[18]:

	battles	size	veterans
origin
Florida	2	1400	26
Maine	4	1592	73
Iowa	7	1006	37
Alaska	8	987	949
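The slice above stops before position 7, which is why only four rows come back. Label slices with `loc` behave differently: both endpoints are included. A minimal sketch on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration
small = pd.DataFrame({'battles': [5, 42, 2, 2],
                      'size': [1045, 957, 1099, 1400]},
                     index=pd.Index(['Arizona', 'California', 'Texas', 'Florida'],
                                    name='origin'))

# Positional slice: positions 1 and 2; the stop position 3 is excluded
by_position = small.iloc[1:3]

# Label slice: both 'California' and 'Florida' are included
by_label = small.loc['California':'Florida']

print(by_position.index.tolist())
print(by_label.index.tolist())
```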

Step 10. Select every row after the fourth row

In [19]:

army.iloc[4:]
# starts at position 4, so Florida (position 3) is not included

Out[19]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Maine	Dragoons	1st	43	4	1592	73	2	0	3
Iowa	Dragoons	1st	234	7	1006	37	1	1	4
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Washington	Dragoons	2nd	62	3	849	48	3	1	31
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3

Step 11. Select every row up to the 4th row

In [20]:

army[:4]

Out[20]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2

Step 12. Select the 3rd column up to the 7th column

In [26]:

army.iloc[:,3:7]
# select the columns at positions 3 through 6 (the stop position 7 is excluded)

Out[26]:

	battles	size	veterans	readiness
origin
Arizona	5	1045	1	1
California	42	957	5	2
Texas	2	1099	62	3
Florida	2	1400	26	3
Maine	4	1592	73	2
Iowa	7	1006	37	1
Alaska	8	987	949	2
Washington	3	849	48	3
Oregon	4	973	48	2
Wyoming	7	1005	435	1
Louisana	8	1099	63	2
Georgia	9	1523	345	3

Step 13. Select rows where df.deaths is greater than 50

In [27]:

army[army['deaths'] > 50]

Out[27]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Iowa	Dragoons	1st	234	7	1006	37	1	1	4
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Washington	Dragoons	2nd	62	3	849	48	3	1	31
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3

Step 14. Select rows where df.deaths is greater than 500 or less than 50

In [30]:

army[(army['deaths'] > 500) | (army['deaths'] < 50)]

Out[30]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Maine	Dragoons	1st	43	4	1592	73	2	0	3
Alaska	Dragoons	2nd	523	8	987	949	2	0	24
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3
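The parentheses around each comparison matter because `|` and `&` bind more tightly than `>` and `<` in Python. Naming the masks makes this easier to read. The sketch below uses the first six `deaths` values from the data in a hypothetical one-column frame, and shows `DataFrame.query` as an alternative spelling of the same filter:

```python
import pandas as pd

# First six 'deaths' values from the army data, in a hypothetical small frame
small = pd.DataFrame({'deaths': [523, 52, 25, 616, 43, 234]},
                     index=['Arizona', 'California', 'Texas',
                            'Florida', 'Maine', 'Iowa'])

high = small['deaths'] > 500   # boolean Series
low = small['deaths'] < 50     # boolean Series

either = small[high | low]       # element-wise OR
neither = small[~(high | low)]   # element-wise NOT of the same mask

# The same condition written as a query string
via_query = small.query('deaths > 500 or deaths < 50')

print(either.index.tolist())
print(neither.index.tolist())
```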

Step 15. Select all the regiments not named "Dragoons"

In [36]:

army[army['regiment'] != 'Dragoons']
# army[(army['regiment'] != 'Dragoons')] also works

Out[36]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4
California	Nighthawks	1st	52	42	957	5	2	0	24
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Florida	Nighthawks	2nd	616	2	1400	26	3	1	2
Oregon	Scouts	1st	62	4	973	48	2	0	2
Wyoming	Scouts	1st	73	7	1005	435	1	0	3
Louisana	Scouts	2nd	37	8	1099	63	2	1	2
Georgia	Scouts	2nd	35	9	1523	345	3	1	3
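When more than one value has to be excluded, chaining `!=` comparisons gets verbose. One possible alternative, sketched here on a hypothetical four-row frame, is `Series.isin` combined with `~` for negation:

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration
small = pd.DataFrame({'regiment': ['Nighthawks', 'Dragoons', 'Scouts', 'Dragoons']})

# Same filter as in the step above, for a single excluded name
not_dragoons = small[small['regiment'] != 'Dragoons']

# Excluding several names at once with isin and ~
excluded = ['Dragoons', 'Scouts']
kept = small[~small['regiment'].isin(excluded)]

print(not_dragoons['regiment'].tolist())
print(kept['regiment'].tolist())
```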

Step 16. Select the rows called Texas and Arizona

In [37]:

army.loc[['Texas','Arizona']]

Out[37]:

	regiment	company	deaths	battles	size	veterans	readiness	armored	deserters
origin
Texas	Nighthawks	2nd	25	2	1099	62	3	1	31
Arizona	Nighthawks	1st	523	5	1045	1	1	1	4

Step 17. Select the third cell in the row named Arizona

In [41]:

# army.columns[2] is 'deaths', the third column
army.loc[['Arizona'],['deaths']]

Out[41]:

	deaths
origin
Arizona	523

Step 18. Select the third cell down in the column named deaths

In [49]:

# army.index[2] is 'Texas', the third row
army.loc[['Texas'],['deaths']]

Out[49]:

	deaths
origin
Texas	25
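Passing lists to `loc`, as in the last two steps, returns a 1x1 DataFrame. If the bare value is wanted instead, scalar keys or the `.at` / `.iat` accessors can be used. A minimal sketch on a hypothetical three-row frame built from the first rows of the data:

```python
import pandas as pd

# First three rows of three columns, in a hypothetical small frame
small = pd.DataFrame({'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks'],
                      'company': ['1st', '1st', '2nd'],
                      'deaths': [523, 52, 25]},
                     index=pd.Index(['Arizona', 'California', 'Texas'],
                                    name='origin'))

as_frame = small.loc[['Texas'], ['deaths']]   # list keys -> 1x1 DataFrame
as_scalar = small.loc['Texas', 'deaths']      # scalar keys -> single value

by_label = small.at['Arizona', 'deaths']      # label-based scalar access
by_position = small.iat[2, 2]                 # third row, third column by position

print(as_frame.shape)
print(as_scalar, by_label, by_position)
```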
