[Data Handling] Pandas :: (4) Data selection

연두·2021년 2월 16일
0

Python for ML

목록 보기
16/34
post-thumbnail

pandas :: selecton & drop

import pandas as pd

Data loading

  • xlrd 모듈이 없을 경우, conda install xlrd
!conda install --y xlrd

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

import numpy as np
df = pd.read_excel('excel-comp-data.xlsx')
df.head()
account name street city state postal-code Jan Feb Mar
0 211829 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
1 320563 Walter-Trantow 1311 Alvis Tunnel Port Khadijah NorthCarolina 38365 95000 45000 35000
2 648336 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
3 109996 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
4 121213 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000

data selection with column names and index number

# 한 개의 column 선택시
df["account"].head(2)

0 211829
1 320563
Name: account, dtype: int64

# 한개 이상의 column 선택
df[["account", "street", "state"]].head(3)
account street state
0 211829 34456 Sean Highway Texas
1 320563 1311 Alvis Tunnel NorthCarolina
2 648336 62184 Schamberger Underpass Apt. 231 Iowa
# column 이름 없이 사용하는 index number는 row 기준 표시
df[:10]
account name street city state postal-code Jan Feb Mar
0 211829 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
1 320563 Walter-Trantow 1311 Alvis Tunnel Port Khadijah NorthCarolina 38365 95000 45000 35000
2 648336 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
3 109996 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
4 121213 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000
5 132971 Williamson, Schumm and Hettinger 89403 Casimer Spring Jeremieburgh Arkansas 62785 150000 120000 35000
6 145068 Casper LLC 340 Consuela Bridge Apt. 400 Lake Gabriellaton Mississipi 18008 62000 120000 70000
7 205217 Kovacek-Johnston 91971 Cronin Vista Suite 601 Deronville RhodeIsland 53461 145000 95000 35000
8 209744 Champlin-Morar 26739 Grant Lock Lake Juliannton Pennsylvania 64415 70000 95000 35000
9 212303 Gerhold-Maggio 366 Maggio Grove Apt. 998 North Ras Idaho 46308 70000 120000 35000
# column 이름과 함께 row index 사용시, 해당 column만
df["name"][:3]

0 Kerluke, Koepp and Hilpert
1 Walter-Trantow
2 Bashirian, Kunde and Price
Name: name, dtype: object

Series selection

account_series = df["account"]
account_series[:3]

0 211829
1 320563
2 648336
Name: account, dtype: int64

account_series[[1, 5, 2]]  # 1개 이상의 index

1 320563
5 132971
2 648336
Name: account, dtype: int64

account_series[account_series<250000]  # boolean index

0 211829
3 109996
4 121213
5 132971
6 145068
7 205217
8 209744
9 212303
10 214098
11 231907
12 242368
Name: account, dtype: int64

Index 변경

df.index = df["account"]
df.head()
account name street city state postal-code Jan Feb Mar
account
211829 211829 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
320563 320563 Walter-Trantow 1311 Alvis Tunnel Port Khadijah NorthCarolina 38365 95000 45000 35000
648336 648336 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
109996 109996 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
121213 121213 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000
del df["account"]
df.head()
name street city state postal-code Jan Feb Mar
account
211829 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
320563 Walter-Trantow 1311 Alvis Tunnel Port Khadijah NorthCarolina 38365 95000 45000 35000
648336 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
109996 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
121213 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000

Basic, loc, iloc selection

# column과 index number (주로 col 많을 때 사용)
df[["name", "street"]][:2]
name street
account
211829 Kerluke, Koepp and Hilpert 34456 Sean Highway
320563 Walter-Trantow 1311 Alvis Tunnel
# column number와 index number (col이 몇 개 없을 때 사용)
df.iloc[:2, :2]
name street
account
211829 Kerluke, Koepp and Hilpert 34456 Sean Highway
320563 Walter-Trantow 1311 Alvis Tunnel
# column과 index name
df.loc[[211829, 320563], ["name", "street"]]
name street
account
211829 Kerluke, Koepp and Hilpert 34456 Sean Highway
320563 Walter-Trantow 1311 Alvis Tunnel

index 재설정

df.index = list(range(0, 15))
df.head()
name street city state postal-code Jan Feb Mar
0 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
1 Walter-Trantow 1311 Alvis Tunnel Port Khadijah NorthCarolina 38365 95000 45000 35000
2 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
3 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
4 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000

data drop

df
name street city state postal-code Jan Feb Mar
0 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
1 Walter-Trantow 1311 Alvis Tunnel Port Khadijah NorthCarolina 38365 95000 45000 35000
2 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
3 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
4 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000
5 Williamson, Schumm and Hettinger 89403 Casimer Spring Jeremieburgh Arkansas 62785 150000 120000 35000
6 Casper LLC 340 Consuela Bridge Apt. 400 Lake Gabriellaton Mississipi 18008 62000 120000 70000
7 Kovacek-Johnston 91971 Cronin Vista Suite 601 Deronville RhodeIsland 53461 145000 95000 35000
8 Champlin-Morar 26739 Grant Lock Lake Juliannton Pennsylvania 64415 70000 95000 35000
9 Gerhold-Maggio 366 Maggio Grove Apt. 998 North Ras Idaho 46308 70000 120000 35000
10 Goodwin, Homenick and Jerde 649 Cierra Forks Apt. 078 Rosaberg Tenessee 47743 45000 120000 55000
11 Hahn-Moore 18115 Olivine Throughway Norbertomouth NorthDakota 31415 150000 10000 162000
12 Frami, Anderson and Donnelly 182 Bertie Road East Davian Iowa 72686 162000 120000 35000
13 Walsh-Haley 2624 Beatty Parkways Goodwinmouth RhodeIsland 31919 55000 120000 35000
14 McDermott PLC 8917 Bergstrom Meadow Kathryneborough Delaware 27933 150000 120000 70000
df.drop(1)  # index number로 drop (column 단위는 drop 아닌 del로 삭제)
name street city state postal-code Jan Feb Mar
0 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 10000 62000 35000
2 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 New Lilianland Iowa 76517 91000 120000 35000
3 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Hyattburgh Maine 46021 45000 120000 10000
4 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000
5 Williamson, Schumm and Hettinger 89403 Casimer Spring Jeremieburgh Arkansas 62785 150000 120000 35000
6 Casper LLC 340 Consuela Bridge Apt. 400 Lake Gabriellaton Mississipi 18008 62000 120000 70000
7 Kovacek-Johnston 91971 Cronin Vista Suite 601 Deronville RhodeIsland 53461 145000 95000 35000
8 Champlin-Morar 26739 Grant Lock Lake Juliannton Pennsylvania 64415 70000 95000 35000
9 Gerhold-Maggio 366 Maggio Grove Apt. 998 North Ras Idaho 46308 70000 120000 35000
10 Goodwin, Homenick and Jerde 649 Cierra Forks Apt. 078 Rosaberg Tenessee 47743 45000 120000 55000
11 Hahn-Moore 18115 Olivine Throughway Norbertomouth NorthDakota 31415 150000 10000 162000
12 Frami, Anderson and Donnelly 182 Bertie Road East Davian Iowa 72686 162000 120000 35000
13 Walsh-Haley 2624 Beatty Parkways Goodwinmouth RhodeIsland 31919 55000 120000 35000
14 McDermott PLC 8917 Bergstrom Meadow Kathryneborough Delaware 27933 150000 120000 70000
df.drop([0, 1, 2, 3])  # 한 개 이상의 index number로 drop
name street city state postal-code Jan Feb Mar
4 Bauch-Goldner 7274 Marissa Common Shanahanchester California 49681 162000 120000 35000
5 Williamson, Schumm and Hettinger 89403 Casimer Spring Jeremieburgh Arkansas 62785 150000 120000 35000
6 Casper LLC 340 Consuela Bridge Apt. 400 Lake Gabriellaton Mississipi 18008 62000 120000 70000
7 Kovacek-Johnston 91971 Cronin Vista Suite 601 Deronville RhodeIsland 53461 145000 95000 35000
8 Champlin-Morar 26739 Grant Lock Lake Juliannton Pennsylvania 64415 70000 95000 35000
9 Gerhold-Maggio 366 Maggio Grove Apt. 998 North Ras Idaho 46308 70000 120000 35000
10 Goodwin, Homenick and Jerde 649 Cierra Forks Apt. 078 Rosaberg Tenessee 47743 45000 120000 55000
11 Hahn-Moore 18115 Olivine Throughway Norbertomouth NorthDakota 31415 150000 10000 162000
12 Frami, Anderson and Donnelly 182 Bertie Road East Davian Iowa 72686 162000 120000 35000
13 Walsh-Haley 2624 Beatty Parkways Goodwinmouth RhodeIsland 31919 55000 120000 35000
14 McDermott PLC 8917 Bergstrom Meadow Kathryneborough Delaware 27933 150000 120000 70000
df.drop("city", axis = 1)  # column중에 "city"
name street state postal-code Jan Feb Mar
0 Kerluke, Koepp and Hilpert 34456 Sean Highway Texas 28752 10000 62000 35000
1 Walter-Trantow 1311 Alvis Tunnel NorthCarolina 38365 95000 45000 35000
2 Bashirian, Kunde and Price 62184 Schamberger Underpass Apt. 231 Iowa 76517 91000 120000 35000
3 D'Amore, Gleichner and Bode 155 Fadel Crescent Apt. 144 Maine 46021 45000 120000 10000
4 Bauch-Goldner 7274 Marissa Common California 49681 162000 120000 35000
5 Williamson, Schumm and Hettinger 89403 Casimer Spring Arkansas 62785 150000 120000 35000
6 Casper LLC 340 Consuela Bridge Apt. 400 Mississipi 18008 62000 120000 70000
7 Kovacek-Johnston 91971 Cronin Vista Suite 601 RhodeIsland 53461 145000 95000 35000
8 Champlin-Morar 26739 Grant Lock Pennsylvania 64415 70000 95000 35000
9 Gerhold-Maggio 366 Maggio Grove Apt. 998 Idaho 46308 70000 120000 35000
10 Goodwin, Homenick and Jerde 649 Cierra Forks Apt. 078 Tenessee 47743 45000 120000 55000
11 Hahn-Moore 18115 Olivine Throughway NorthDakota 31415 150000 10000 162000
12 Frami, Anderson and Donnelly 182 Bertie Road Iowa 72686 162000 120000 35000
13 Walsh-Haley 2624 Beatty Parkways RhodeIsland 31919 55000 120000 35000
14 McDermott PLC 8917 Bergstrom Meadow Delaware 27933 150000 120000 70000
df.drop(["street", "state"], axis = 1)  # column중에 "street", "state"
name city postal-code Jan Feb Mar
0 Kerluke, Koepp and Hilpert New Jaycob 28752 10000 62000 35000
1 Walter-Trantow Port Khadijah 38365 95000 45000 35000
2 Bashirian, Kunde and Price New Lilianland 76517 91000 120000 35000
3 D'Amore, Gleichner and Bode Hyattburgh 46021 45000 120000 10000
4 Bauch-Goldner Shanahanchester 49681 162000 120000 35000
5 Williamson, Schumm and Hettinger Jeremieburgh 62785 150000 120000 35000
6 Casper LLC Lake Gabriellaton 18008 62000 120000 70000
7 Kovacek-Johnston Deronville 53461 145000 95000 35000
8 Champlin-Morar Lake Juliannton 64415 70000 95000 35000
9 Gerhold-Maggio North Ras 46308 70000 120000 35000
10 Goodwin, Homenick and Jerde Rosaberg 47743 45000 120000 55000
11 Hahn-Moore Norbertomouth 31415 150000 10000 162000
12 Frami, Anderson and Donnelly East Davian 72686 162000 120000 35000
13 Walsh-Haley Goodwinmouth 31919 55000 120000 35000
14 McDermott PLC Kathryneborough 27933 150000 120000 70000


https://www.boostcourse.org/ai222/lecture/23822

0개의 댓글