Tidy data is data arranged in a format that suits its purpose. According to Hadley Wickham, a renowned R programmer and statistician, tidy data is a 2-D table that satisfies the following conditions:
1. each column represents a variable;
2. each row represents an observation;
3. each entry of the table represents a single value, which may come from either categorical (discrete) or continuous spaces.
A 'tidy' table is sometimes also called a 'tibble', although strictly speaking `tibble` is the name of the tidyverse's data-frame class in R.
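To make the three conditions concrete, here is a minimal sketch, assuming pandas, of an untidy table turned into a tidy one with `DataFrame.melt`. The `wide` table is a hypothetical reshaping of the cases data used below: the variable `year` hides in the column headers, so one row carries two observations.

```python
import pandas as pd

# Hypothetical "wide" (untidy) layout of the cases data: the variable `year`
# lives in the column headers, so each row mixes two observations.
wide = pd.DataFrame({
    "country": ["Afghanistan", "Brazil", "China"],
    "1999": [745, 37737, 212258],
    "2000": [2666, 80488, 213766],
})

# melt() stacks the year columns into (year, cases) pairs: now every column is
# one variable and every row is one country-year observation.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")
tidy["year"] = tidy["year"].astype(int)  # header strings -> integer years

print(tidy)
```

The result has the same layout as table `A` below: one row per (country, year) pair and one column per variable.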
import pandas as pd
from io import StringIO
from IPython.display import display  # handy for rendering plots and DataFrames in Jupyter
A_csv = """country,year,cases
Afghanistan,1999,745
Brazil,1999,37737
China,1999,212258
Afghanistan,2000,2666
Brazil,2000,80488
China,2000,213766"""
with StringIO(A_csv) as fp:
    A = pd.read_csv(fp)
print("=== A ===")
display(A)
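# B: population for the same country-year pairs (figures as in tidyr's table1 example)
B_csv = """country,year,population
Afghanistan,1999,19987071
Brazil,1999,172006362
China,1999,1272915272
Afghanistan,2000,20595360
Brazil,2000,174504898
China,2000,1280428583"""
with StringIO(B_csv) as fp:
    B = pd.read_csv(fp)
print("\n=== B ===")
display(B)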
The merge() function makes it easy to combine these two DataFrames, matching rows on `country` and `year`.
C = A.merge(B, on=['country', 'year'])
print("\n=== C = merge(A, B) ===")
display(C)
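Conceptually, `merge(..., on=['country', 'year'])` pairs up rows whose key tuples match. The sketch below is a hypothetical from-scratch hash join (the helper `inner_join` is not part of pandas) that reproduces the default inner merge on a small subset of the data; the population figures are those of tidyr's `table1` example dataset. It skips details pandas handles, such as suffixing overlapping non-key columns.

```python
import pandas as pd

def inner_join(left, right, keys):
    """Hypothetical hash-join sketch of left.merge(right, on=keys) (inner)."""
    # Build phase: group the right table's rows by their key tuple.
    index = {}
    for row in right.to_dict("records"):
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    # Probe phase: each left row yields one output row per matching right row.
    out = []
    for row in left.to_dict("records"):
        for match in index.get(tuple(row[k] for k in keys), []):
            # A same-named non-key column in `right` would overwrite the left one
            # here; pandas adds suffixes (_x, _y) instead.
            out.append({**row, **match})
    return pd.DataFrame(out)

A = pd.DataFrame({"country": ["Afghanistan", "Brazil", "China"],
                  "year": [1999, 1999, 2000],
                  "cases": [745, 37737, 213766]})
B = pd.DataFrame({"country": ["Brazil", "Afghanistan", "China"],
                  "year": [1999, 1999, 1999],
                  "population": [172006362, 19987071, 1272915272]})

mine = inner_join(A, B, ["country", "year"])
print(mine)  # China drops out: its year differs between A and B
print(A.merge(B, on=["country", "year"]))
```

Both joins keep the left table's row order, which matches pandas' documented behavior for `how='inner'`.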
Put simply, the different join types (the `how` argument) behave as follows:
with StringIO("""x,y,z
bug,1,d
rug,2,d
lug,3,d
mug,4,d""") as fp:
    D = pd.read_csv(fp)
print("=== D ===")
display(D)
with StringIO("""x,y,w
hug,-1,e
smug,-2,e
rug,-3,e
tug,-4,e
bug,1,e""") as fp:
    E = pd.read_csv(fp)
print("\n=== E ===")
display(E)
print("\n=== Outer-join (D, E) ===")
display(D.merge(E, on=['x', 'y'], how='outer'))
print("\n=== Left-join (D, E) ===")
display(D.merge(E, on=['x', 'y'], how='left'))
print("\n=== Right-join (D, E) ===")
display(D.merge(E, on=['x', 'y'], how='right'))
print("\n=== Inner-join (D, E) ===")
display(D.merge(E, on=['x', 'y']))
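One way to see how the four joins relate is `merge`'s `indicator=True` option, which tags each row of the result with its origin in a `_merge` column ('left_only', 'right_only' or 'both'). In this sketch, filtering that tag on the outer join recovers the rows of the left, right and inner joins; `D` and `E` are rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Same D and E as above, rebuilt so the snippet is self-contained.
D = pd.DataFrame({"x": ["bug", "rug", "lug", "mug"], "y": [1, 2, 3, 4], "z": ["d"] * 4})
E = pd.DataFrame({"x": ["hug", "smug", "rug", "tug", "bug"],
                  "y": [-1, -2, -3, -4, 1], "w": ["e"] * 5})

# indicator=True adds a categorical `_merge` column recording where each row came from.
outer = D.merge(E, on=["x", "y"], how="outer", indicator=True)
print(outer)

# Filtering the outer join by `_merge` gives the same rows as the other join types.
inner_rows = outer[outer["_merge"] == "both"]        # like how='inner'
left_rows = outer[outer["_merge"] != "right_only"]   # like how='left'
right_rows = outer[outer["_merge"] != "left_only"]   # like how='right'
```

Only `(bug, 1)` appears in both tables; `rug` does not match because its `y` differs, which is why joining on both key columns matters.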
Easy, right?