🐼 Data Analysis and Python: Pandas Basics

Geondong Kim · 4 days ago

🐼 Data Analysis and Python: Pandas Basics (Part 1)

1. Pandas Overview

1.1 What Is Pandas?

  • Origin: The name comes not from the animal "panda" but from "panel data," a term from econometrics.
  • Definition: A fast, powerful, flexible, and easy-to-use open-source tool for data analysis and manipulation.
  • Key point: The de facto standard library for data analysis in the Python ecosystem.

1.2 Key Features (Library Highlights)

  • DataFrame: Provides a two-dimensional data structure similar to an Excel sheet.
  • Fast I/O: Quickly reads and writes many formats, including CSV, text, MS Excel, SQL databases, and HDF5.
  • Data handling:
    • Intelligent data alignment and handling of missing data.
    • Flexible reshaping and pivoting.
    • Label-based slicing, advanced indexing, and subset selection.
  • Data manipulation:
    • Columns can be inserted and deleted freely.
    • Powerful group-by functionality (similar to SQL, but more flexible).
    • High-performance merging and joining of datasets.
  • Other: Hierarchical axis indexing and strong time-series support.

1.3 When to Use It

  • When the analysis logic is complex (time-series resampling, rolling windows, decomposition, etc.).
  • Preprocessing data for visualization.
  • Preparing data for machine learning (ML) or statistical analysis (works well with scikit-learn and similar libraries).
  • Transforming data structures and handling missing values.
  • Powerful string processing (regular expressions, etc.).

2. Pandas Data Structures

2.1 Overview

Pandas has two core data structures.

  1. Series:
    • A one-dimensional labeled array.
    • Can hold any data type (integers, strings, floats, etc.).
    • Corresponds to a single column or a single row of a DataFrame.
  2. DataFrame:
    • A two-dimensional labeled data structure.
    • Columns can have different types (e.g., Name (str), Age (int), Address (str)).
    • Made up of rows and columns, which are accessed by index label and column name, respectively.

2.2 Visualizing the Structure (Mental Model)

  • DataFrame: Several Series placed side by side, all sharing the same index.
  • Index: Labels that identify the rows (by default, integers starting from 0).
  • Column Names: Labels that identify the columns.
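A quick way to check this mental model is to build a DataFrame from two Series that share an index and then pull one column back out. The column names and values below are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels (values are made up)
ages = pd.Series([25, 30, 35], index=["A", "B", "C"])
cities = pd.Series(["Seattle", "New York", "Kona"], index=["A", "B", "C"])

# Putting them side by side gives a DataFrame; the dict keys become column names
df = pd.DataFrame({"Age": ages, "Location": cities})

col = df["Age"]                      # pulling one column out returns a Series
print(type(col).__name__)            # Series
print(list(df.index))                # ['A', 'B', 'C']
print(list(df.columns))              # ['Age', 'Location']
print(col.index.equals(df.index))    # True
```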

3. Hands-on: Working with DataFrames and Series

3.1 Creating a DataFrame and Checking Its Attributes

Create a DataFrame from a dictionary and inspect its main attributes.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Aritra'],
    'Age': [25, 30, 35],
    'Location': ['Seattle', 'New York', 'Kona']
}, index=['A', 'B', 'C'])  # set the index labels explicitly

print(df)
# Output:
#      Name  Age  Location
# A   Alice   25   Seattle
# B     Bob   30  New York
# C  Aritra   35      Kona

# Inspect the main attributes
print(df.columns)
# Index(['Name', 'Age', 'Location'], dtype='object')

print(df.index)
# Index(['A', 'B', 'C'], dtype='object')

3.2 Extracting a Series and Checking Its Type

Selecting a single row or column from a DataFrame returns a Series.

# Select the first row by position (iloc)
y = df.iloc[0]

print(type(y))
# <class 'pandas.core.series.Series'> -> confirms it is a Series object

print(y)
# Name        Alice
# Age            25
# Location  Seattle
# Name: A, dtype: object

  • Note: This row mixes strings and integers, so the resulting Series has the object dtype, which pandas (before 3.0) also uses for string data by default.
  • Accessing data:
    • Select a column: df['Name'] (by name)
    • Select a row: df.iloc[0] (by position) or df.loc['A'] (by label)
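The three access patterns can be compared side by side. This sketch rebuilds the DataFrame from 3.1 so it runs on its own, and checks that selecting the first row by position and by label gives the same Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Aritra"],
    "Age": [25, 30, 35],
    "Location": ["Seattle", "New York", "Kona"],
}, index=["A", "B", "C"])

name_col = df["Name"]        # column by name -> Series indexed by A, B, C
row_by_pos = df.iloc[0]      # first row by position
row_by_label = df.loc["A"]   # the same row by label

print(name_col.tolist())                 # ['Alice', 'Bob', 'Aritra']
print(row_by_pos.equals(row_by_label))   # True
print(row_by_label.name)                 # A (the row label becomes the Series name)
```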

3.3 Creating a Series in Detail

pd.Series(data, index, dtype, name, copy)

  • data: an array, iterable, dict, scalar value, etc.
  • index: labels for the values. If omitted, a default RangeIndex (0, 1, 2, ...) is created.
  • dtype: the data type (e.g., 'int32', 'float64'). Inferred automatically if omitted.
  • name: the name of the Series.
  • copy: whether to copy the input data.

Creating from an array

# Basic creation (index generated automatically)
pd.Series([1, 2, 3, 4])

# Creation with options
pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'], dtype='int32', name='foo')
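The data parameter also accepts a scalar. In that case pandas repeats the value for every label in index. The sketch below shows that, and checks that the dtype and name options from the previous call are stored on the Series; the labels are arbitrary.

```python
import pandas as pd

# A scalar is broadcast to every label in the index
s_scalar = pd.Series(5, index=["a", "b", "c"])
print(s_scalar.tolist())   # [5, 5, 5]

# dtype and name are kept as attributes of the Series
s_opts = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"], dtype="int32", name="foo")
print(s_opts.dtype)        # int32
print(s_opts.name)         # foo
```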

Creating from a dict

The dictionary keys become the Series index.

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
pd.Series(d)
# a    1
# b    2
# ...

If you also pass index when creating from a dictionary, only the values whose keys appear in index are taken, in the order given by index, and any label with no matching key is filled with NaN (missing value).
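A small sketch of that rule, reusing the dict from above with an index chosen for illustration: "b" and "a" exist in the dict, "z" does not. Because NaN is a float, the integer values are upcast to float64.

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3, "d": 4}

# Order follows the index argument; "z" has no key in d, so it becomes NaN
s = pd.Series(d, index=["b", "a", "z"])
print(s)
# b    2.0
# a    1.0
# z    NaN
# dtype: float64
```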


3.4 Changing Series Attributes (Copy & Index)

  • copy parameter: Use it to break the link to the original data by making a separate copy.
    • copy=False (default): The Series may share memory with the original data (a view), so a change to one can show up in the other. This describes pandas 2.x without Copy-on-Write; versions with Copy-on-Write enabled may copy NumPy input by default, so the first example below can behave differently there.
    • copy=True: The data is copied into separate memory, so the Series is independent of the original.
  • Changing the index: Even after creation, you can change the index with s.index = [...].

import numpy as np
import pandas as pd

# Example: the Series shares the original data
na = np.array([1, 2, 3])
s = pd.Series(na)
s.iloc[0] = 99
print(na)  # [99  2  3] -> the original array changed too!

# Example: using copy=True
na = np.array([1, 2, 3])
s = pd.Series(na, copy=True)
s.iloc[0] = 100
print(na)  # [1 2 3] -> the original array is unchanged
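The index bullet can be checked directly. One detail not covered above: the new labels must match the length of the Series, otherwise pandas raises a ValueError and leaves the old index in place. The labels here are arbitrary.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
s.index = ["x", "y", "z"]        # relabel after creation
print(s["y"])                    # 20

raised = False
try:
    s.index = ["only", "two"]    # 2 labels for 3 values
except ValueError as err:
    raised = True
    print("ValueError:", err)
```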

3.5 Series Indexing and Selection

Pandas offers several ways to select data.

Operation | Syntax example | Result type
--- | --- | ---
Select by position | s.iloc[0] | Scalar value
Slice by position | s.iloc[5:10] | Series
Select by a list of positions | s.iloc[[2, 3]] | Series
Select by label | s["a"] or s.loc["a"] | Scalar value
Slice by label | s["b":"d"] | Series (end label included)
Select by condition (Boolean) | s[s > 0] | Series
Dict-style access | "b" in s or s.get("c") | bool for in; for get, the value (None if the label is missing)

s.index = ['a', 'b', 'c']

print("b" in s) # True
print(s.get("c"))  # 3
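Each row of the table can be exercised on one small Series. The values and labels are arbitrary; note that position slices exclude the end while label slices include it, and that in checks the index, not the values.

```python
import pandas as pd

s = pd.Series([10, -20, 30, 40, -50], index=["a", "b", "c", "d", "e"])

print(s.iloc[0])                # 10 (scalar, by position)
print(s.iloc[1:3].tolist())     # [-20, 30] (end position excluded)
print(s.iloc[[2, 3]].tolist())  # [30, 40]
print(s.loc["a"])               # 10 (scalar, by label)
print(s["b":"d"].tolist())      # [-20, 30, 40] (end label included)
print(s[s > 0].tolist())        # [10, 30, 40]
print("b" in s)                 # True (looks at the index labels)
print(s.get("z", "missing"))    # missing (default returned for an absent label)
```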

🐼 Data Analysis and Python: Pandas Basics (Part 2)

1. Vectorized Operations and Label Alignment

Like NumPy, pandas supports vectorized operations that process all the data at once without explicit loops. In addition, operations automatically align the data on the index labels before computing.

1.1 Vectorization & Label Alignment

  • Vectorization: Writing s + s or s * 2 applies the operation to every element of the Series.
  • Label alignment: In an operation between two Series, values with the same index label are combined.
    • For labels that exist in only one of the Series, the result is NaN (missing value).
    • Values are matched by label, even if the order differs.

# s is the Series from Part 1, 3.5 (labels a, b, c)
# s.iloc[1:] takes the second element to the end; s.iloc[:-1] takes the first element to the second-to-last
s.iloc[1:] + s.iloc[:-1] 

# Only labels present in both are computed; labels on just one side become NaN.
# a    NaN
# b    4.0
# c    NaN
# dtype: float64
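The "order doesn't matter" rule and the NaN behavior can be seen with two Series whose labels only partly overlap (values chosen for illustration). Series.add with fill_value is one way to treat a label missing on one side as 0 instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([30, 10], index=["c", "a"])   # different order, no "b"

total = s1 + s2
# a -> 11.0, b -> NaN, c -> 33.0 (dtype float64): matched by label, not position
print(total)

# Treat a label missing on one side as 0 instead of NaN
print(s1.add(s2, fill_value=0).tolist())   # [11.0, 2.0, 33.0]
```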

2. DataFrames in Depth

2.1 Creating a DataFrame (Dict Input)

  • When you create a DataFrame from a dictionary, the keys become column names and the values (lists) become the column data.
  • Caution: All lists must have the same length; otherwise a ValueError is raised.
  • Specifying the index: Use the index parameter to set the row labels.
  • Missing data: If columns includes a name that is not in the data, that column is filled with NaN.

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
# For a dict of Series, the row index is the union of the Series indexes even when they differ,
# and the gaps are filled with NaN.
pd.DataFrame(d)
#    one  two
# a  1.0  1.0
# b  2.0  2.0
# c  3.0  3.0
# d  NaN  4.0
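The two cautions in the list above (equal list lengths, and extra names in columns) can be checked with plain lists; the column and row names are made up. Note that a dict key not listed in columns is left out of the result.

```python
import pandas as pd

# Lists of different lengths -> ValueError
raised = False
try:
    pd.DataFrame({"x": [1, 2, 3], "y": [4, 5]})
except ValueError as err:
    raised = True
    print("ValueError:", err)

# "z" is not a key in the data, so it becomes an all-NaN column; "y" is not requested, so it is dropped
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"], columns=["x", "z"])
print(df)
```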

2.2 Creating from a List of Dicts

When you create a DataFrame from a list of dictionaries, each dictionary becomes one row, and keys missing from a row become NaN.

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
pd.DataFrame(data)
#    a   b     c
# 0  1   2   NaN
# 1  5  10  20.0

3. Manipulating DataFrames (Columns & Indexing)

3.1 Selecting, Adding, and Deleting Columns

A DataFrame can be handled much like a dictionary of columns.

  • Select: df['one'] (returns a Series)
  • Add: df['three'] = df['one'] * 2 (creates a new column)
  • Delete: del df['two'] or df.pop('three') (pop also returns the removed column)
  • Insert: df.insert(1, 'new_col', value) (adds a column at a specific position)
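The four column operations in sequence, on a small frame modelled on the one in 2.1 (values are made up, and the value argument of insert is replaced by a concrete list):

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

df["three"] = df["one"] * 2          # add: new column from a vectorized expression
del df["two"]                        # delete in place
popped = df.pop("three")             # delete and get the removed column back as a Series
df.insert(1, "new_col", [7, 8, 9])   # insert at position 1 (the second column)

print(list(df.columns))   # ['one', 'new_col']
print(popped.tolist())    # [2.0, 4.0, 6.0]
```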

3.2 Indexing and Selection

Operation | Syntax | Result type
--- | --- | ---
Select a column | df[col] | Series
Select a row (by label) | df.loc[label] | Series
Select a row (by position) | df.iloc[loc] | Series
Slice rows | df[5:10] or df.iloc[5:10] | DataFrame
Filter rows by condition | df[bool_vec] | DataFrame
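Each row of the table applied to a small made-up frame. With a string index, an integer slice such as df[1:3] selects rows by position, and a boolean Series filters rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [90, 40, 75, 60], "team": ["x", "y", "x", "y"]},
    index=["r0", "r1", "r2", "r3"],
)

col = df["score"]                # Series
row_lbl = df.loc["r2"]           # Series (one row, by label)
row_pos = df.iloc[2]             # Series (the same row, by position)
rows = df[1:3]                   # DataFrame (rows at positions 1 and 2)
passed = df[df["score"] >= 60]   # DataFrame (boolean filter)

print(type(col).__name__, type(row_lbl).__name__, type(rows).__name__)  # Series Series DataFrame
print(list(rows.index))          # ['r1', 'r2']
print(list(passed.index))        # ['r0', 'r2', 'r3']
```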

4. I/O Tools

Pandas supports many file formats.

Format | Reader | Writer
--- | --- | ---
CSV | read_csv | to_csv
JSON | read_json | to_json
Excel | read_excel | to_excel
SQL | read_sql | to_sql
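Because each reader has a matching writer, a frame can be round-tripped through a format. This sketch writes CSV and JSON to in-memory strings (io.StringIO stands in for a real file) and reads them back; the data is made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# CSV: to_csv with no path returns a string; index=False skips the row labels
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: orient="records" writes one object per row
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(csv_text)
print(json_text)   # [{"Name":"Alice","Age":25},{"Name":"Bob","Age":30}]
```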

4.1 Key read_csv Parameters

  • filepath_or_buffer: file path (or file-like object)
  • sep: delimiter (default ',')
  • header: row number to use as the column names (defaults to the first row, 0; use None if the file has no header)
  • index_col: column(s) to use as the index
  • usecols: columns to load
  • encoding: text encoding (e.g., 'utf-8', 'cp949')
  • skiprows: rows to skip (a number of lines, or a list of line numbers)
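A sketch that uses several of these parameters at once on an in-memory file. The semicolon delimiter, the junk first line, and the column names are invented for the example.

```python
import io
import pandas as pd

raw = """# exported 2025-02-01 (junk line to skip)
id;city;visits;note
1;Seoul;120;ok
2;Busan;80;check
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                           # semicolon-delimited instead of comma
    skiprows=1,                        # skip the first (junk) line
    header=0,                          # the first remaining line holds the column names
    index_col="id",                    # use the id column as the row index
    usecols=["id", "city", "visits"],  # leave out the note column
)
print(df)
```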

4.2 Hands-on: read_csv and Fixing Encoding Problems

CSV files downloaded from Korean public data portals (e.g., Seoul Open Data Plaza) often fail to load with the default settings because of how the Korean text is encoded. Let's walk through the fix.

  • Dataset: Seoul public bicycle (Ttareungyi) rental history (February 2025)

1) The problem: reading with the default settings (UTF-8 error)

By default, read_csv decodes files with encoding='utf-8'. However, many Korean public datasets are encoded in cp949 or euc-kr, which leads to an error like the one below.

import pandas as pd

# Try loading with the default settings
bike = pd.read_csv('seoul_bike_2502.csv')
print(bike.head())

Resulting error (traceback):

Traceback (most recent call last):
  ...
  File "pandas/_libs/parsers.pyx", line 2053, in pandas._libs.parsers.raise_parser_error
  File "<frozen codecs>", line 325, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 1: invalid start byte

Cause: The file is stored in cp949, but pandas tries to decode it as UTF-8 and fails.
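The failure and the fix can be reproduced without the Seoul file by encoding a tiny CSV as cp949 in memory. The Korean column names and rows here (이름 = name, 나이 = age) are made up for the example.

```python
import io
import pandas as pd

# Bytes of a small CSV saved in cp949, as many Korean public datasets are
raw = "이름,나이\n철수,20\n영희,31\n".encode("cp949")

try:
    pd.read_csv(io.BytesIO(raw))                  # default: decode as UTF-8
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    print("UnicodeDecodeError:", err.reason)      # e.g. invalid start byte

df = pd.read_csv(io.BytesIO(raw), encoding="cp949")  # decode with the right codec
print(df)
```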

2) The fix: specify the encoding (encoding='cp949')

Set the encoding parameter to 'cp949' (or 'euc-kr') so the file is read correctly. Since cp949 is a superset of euc-kr, 'cp949' is usually the safer choice.

import pandas as pd

# Add encoding='cp949' so the Korean text is decoded correctly
bike = pd.read_csv('seoul_bike_2502.csv', encoding='cp949')

# Print the first 5 rows
print(bike.head())

Output:

์ž์ „๊ฑฐ๋ฒˆํ˜ธ                 ๋Œ€์—ฌ์ผ์‹œ ๋Œ€์—ฌ ๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ         ๋Œ€์—ฌ ๋Œ€์—ฌ์†Œ๋ช…  ... ์ด์šฉ์ž์ข…๋ฅ˜  ๋Œ€์—ฌ๋Œ€์—ฌ์†ŒID  ๋ฐ˜๋‚ฉ๋Œ€์—ฌ์†ŒID    ์ž์ „๊ฑฐ๊ตฌ๋ถ„
0  SPB-41846  2025-02-01 00:01:04    01308   ์•ˆ์•”๋กœํ„ฐ๋ฆฌ ๋ฒ„์Šค์ •๋ฅ˜์žฅ ์•ž  ...   ๋‚ด๊ตญ์ธ   ST-827   ST-273  BIK_002
1  SPB-60204  2025-02-01 00:00:14    03500         ๊ตฐ์ž์—ญ2๋ฒˆ์ถœ๊ตฌ  ...   ๋‚ด๊ตญ์ธ   ST-983  ST-1266  BIK_002
2  SPB-60407  2025-02-01 00:01:54    00398     ์„์ง€๋กœ3๊ฐ€์—ญ 3๋ฒˆ์ถœ๊ตฌ  ...   ๋‚ด๊ตญ์ธ  ST-1435   ST-943  BIK_002
3         \\N  2025-02-01 00:01:34    00864  ์ˆœ์ฒœํ–ฅ๋Œ€ํ•™๋ณ‘์›(ํ•œ๋‚จ์˜ค๊ฑฐ๋ฆฌ)  ...   ๋‚ด๊ตญ์ธ  ST-2188  ST-2188  BIK_002
4  SPB-50025  2025-02-01 00:00:30    00558    ์„ฑ๋™๊ด‘์ง„ ๊ต์œก์ง€์›์ฒญ ์•ž  ...   ๋‚ด๊ตญ์ธ   ST-359  ST-2340  BIK_002

[5 rows x 17 columns]

4.3 Key read_sql_table Parameters

  • table_name: name of the SQL table
  • con: SQLAlchemy connectable (engine or connection)
  • index_col: column(s) to set as the index
  • parse_dates: columns to parse as dates
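read_sql_table itself needs a SQLAlchemy connectable. As a dependency-free stand-in, this sketch uses the standard library sqlite3 module with to_sql and read_sql, which also accept a plain sqlite3 connection and take the same index_col and parse_dates parameters. The table and column names are invented.

```python
import sqlite3
import pandas as pd

rentals = pd.DataFrame({
    "rental_id": [1, 2, 3],
    "rented_at": ["2025-02-01 00:01:04", "2025-02-01 00:00:14", "2025-02-01 00:01:54"],
    "minutes": [12, 5, 30],
})

con = sqlite3.connect(":memory:")             # throwaway in-memory database
rentals.to_sql("rentals", con, index=False)   # write the frame as a table

out = pd.read_sql(
    "SELECT * FROM rentals",
    con,
    index_col="rental_id",       # use this column as the index
    parse_dates=["rented_at"],   # turn the text timestamps into datetimes
)
con.close()
print(out.dtypes)
```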

5. Data Exploration and Preprocessing

5.1 Data Exploration

  • head(n) / tail(n): Show the first / last n rows (default 5).
  • sample(n): Draw n rows at random.
  • describe(): Summarize descriptive statistics (mean, standard deviation, quartiles, etc.) for numeric columns. In the output below, the datetime columns are summarized as well.
    • include='all': Also summarize non-numeric columns such as strings.

import pandas as pd

# Load the Seoul Ttareungyi rental data (대여일시 = rental time, 반납일시 = return time)
bike = pd.read_csv("seoul_bike_2502.csv", encoding='cp949', parse_dates=['대여일시', '반납일시'], date_format='%Y-%m-%d %H:%M:%S')
#print(bike.head())
print(bike.describe())

Output:

๋Œ€์—ฌ์ผ์‹œ                           ๋ฐ˜๋‚ฉ์ผ์‹œ       ์ด์šฉ์‹œ๊ฐ„(๋ถ„)       ์ด์šฉ๊ฑฐ๋ฆฌ(M)
count                        1629540                        1629540  1.629540e+06  1.629540e+06   
mean   2025-02-16 16:46:03.578277632  2025-02-16 17:05:14.998458368  1.844827e+01  1.916543e+03   
min              2025-02-01 00:00:11            2025-02-01 00:03:54  0.000000e+00  0.000000e+00   
25%    2025-02-10 17:14:23.750000128            2025-02-10 17:35:26  5.000000e+00  7.485000e+02   
50%       2025-02-17 08:46:29.500000            2025-02-17 08:56:11  9.000000e+00  1.236465e+03   
75%    2025-02-24 06:06:18.750000128  2025-02-24 06:18:17.249999872  2.000000e+01  2.170000e+03   
max              2025-02-28 23:59:57            2025-03-01 02:01:43  1.097000e+03  8.861735e+04   
std                              NaN                            NaN  2.465302e+01  2.299137e+03   
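Without the bike file, the same exploration calls can be tried on a small made-up frame. random_state fixes the sample so it is repeatable.

```python
import pandas as pd

df = pd.DataFrame({
    "minutes": [5, 9, 20, 12, 7, 30],
    "user_type": ["local", "local", "foreign", "local", "foreign", "local"],
})

print(df.head(2))                     # first 2 rows
print(df.tail(2))                     # last 2 rows
print(df.sample(3, random_state=0))   # 3 random rows, reproducible
print(df.describe())                  # numeric columns only
print(df.describe(include="all"))     # adds count / unique / top / freq for the text column
```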

5.2 Handling Missing Data

This is one of the most important preprocessing steps in data analysis.

Task | Function | Description | Example
--- | --- | --- | ---
Detect | isna(), notna(), isnull(), notnull() | Return a Boolean mask of missing values (isnull/notnull are aliases of isna/notna) | df.isnull()
Drop | dropna() | Remove rows or columns that contain missing values (thresh=2 keeps rows with at least 2 non-missing values) | df.dropna(thresh=2)
Fill | fillna(value) | Replace missing values with a given value | df.fillna(0)
Time series | ffill(), bfill() | Fill with the previous (forward) or next (backward) valid value | df.ffill()
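The four operations from the table, applied to one small frame with gaps (values made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, np.nan, 3.0, 8.0],
    "c": [7.0, np.nan, 9.0, np.nan],
})

print(df.isna().sum().tolist())             # missing count per column: [2, 2, 2]
print(df.dropna(thresh=2).index.tolist())   # rows with >= 2 non-missing values: [0, 2, 3]
print(df.fillna(0).loc[1].tolist())         # [0.0, 0.0, 0.0]
print(df.ffill()["a"].tolist())             # [1.0, 1.0, 1.0, 4.0]
print(df.bfill()["b"].tolist())             # [3.0, 3.0, 3.0, 8.0]
```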

🔑 [Note] What Does "Hashable" Mean?

1. Definition

An object is hashable if it has a hash value that never changes during its lifetime and it can be compared with other objects.

Put simply, the question is: "Can this object be given a fixed ID, like a national ID number or a fingerprint?" (Unlike a real ID, though, two different objects may happen to share a hash value.)

2. Requirements for Hashability (Python)

In Python, an object must have the following two methods to be hashable.

  1. __hash__(): Returns an integer hash value for the object. This value must not change while the object exists.
  2. __eq__(): Compares the object with another for equality. Objects that compare equal must have the same hash value.
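Both requirements can be seen on user-defined classes. Point and OnlyEq below are hypothetical example classes: Point hashes the same fields its __eq__ compares (treated as read-only by convention), so an equal but separate object finds the same dict entry. OnlyEq defines __eq__ without __hash__, and Python then sets __hash__ to None, which makes the class unhashable.

```python
class Point:
    """A hypothetical 2D point that is hashable."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        # Equality is based on the coordinates
        return isinstance(other, Point) and (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash the same fields __eq__ compares, so equal points hash equally
        return hash((self._x, self._y))


class OnlyEq:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __eq__(self, other):
        return True


lookup = {Point(1, 2): "A"}
print(lookup[Point(1, 2)])      # A (a different but equal object finds the entry)
print(OnlyEq.__hash__ is None)  # True

only_eq_hashable = True
try:
    hash(OnlyEq())
except TypeError as err:
    only_eq_hashable = False
    print(err)                  # unhashable type: 'OnlyEq'
```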

3. Mutable vs. Immutable

This is the main criterion for whether an object is hashable.

Category | Description | Examples (Python) | Hashable?
--- | --- | --- | ---
Immutable | Value cannot change after creation | int, float, str, bool, tuple | ✅ Yes
Mutable | Value can change after creation | list, dict, set | ❌ No

Why aren't mutable objects (such as lists) hashable?
Suppose we have a list [1, 2]. Its contents can change to [1, 2, 3] at any time. Because list equality depends on the contents, a hash computed from the contents would have to change as well, which breaks the rule that a hash never changes. Think of it as: "Something whose value keeps changing can't be given a fixed ID number."


4. Code Examples

✅ Hashable (integers, strings, tuples)

# Integers and strings are hashable
print(hash(123))        # e.g., 123
print(hash("Python"))   # e.g., -6401083652872391083 (string hashes vary between runs)

# Tuples are immutable, so they are hashable
my_tuple = (1, 2, 3)
print(hash(my_tuple))   # computes a hash value successfully

❌ Not hashable (lists, dictionaries)

# Lists are mutable, so they are not hashable
my_list = [1, 2, 3]

try:
    print(hash(my_list))
except TypeError as e:
    print(e)  # prints: unhashable type: 'list'

⚠️ Watch out: What if a tuple contains a list?

A tuple is immutable, but if it contains a mutable object (such as a list), the tuple is not hashable.

bad_tuple = (1, 2, [3, 4])
# The tuple itself is immutable, but the list [3, 4] inside it can change -> unhashable
# hash(bad_tuple) -> TypeError: unhashable type: 'list'

5. Pandas and Hashability

This concept came up in the lecture because of how pandas works internally.

5.1 Index and Columns

In pandas, index labels and column names act as keys for finding data quickly.

  • For fast lookups, pandas uses a hash table internally.
  • Therefore, any object used as an index label or column name must be hashable.
    • Allowed: 'Name', 2023, (2023, 'Jan'), etc.
    • Not allowed: ['Name', 'Age'] (a list cannot itself serve as a single label; df[['Name', 'Age']] is read as two separate column labels)
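pandas ships a helper, pandas.api.types.is_hashable, that applies the same test. The labels below are the ones from the list above; the DataFrame is made up and shows that a list inside [] is read as several column labels rather than one label.

```python
import pandas as pd
from pandas.api.types import is_hashable

for label in ["Name", 2023, (2023, "Jan"), ["Name", "Age"]]:
    print(repr(label), is_hashable(label))
# 'Name' True
# 2023 True
# (2023, 'Jan') True
# ['Name', 'Age'] False

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
print(type(df["Name"]).__name__)           # Series: one hashable label
print(type(df[["Name", "Age"]]).__name__)  # DataFrame: a list of two labels
```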

5.2 Dictionary Keys

Dictionaries are often used to create pandas DataFrames and Series, and Python dictionary keys must be hashable.

# Allowed (string keys)
d = {"a": 1, "b": 2}

# Not allowed (list key) -> raises TypeError
# d = {[1, 2]: "value"}

Summary

  1. Hashable: Has a hash value that never changes (like a fixed ID number).
  2. Immutable objects (str, int, tuple) are hashable (a tuple only if everything inside it is hashable).
  3. Mutable objects (list, dict) are not hashable.
  4. Pandas index labels, column names, and dictionary keys must be hashable.

