😄 [Day 12] : DS Flipped 01. EDA Basics

백건 · January 21, 2022

EDA์˜ ๊ธฐ์ดˆ

EDA๋Š” ์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„์—์„œ ์ •์ œ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ›์•„ ๋ถ„์„ํ•˜๊ณ  ํ”ผ์ฒ˜ ์—”์ง€๋‹ˆ์–ด๋ง(feature enginnering)์œผ๋กœ ๋„˜๊ฒจ ์ฃผ๋Š” ์—ญํ• ์ž…๋‹ˆ๋‹ค๋งŒ ์ „์ฒ˜๋ฆฌ์˜ ์ผ๋ถ€๋Š” EDA๋ฅผ ์ง„ํ–‰ํ•ด์•ผ ํ•  ์ˆ˜ ์žˆ๊ณ  ํ”ผ์ฒ˜ ์—”์ง€๋‹ˆ์–ด๋ง ์—ญ์‹œ EDA์™€ ํ˜ผ์žฌ๋˜์–ด์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‹ค์€ ๋ช…ํ™•ํ•˜๊ฒŒ ๊ตฌ๋ถ„๋˜๋Š” ๊ณผ์ •์€ ์•„๋‹™๋‹ˆ๋‹ค.

์ด๋ฒˆ ํ’€์žŽ์—์„œ ์ „์ฒ˜๋ฆฌ๋Š” ์ฃผ๋กœ ์ด์ƒ์น˜์™€ ๊ฒฐ์ธก๊ฐ’ ์ฒ˜๋ฆฌ ๋“ฑ์„ ์ฃผ๋กœ ์‚ดํŽด๋ณผ ์˜ˆ์ •์ด๋ฉฐ EDA์˜ ๊ธฐ๋ณธ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ์ดˆ ํ†ต๊ณ„๋Ÿ‰์„ ํ™•์ธํ•˜๋Š” ๊ฒƒ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ์˜ ํŠน์ง•(feature)๋ฅผ ์ฐพ์•„๋‚ด๋Š” ๊ณผ์ •๊นŒ์ง€ ํ•จ๊ป˜ ๊ณต๋ถ€ํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์—ฌ๊ธฐ์„œ ๊ธฐ์ดˆ ํ†ต๊ณ„๋Ÿ‰์ด๋ž€ ๋ฐ์ดํ„ฐ์˜ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ํŠน์ง•์ธ ํ‰๊ท , ๋ถ„์‚ฐ, ํŽธ์ฐจ ๋“ฑ์„ ์ผ์ปท์ง€๋งŒ ๊ฒฝ์šฐ์— ๋”ฐ๋ผ์„  ๋‹ค์„ฏ ์ˆ˜์น˜ ์š”์•ฝ, ์™œ๋„, ์ฒจ๋„ ๋“ฑ์„ ํฌํ•จํ•˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.
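As a rough illustration of the statistics just listed, here is a minimal sketch on a small made-up Series (the numbers are not from the lecture data). It assumes the default pandas behavior, where `var()` and `std()` are sample statistics (`ddof=1`) and `kurt()` reports excess kurtosis.

```python
import pandas as pd

# Toy sales figures, made up for illustration only.
sales = pd.Series([0.1, 0.2, 0.2, 0.4, 0.6, 1.5, 4.0])

mean = sales.mean()
variance = sales.var()   # sample variance (ddof=1 by default)
std = sales.std()        # sample standard deviation

# Five-number summary: min, Q1, median, Q3, max.
five_number = sales.quantile([0, 0.25, 0.5, 0.75, 1.0])

skewness = sales.skew()  # > 0 means a longer right tail
kurt = sales.kurt()      # excess kurtosis (about 0 for a normal distribution)

print(mean, variance, std)
print(five_number)
print(skewness, kurt)
```

Sales data like this is usually right-skewed, so the mean sits well above the median.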

This class uses a partially modified version of the Kaggle Video Game Sales data introduced in Fundamental 9. Various Data Preprocessing Techniques.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
df = pd.read_csv("./data/vgsales_lecture.csv")

Unit 1. Looking at the data

In this session we will work with pandas Series and DataFrames.

Pandas provides many functions and is a tool that makes data manipulation and EDA quick and easy.

Below, we look at the data through a few of those functions.

Method 1. df.info()

The info function reports the total number of entries and the data type of each column.

Let's take a closer look below.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  object 
 4   Genre         16598 non-null  object 
 5   Publisher     16335 non-null  object 
 6   NA_Sales      12099 non-null  float64
 7   EU_Sales      10870 non-null  float64
 8   JP_Sales      6141 non-null   float64
 9   Other_Sales   10123 non-null  float64
 10  Global_Sales  16596 non-null  float64
dtypes: float64(5), int64(1), object(5)
memory usage: 1.4+ MB

Method 2. df.head()

The head function, which you will be seeing over and over from now on, shows only the first few rows when there is a lot of data.

There is also a function called tail that takes rows from the end instead.

df.head()
   Rank                      Name  Platform  Year         Genre  Publisher  NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales
0     1                Wii Sports       Wii  2006        Sports   Nintendo     41.49     29.02      3.77         8.46         82.74
1     2         Super Mario Bros.       NES  1985      Platform   Nintendo     29.08      3.58      6.81         0.77         40.24
2     3            Mario Kart Wii       Wii  2008        Racing   Nintendo     15.85     12.88      3.79         3.31         35.82
3     4         Wii Sports Resort       Wii  2009        Sports   Nintendo     15.75     11.01      3.28         2.96         33.00
4     5  Pokemon Red/Pokemon Blue        GB  1996  Role-Playing   Nintendo     11.27      8.89     10.22         1.00         31.37

Method 3. series.unique()

As the name suggests, the unique function removes duplicates and shows only the distinct values that remain.

It is a Series function. On a DataFrame you can use nunique() instead, but that only counts the distinct values.

df.nunique()
Rank            16598
Name            11493
Platform           33
Year               40
Genre              14
Publisher         577
NA_Sales          408
EU_Sales          304
JP_Sales          243
Other_Sales       156
Global_Sales      623
dtype: int64
df["Platform"].unique()
array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA',
       '3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC',
       'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16',
       '2007', '3DO', 'GG', '2010', 'PCFX'], dtype=object)
# The call below raises an error, as expected: a DataFrame has no unique().
# df.unique()
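The difference between the two calls can be sketched on a small made-up frame (the `toy` DataFrame is hypothetical). The `try` block only demonstrates that `DataFrame` has no `unique()` method.

```python
import pandas as pd

# Hypothetical toy frame, not the lecture data.
toy = pd.DataFrame({
    "Platform": ["Wii", "NES", "Wii", "GB"],
    "Genre": ["Sports", "Platform", "Racing", "Sports"],
})

# Series.unique() returns the distinct values themselves.
platforms = toy["Platform"].unique()

# DataFrame.nunique() returns only the number of distinct values per column.
counts = toy.nunique()

# DataFrame has no unique() method, so the call fails with AttributeError.
try:
    toy.unique()
    has_unique = True
except AttributeError:
    has_unique = False

print(platforms, counts.to_dict(), has_unique)
```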

Method 4. df.describe()

Finally, describe shows the summary statistics all at once.

describe also takes several parameters; please refer to the pandas documentation for those.

df.describe()
               Rank      NA_Sales      EU_Sales      JP_Sales   Other_Sales  Global_Sales
count  16598.000000  12099.000000  10870.000000   6141.000000  10123.000000  16596.000000
mean    8300.605254      0.363084      0.223942      0.210210      0.078818      0.537498
std     4791.853933      0.937693      0.610455      0.480352      0.236416      1.555113
min        1.000000      0.010000      0.010000      0.010000      0.010000      0.010000
25%     4151.250000      0.060000      0.020000      0.030000      0.010000      0.060000
50%     8300.500000      0.140000      0.070000      0.070000      0.030000      0.170000
75%    12449.750000      0.350000      0.200000      0.190000      0.070000      0.470000
max    16600.000000     41.490000     29.020000     10.220000     10.570000     82.740000
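Two of the describe parameters are worth a quick sketch: `include` and `percentiles`. The `toy` frame below is a hypothetical four-row sample that borrows a few values from the head() output above.

```python
import pandas as pd

# Hypothetical four-row sample; values borrowed from the head() output.
toy = pd.DataFrame({
    "Platform": ["Wii", "NES", "Wii", "GB"],
    "Global_Sales": [82.74, 40.24, 35.82, 31.37],
})

# Default: only numeric columns are summarized.
numeric_summary = toy.describe()

# include="all" adds count/unique/top/freq rows for non-numeric columns.
all_summary = toy.describe(include="all")

# percentiles adds the requested quantile rows to the summary.
custom = toy.describe(percentiles=[0.1, 0.9])

print(numeric_summary)
print(all_summary)
print(custom)
```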

Group study 1

The pandas documentation is available at the address below.

https://pandas.pydata.org/docs/reference/index.html

If you open DataFrame or Series in the list on the right, you will find a great many functions.

❗(Group study) From here on, let's work in groups and think about what code could reproduce each of the values in the describe output above, one by one.

Unit 2. Exploring outliers and missing values

Now let's look for outliers.

An outlier usually means a value that is very large or very small, so outliers are typically found with a plot or with something like a z-score. The year column, however, does not have many distinct values, so let's simply list them.

df["Year"].unique()
array(['2006', '1985', '2008', '2009', '1996', '1989', '1984', '2005',
       '1999', '2007', '2010', '2013', '2004', '1990', '1988', '2002',
       '2001', '2011', '1998', '2015', '2012', '2014', '1992', '1997',
       '1993', '1994', '1982', '2003', '1986', '2000', nan, '1995',
       '2016', '1991', '1981', '1987', '1980', '1983', '2020',
       'Adventure', '2017'], dtype=object)
df[df["Year"]=="Adventure"]
        Rank                                               Name  Platform       Year                        Genre  Publisher  NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales
11593  11595  Boku no Natsuyasumi 3: Hokkoku Hen: Chiisana B...      2007  Adventure  Sony Computer Entertainment        NaN       NaN      0.08       NaN         0.08           NaN
13538  13540                                 B's-LOG Party??PSP      2010  Adventure                 Idea Factory        NaN       NaN      0.04       NaN         0.04           NaN

Platform์—๋Š” ์—ฐ๋„๊ฐ€, Year์—๋Š” ์žฅ๋ฅด๊ฐ€, ์žฅ๋ฅด์—๋Š” ํ”Œ๋žซํผ์ด ์ ํ˜€์žˆ์Šต๋‹ˆ๋‹ค.

์ „ํ˜•์ ์ธ ํœด๋จผ์—๋Ÿฌ๋กœ๊ตฐ์š”. ์ž์—ฐ์ƒ์˜ ๋ฐ์ดํ„ฐ๋„ ์•„๋‹Œ ์ •์ œ๋œ ๋ฐ์ดํ„ฐ์—๋„ ์˜ค๋ฅ˜(๋…ธ์ด์ฆˆ)๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.

๋‹คํ–‰ํžˆ ๊ทธ ์ด์™ธ์—” ํŠน๋ณ„ํžˆ ์ด์ƒ์น˜๊ฐ€ ๋ˆˆ์— ๋ณด์ด์ง„ ์•Š์Šต๋‹ˆ๋‹ค.
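For numeric columns, the z-score approach mentioned earlier can be sketched as follows. The numbers are made up, and the 2.5 cut-off is only an illustrative assumption (3 is another common choice, but with just ten values no single point can reach a z-score of 3).

```python
import pandas as pd

# Toy sales column with one extreme value (made-up numbers).
sales = pd.Series([0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.3, 9.0])

# z-score: how many standard deviations a value lies from the mean.
z = (sales - sales.mean()) / sales.std()

# Flag values whose |z| exceeds the illustrative cut-off.
outliers = sales[z.abs() > 2.5]
print(outliers)
```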

์ด์ƒ์น˜๋Š” ๊ฐ„๋‹จํ•˜๊ฒŒ dropํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ์ง€์›Œ๋ฒ„๋ฆด ์ˆ˜๋„ ์žˆ์ง€๋งŒ ์ด๋ฒˆ์—๋Š” ๋น„๊ต์  ๊ฐ„๋‹จํ•œ ์—๋Ÿฌ์˜€๊ธฐ ๋•Œ๋ฌธ์— ์ง์ ‘ ์ˆ˜์ •ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

# ์ด ์ฝ”๋“œ๋Š” ๋ฐ”๋กœ ์ ์šฉํ•ด์„œ ๋ฐ”๊พธ๊ธฐ ๋•Œ๋ฌธ์— ์—ฌ๋Ÿฌ๋ฒˆ ์‹คํ–‰ํ•˜๋ฉด ๊ณ„์† ๋ฐ”๋€Œ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ ์šฉํ›„์—๋Š” ์ฃผ์„์ฒ˜๋ฆฌ ํ•ด์ฃผ์„ธ์š”

# df.iloc[11593, [2, 3, 4]] = list(df.iloc[11593, [4, 2, 3]])
# df.iloc[13538, [2, 3, 4]] = list(df.iloc[13538, [4, 2, 3]])
df.iloc[11593]
Rank                                                        11595
Name            Boku no Natsuyasumi 3: Hokkoku Hen: Chiisana B...
Platform                              Sony Computer Entertainment
Year                                                         2007
Genre                                                   Adventure
Publisher                                                     NaN
NA_Sales                                                      NaN
EU_Sales                                                     0.08
JP_Sales                                                      NaN
Other_Sales                                                  0.08
Global_Sales                                                  NaN
Name: 11593, dtype: object
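Because the rotation is not idempotent, one option is to guard it so that a second run does nothing. The sketch below uses a hypothetical one-row frame that mirrors the misplaced values above; `rotate_if_needed` is an illustrative helper, not part of the lecture code.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the misplaced values.
toy = pd.DataFrame({
    "Platform": ["2007"],
    "Year": ["Adventure"],
    "Genre": ["Sony Computer Entertainment"],
})

def rotate_if_needed(frame, pos):
    # Rotate only while the Year cell still holds a non-numeric string.
    if not str(frame.iloc[pos, 1]).isdigit():
        frame.iloc[pos, [0, 1, 2]] = list(frame.iloc[pos, [2, 0, 1]])

rotate_if_needed(toy, 0)
rotate_if_needed(toy, 0)  # the second call is a no-op
print(toy.iloc[0].tolist())
```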

Next, let's look at NaN, that is, missing values.

df[df[["Year"]].isnull().any(axis=1)].head()
     Rank                        Name  Platform  Year     Genre                               Publisher  NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales
179   180             Madden NFL 2004       PS2   NaN    Sports                         Electronic Arts      4.26      0.26      0.01         0.71          5.23
377   378            FIFA Soccer 2004       PS2   NaN    Sports                         Electronic Arts      0.59      2.36      0.04         0.51          3.49
431   432  LEGO Batman: The Videogame       Wii   NaN    Action  Warner Bros. Interactive Entertainment      1.86      1.02       NaN         0.29          3.17
470   471  wwe Smackdown vs. Raw 2006       PS2   NaN  Fighting                                     NaN      1.57      1.02       NaN         0.41          3.00
607   608              Space Invaders      2600   NaN   Shooter                                   Atari      2.36      0.14       NaN         0.03          2.53

According to Wikipedia, Madden NFL 2004, the first entry in the list, was released in 2003 for Game Boy Advance, GameCube, Microsoft Windows, PlayStation, PlayStation 2, and Xbox.

https://en.wikipedia.org/wiki/Madden_NFL_2004

df.iloc[179]
Rank                        180
Name            Madden NFL 2004
Platform                    PS2
Year                        NaN
Genre                    Sports
Publisher       Electronic Arts
NA_Sales                   4.26
EU_Sales                   0.26
JP_Sales                   0.01
Other_Sales                0.71
Global_Sales               5.23
Name: 179, dtype: object
df.iloc[179, 3] = "2003"  # stored as a string, like the other values in the Year column
df.iloc[179]
Rank                        180
Name            Madden NFL 2004
Platform                    PS2
Year                       2003
Genre                    Sports
Publisher       Electronic Arts
NA_Sales                   4.26
EU_Sales                   0.26
JP_Sales                   0.01
Other_Sales                0.71
Global_Sales               5.23
Name: 179, dtype: object

Group study 2

The value changed as we wanted. Come to think of it, baseball and soccer game titles often carry a year.

Sure enough, the next entry, FIFA Soccer 2004, was also released in 2003. When a sports game title contains a year, applying that year minus 1 should let us fill in the missing values without looking each one up.

❗(Group study) How could we fill in these missing values?

Before applying anything to the original DataFrame, let's practice on df_temp, which holds only the rows whose "Year" column is NaN.

Hint. The apply function might let you handle it all in one go.

# If pandas is unfamiliar, it may help to start with a smaller slice of the data.
# Each option below starts from the Sports genre; uncomment one to use it instead.

# Option 1: only the Sports rows whose Year is missing
# df_temp = df[df["Year"].isnull() & (df["Genre"] == "Sports")].copy()

# Option 2: only the first 50 Sports rows
# df_temp = df[df["Genre"] == "Sports"].iloc[:50].copy()

# All Sports rows; .copy() makes later assignments to df_temp independent of df
df_temp = df[df["Genre"] == "Sports"].copy()
df_temp.head()

Hint (open this only after your group discussion)

# Hint

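def year_function(row):
    # Leave rows that already have a Year untouched.
    if pd.notna(row["Year"]):
        return row["Year"]
    if row["Genre"] == "Sports":
        # Sports titles often end with the season year, e.g. "Madden NFL 2004".
        text = row["Name"].split()[-1]
        if text.isdigit() and (1960 <= int(text) <= 2022):
            # Assume the game was released the year before the one in its title.
            return str(int(text) - 1)
    return row["Year"]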

df_temp["Year"] = df_temp.apply(year_function, axis = 1)
df_temp.head()
df_temp.info()

With my code, 10 "Year" values should be filled in!

Group study 3

I trust everyone managed it!

In my case, I missed titles such as Triple Play 99 that write only the last two digits of the year!

There were also notations I had not thought of at all, such as 20-03 or 2K8.

I expect you all considered more cases than I did.
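Some of the notations mentioned above could be handled with regular expressions. The helper `title_year` below is a hypothetical sketch, not the lecture's solution. It returns the year printed in the title (not the release year), does not cover forms like 20-03, and treats two-digit values up to 22 as 2000s, which is an assumed pivot.

```python
import re

def title_year(name):
    """Guess the year printed at the end of a title (hypothetical helper)."""
    last = name.split()[-1]
    if re.fullmatch(r"(19|20)\d{2}", last):      # e.g. "2004"
        return int(last)
    m = re.fullmatch(r"2[Kk](\d{1,2})", last)    # e.g. "2K8" -> 2008
    if m:
        return 2000 + int(m.group(1))
    if re.fullmatch(r"\d{2}", last):             # e.g. "99" or "07"
        two = int(last)
        # Assumed pivot: 00-22 -> 2000s, everything else -> 1900s.
        return 2000 + two if two <= 22 else 1900 + two
    return None

for title in ["Madden NFL 2004", "Triple Play 99", "Madden NFL 07", "NBA 2K8", "Wii Sports"]:
    print(title, title_year(title))
```

Following the lecture's heuristic, you would then subtract 1 from the result to estimate the release year.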

๋งŒ์ผ ์—ฐ๋„๊ฐ€ ์—†๋Š” ๊ฒฝ์šฐ๋Š” ์–ด๋–ป๊ฒŒ ํ• ๊นŒ์š”?

์‹ค์€ ์ œ๋ชฉ์„ ํ†ตํ•ด์„œ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ ๋ณด๋‹ค ๋” ํ™•์‹คํ•œ ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

๋ˆˆ์น˜์ฑ„์‹  ๋ถ„๋“ค๋„ ๊ณ„์‹œ๊ฒ ์ง€๋งŒ ์ด ๋ฐ์ดํ„ฐ๋Š” ๊ฐ™์€ ๊ฒŒ์ž„์ด๋ผ๋„ ๋ฐœ๋งค๋œ ํ”Œ๋žซํผ์— ๋”ฐ๋ผ์„œ ํ†ต๊ณ„๊ฐ€ ๋”ฐ๋กœ ์žกํ˜€์žˆ์Šต๋‹ˆ๋‹ค. ์šด์ด ์ข‹๋‹ค๋ฉด ๋‹ค๋ฅธ ํ”Œ๋žซํผ์—์„œ ๋ฐœ๋งค๋œ ๊ฐ™์€ ๊ฒŒ์ž„์˜ ํ†ต๊ณ„์ž๋ฃŒ์—์„œ ์œ ์˜๋ฏธํ•œ ์ž๋ฃŒ๋ฅผ ์ฐพ์•„๋‚ผ์ง€๋„ ๋ชจ๋ฆ…๋‹ˆ๋‹ค.

โ—(์กฐ๋ณ„ํ•™์Šต)"Year" ์ปฌ๋Ÿผ์ด ๋นˆ ๋ฐ์ดํ„ฐ ์ค‘ ๋‹ค๋ฅธ ํ”Œ๋žซํผ์—์„œ ๋ฐœ๋งค๋œ ๋ฐ์ดํ„ฐ์— "Year"๊ฐ’์ด ์žˆ์„ ๊ฒฝ์šฐ ์ด ๊ฐ’์œผ๋กœ ๊ฒฐ์ธก์น˜๋ฅผ ์ฑ„์›Œ๋ด…์‹œ๋‹ค.

Hint(๋ฐ˜๋“œ์‹œ ํ† ์˜ ํ›„์— ์—ด์–ด๋ณด์„ธ์š”)

df_sorted = df.sort_values(by = [df.columns[1], df.columns[3]], ascending = False, na_position = "first")
df_sorted.head()

It does not show in the list above, but setting the na_position argument to "first" (the default is "last") moves NaN values to the top. I will use this property to find the null values.
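The effect of `na_position` can be seen on a small made-up frame (the names and years below are hypothetical). Within each name, the row with a missing Year comes first, so the last row of each name carries a known Year whenever one exists.

```python
import pandas as pd

# Hypothetical toy frame with two titles, each missing one Year.
toy = pd.DataFrame({
    "Name": ["B", "A", "B", "A"],
    "Year": ["2005", None, None, "2003"],
})

ordered = toy.sort_values(by=["Name", "Year"], ascending=False, na_position="first")
print(ordered)

# The last row of a title holds a known Year if any platform has one.
last_b = ordered[ordered["Name"] == "B"].iloc[-1]["Year"]
print(last_b)
```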

df_sorted[df_sorted["Name"] == "Madden NFL 07"]
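def general_year_function(row):
    # Keep rows that already have a Year.
    if pd.notna(row["Year"]):
        return row["Year"]
    # NaN rows sort first within a title, so the last row of the same title
    # holds a known Year whenever at least one platform has one.
    candidate = df_sorted.loc[df_sorted["Name"] == row["Name"], "Year"].iloc[-1]
    if str(candidate).isdigit():
        return candidate
    return row["Year"]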
df_sorted["Year"] = df_sorted.apply(general_year_function, axis = 1)
df_sorted[df_sorted["Name"] == "Madden NFL 07"]
df_sorted.info()
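The same idea can also be written without a row-by-row apply. This is an alternative sketch on a hypothetical toy frame, not the lecture's method: `groupby(...).transform("first")` takes the first non-missing Year per title, and `fillna` uses it only where Year is missing.

```python
import pandas as pd

# Hypothetical toy frame; the years are illustrative.
toy = pd.DataFrame({
    "Name": ["Madden NFL 07", "Madden NFL 07", "Madden NFL 07", "Space Invaders"],
    "Platform": ["PS2", "X360", "Wii", "2600"],
    "Year": ["2006", None, "2006", None],
})

# "first" skips missing values, so each row receives the first known Year
# recorded for the same title (still missing if no platform has one).
known_year = toy.groupby("Name")["Year"].transform("first")

# Only fill the holes; existing values are left untouched.
toy["Year"] = toy["Year"].fillna(known_year)
print(toy)
```

On large data this avoids re-filtering the whole frame once per row, which is what the apply version does.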

While we are at it, let's also apply the year_function we wrote earlier.

Note that some rows have already been handled, so unlike the first time it may not have much effect.

df_sorted["Year"] = df_sorted.apply(year_function, axis = 1)
df_sorted.info()

How did your results turn out? For something put together in a short time, the gaps seem to have been filled quite well.

I will apply the result to the original data with the code below.

df = df_sorted.sort_index()  # ascending index order restores the original row order
df.head()

Group study 4

How can we fill the remaining "Year" values?

There are not many left now, so we could search for them one by one, but let's discuss approaches as if we were handling data with 100,000 or 1,000,000 rows.
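One approach that scales, offered only as a starting point for the discussion, is to fill each gap with the median year of the same platform. The frame and the `Year_filled` column below are hypothetical, and whether a platform median is a good enough estimate depends on what the analysis needs.

```python
import pandas as pd

# Hypothetical toy frame with one missing Year per platform.
toy = pd.DataFrame({
    "Platform": ["PS2", "PS2", "PS2", "Wii", "Wii", "Wii"],
    "Year": ["2002", "2004", None, "2007", None, "2009"],
})

# Convert to numbers first; anything non-numeric becomes NaN.
year = pd.to_numeric(toy["Year"], errors="coerce")

# Median release year per platform, broadcast back to every row.
platform_median = year.groupby(toy["Platform"]).transform("median")

# Fill only the gaps, then store the result as nullable integers.
toy["Year_filled"] = year.fillna(platform_median).round().astype("Int64")
print(toy)
```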

Unit 3. Let's fill in the sales figures!

In a sense, the missing values in the year column may not have been a big problem. There were few of them and the range of values was narrow.

In this unit we will deal with the many missing values in the sales columns.

We will look at the most common and widely used methods.

Method 1. df.drop()

This simply removes rows or columns.

For example, more than half of JP_Sales is missing (those values were originally recorded as 0). Considering how popular video games are in Japan, having only about 6,000 data points is a little odd.

In a case like this, the JP_Sales data itself is not very reliable, so we can consider excluding it altogether.

df.info()

When a column has this many missing values, removing the JP_Sales column itself seems worth considering.

# With inplace=False (the default), the original is left unchanged, so to keep the result you need to reassign it as in the commented line below.

# df = df.drop(columns = "JP_Sales")

df.drop(columns = "JP_Sales")
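A small made-up frame shows why the reassignment matters: `drop` returns a new DataFrame and leaves the original as it was.

```python
import pandas as pd

# Hypothetical toy frame, not the lecture data.
toy = pd.DataFrame({"NA_Sales": [1.0, 2.0], "JP_Sales": [None, 0.5]})

dropped = toy.drop(columns="JP_Sales")  # returns a new DataFrame

print(list(toy.columns))      # the original still has JP_Sales
print(list(dropped.columns))  # only the returned frame lost the column
```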

Having done this, it looks like the Global_Sales values need to be corrected.

But before that, row 16595 catches my eye. (Correcting the Global_Sales values is left as homework.)

There is no sales data for NA, EU, or the other regions, yet there is a global figure? Presumably the game sold only in Japan, and the problem appeared once that data was removed.

When global sales are very low, there are too many missing values and the data probably carries little meaning. Let's pick out only the games that sold reasonably well.

This case does not actually need drop, but in more complicated situations, applying drop several times to keep only the result you want is also a good approach.

df[df["Global_Sales"] >= 1]

Method 2. df.dropna()

dropna removes rows or columns based on missing values.
You can drop anything that contains a missing value, or attach conditions.

Since dropna is used so often, let's briefly go over its parameters.

axis (default is 0)

0, or 'index' : drops rows that contain missing values.

1, or 'columns' : drops columns that contain missing values.

how (default is 'any')

'any' : drops the row/column if it contains at least one missing value.

'all' : drops the row/column only when every entry is missing.

thresh

int : drops rows/columns that have fewer than int non-missing values.

subset

columns or rows : checks the conditions above only within the selected columns (when dropping rows) or rows (when dropping columns).
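The parameters above can be compared side by side on a small made-up frame (the values are illustrative, not from the lecture data).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 0 is complete, row 2 is entirely missing.
toy = pd.DataFrame({
    "NA_Sales":    [1.0, np.nan, np.nan, 0.3],
    "EU_Sales":    [0.5, 0.2,    np.nan, np.nan],
    "JP_Sales":    [0.1, np.nan, np.nan, np.nan],
    "Other_Sales": [0.2, 0.1,    np.nan, np.nan],
})

any_rows = toy.dropna()                         # how="any": keeps only row 0
all_rows = toy.dropna(how="all")                # drops only the all-NaN row 2
thresh_rows = toy.dropna(thresh=2)              # keeps rows with >= 2 values: 0 and 1
subset_rows = toy.dropna(subset=["NA_Sales"])   # checks NA_Sales only: 0 and 3

print(any_rows.index.tolist(), all_rows.index.tolist())
print(thresh_rows.index.tolist(), subset_rows.index.tolist())
```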

# the defaults: axis = 0, how = 'any'

df.dropna()
df.dropna().info()

We now have a dataset with no missing values at all, but too much of the data seems to have been cut away.

This time, let's attach a reasonable condition.

df.dropna(thresh = 3, subset = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"])

In this case, it bothers me that the numbers do not add up for the small values.

Let's reuse the approach from before.

df_thresh = df.dropna(thresh = 3, subset = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"])
df_thresh[df_thresh["Global_Sales"] >= 1]  # build the mask from the filtered frame so the indexes match

Method 3. df.fillna()

Dropping the missing values leaves too little data, and leaving them in is an eyesore.

So how about filling them in?

fillna is one of the easiest ways to fill missing values.

In my view, there are too many distinct publishers and the column does not seem particularly important, so I will fill the missing publishers with Unknown.

Also, looking at the sales data, missing values appear mostly where sales were low. Could it be that those games did not sell at all? I will set the missing sales values to 0.

value = {"Publisher" : "Unknown", "NA_Sales" : 0, "EU_Sales" : 0, "JP_Sales" : 0, "Other_Sales" : 0}
df.fillna(value = value)

Group study 5

As shown, you can fill missing values with a string or number of your choice.

Naturally, that number can also come from a statistic of the data, such as the mean.
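Filling with a statistic can be sketched as follows, on a made-up column. The mean is used here only as an example; for skewed data such as sales, the median is often considered a safer choice.

```python
import pandas as pd

# Hypothetical toy column with two gaps.
toy = pd.DataFrame({"EU_Sales": [0.5, None, 0.1, None, 0.3]})

# Mean of the observed values; NaN is skipped by default.
eu_mean = toy["EU_Sales"].mean()

# Replace only the missing entries with that mean.
filled = toy["EU_Sales"].fillna(eu_mean)
print(eu_mean)
print(filled)
```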

ํ˜น์€ method ํŒŒ๋ผ๋ฏธํ„ฐ์— 'backfill'์ด๋‚˜ 'ffill' ๋“ฑ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ฃผ๋ฉด ๋‹ค์Œ์— ๋‚˜์˜ค๋Š” ์ฒซ๋ฒˆ์งธ ์œ ํšจ๊ฐ’์œผ๋กœ ์ฑ„์šฐ๊ฑฐ๋‚˜ ์ด์ „์— ๋‚˜์˜จ ๋งˆ์ง€๋ง‰ ์œ ํšจ๊ฐ’์œผ๋กœ ์ฑ„์šธ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค๋งŒ ์ด ๋ฐฉ๋ฒ•์€ ์ด ๋ฐ์ดํ„ฐ ์…‹์—์„  ํฌ๊ฒŒ ์œ ์šฉํ•ด๋ณด์ด์ง€ ์•Š๋Š”๋ฐ ์—ฌ๋Ÿฌ๋ถ„๋“ค์€ ์–ด๋–ป๊ฒŒ ์ƒ๊ฐํ•˜์‹œ๋‚˜์š”? ์ž ์‹œ ์ƒ๊ฐํ•ด๋ด…์‹œ๋‹ค.

โ—(์ƒ๊ฐํ•ด๋ณด๊ธฐ) ์—ฌ๊ธฐ์—๋„ ์จ๋ณผ๋งŒ ํ• ๊นŒ์š”? ๊ทธ ์ด์œ ๋Š” ๋ฌด์—‡์ธ๊ฐ€์š”?
ํ˜น์€ ๋ณ„๋กœ์ผ ๊ฒƒ ๊ฐ™๋‚˜์š”? ๊ทธ๋ ‡๋‹ค๋ฉด ์–ด๋–ค ๋ฐ์ดํ„ฐ์—์„œ ์“ธ๋งŒํ• ๊นŒ์š”?
