[TIL#22 250321]

๊ฐ•๋ฏผ์ง€ยท2025๋…„ 3์›” 21์ผ

๋ฐ์ดํ„ฐ๋ถ„์„_TIL

๋ชฉ๋ก ๋ณด๊ธฐ
24/81
post-thumbnail

ํ”Œ์ ํ•˜๋ฉด์„œ TIL๊นŒ์ง€ ์“ฐ๋Š” ๊ฑฐ ๋Œ€์ฒด ์–ด์ผ€ํ•จ???
WIL๋กœ ๋ฏธ๋ฃจ๊ฒ ์Šต๋‹ˆ๋‹ค...^^;;

ํ”„๋กœ์ ํŠธ ์ฝ”๋“œ ์ž‘์„ฑํ•œ๊ฑฐ ๋ƒ…๋‹ค ๋ณต๋ถ™ํ•˜๊ธฐ


๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

https://teamsparta.notion.site/NBA-1bc2dc3ef5148120aeecfa26008ccba1

import pandas as pd

games = pd.read_csv('data/games.csv')
games_details = pd.read_csv('data/games_details.csv')
players = pd.read_csv('data/players.csv')
ranking = pd.read_csv('data/ranking.csv')
teams = pd.read_csv('data/teams.csv')

games.isna().sum()

games.shape

games[games['PTS_home'].isna()==True]

games_details.isna().sum()

ranking.isna().sum()

games.head()

games.columns

2014 ์‹œ์ฆŒ ์ดํ›„ ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœ

games = games[games['SEASON']>=2014]

ํ•„์š”์—†๋Š” ์ปฌ๋Ÿผ ์‚ญ์ œ (์ค‘๋ณต ์ปฌ๋Ÿผ)

games = games.drop(['HOME_TEAM_ID','VISITOR_TEAM_ID'], axis=1)

games.shape

games.isna().sum()

ranking.head()

ranking.shape

games_details.head()

games_details.shape

NBA ๋ถ„์„ ์˜ˆ์‹œ
https://western-sky.tistory.com/42

์‹œ์ฆŒ๋ณ„ ์ƒ์œ„ 5ํŒ€ ์ถ”์ถœ (์Šน๋ฅ ์ด ๋™๋ฅ ์ธ ๊ฒฝ์šฐ ๊ทธ๋ƒฅ ๋žœ๋ค์œผ๋กœ ํ•˜๋‚˜๋งŒ)
games ํ…Œ์ด๋ธ”์—์„œ ๊ฐ ํŒ€์˜ ํ‰๊ท ์น˜ ๊ณ„์‚ฐ

์‹œ์ฆŒ๋ณ„ ํ•˜์œ„ 5ํŒ€ ์ถ”์ถœ
games ํ…Œ์ด๋ธ”์—์„œ ๊ฐ ํŒ€์˜ ํ‰๊ท ์น˜ ๊ณ„์‚ฐ

์šฐ๋ฆฌํŒ€(Oklahoma city)์˜ ๊ธฐ๋Ÿ‰ ๊ณ„์‚ฐ

๋ถ€์กฑํ•œ ์—ญ๋Ÿ‰ ๋ณด์™„์„ ์œ„ํ•œ ํ›„๋ณด ์˜์ž… ์ „๋žต

์‹œ์ฆŒ๋ณ„ ์ƒ์œ„ 5ํŒ€ ์ถ”์ถœ

ranking = pd.read_csv('data/ranking.csv')

2014~2022 ์‹œ์ฆŒ ๋ฐ์ดํ„ฐ ํ•„ํ„ฐ๋ง

rank_post2014 = ranking[(ranking['SEASON_ID']>=22014)&(ranking['SEASON_ID']<22022)]

์‹œ์ฆŒ ๋ณ„ ๊ฒฝ๊ธฐ ์ˆ˜

rank_post2014.groupby('SEASON_ID')['G'].max()

์‹œ์ฆŒ๋ณ„ ๋งˆ์ง€๋ง‰ ๋‚ ์งœ

end_date = rank_post2014.groupby('SEASON_ID')['STANDINGSDATE'].max().to_list()
end_date

rank_last_date = rank_post2014[rank_post2014['STANDINGSDATE'].isin(end_date)]

rank_last_date['group rank'] = rank_last_date.groupby('SEASON_ID')['W_PCT'].rank(method='first', ascending=False)

์‹œ์ฆŒ 2014~2021 top5 ํŒ€ (์‹œ์ฆŒ๋ณ„, ์ˆœ์œ„๋ณ„ ์ •๋ฆฌ)

rank_top5 = rank_last_date[rank_last_date['group rank']<=5].sort_values(['SEASON_ID','group rank'],ascending=[False,True])

ํ•„์š”ํ•œ ์ปฌ๋Ÿผ๋งŒ ์ถ”์ถœ

rank_top5 = rank_top5[['TEAM_ID','SEASON_ID','CONFERENCE','TEAM','HOME_RECORD','ROAD_RECORD','group rank']]

rank_top5

games ํ…Œ์ด๋ธ”์—์„œ ๊ฐ ํŒ€์˜ ํ‰๊ท ์น˜ ๊ณ„์‚ฐ

games = pd.read_csv('data/games.csv')
games.head()

ํ•„์š”์—†๋Š” ์ปฌ๋Ÿผ ์‚ญ์ œ

games = games.drop(['HOME_TEAM_ID','VISITOR_TEAM_ID','GAME_STATUS_TEXT'],axis=1)

2014์‹œ์ฆŒ ์ดํ›„์˜ ๋ฐ์ดํ„ฐ๋งŒ ํ•„ํ„ฐ๋ง

games = games[games['SEASON']>=2014]

games.isna().sum()

games ์›๋ณธ ๋ฐ์ดํ„ฐ์— ์žˆ๋˜ ๊ฒฐ์ธก์น˜๋Š” ๋ชจ๋‘ 2003๋…„ 10์›”์˜ ์ผ๋ถ€ ๋ฐ์ดํ„ฐ์— ํฌํ•จ๋จ

ํ•„ํ„ฐ๋ง ์ดํ›„ ๋ฐ์ดํ„ฐ์—๋Š” ์ƒ๊ด€ X (์‹ ๊ฒฝ ์•ˆ์จ๋„ ๋จ)

2014 ์‹œ์ฆŒ ๋ฐ์ดํ„ฐ๋งŒ ์ด์šฉํ•˜์—ฌ ํŒ€๋ณ„ ํ‰๊ท ์น˜ ๊ณ„์‚ฐ

2014์‹œ์ฆŒ games ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœ

games_2014 = games[games['SEASON']==2014]
games.shape

games_2014.head()

rank_top5_2014 = rank_top5[rank_top5['SEASON_ID']==22014]
rank_top5_2014

merged_games2014_home = pd.merge(games_2014,rank_top5_2014,how='inner',left_on='TEAM_ID_home', right_on='TEAM_ID')

merged_games2014_home.describe()

merged_games2014_away = pd.merge(games_2014,rank_top5_2014,how='inner',left_on='TEAM_ID_away', right_on='TEAM_ID')

merged_games2014_away.describe()

merged_games2014_home.groupby("TEAM")['FG3_PCT_home'].mean()

merged_games2014_away.groupby("TEAM")['FG3_PCT_away'].mean()

games_details

games_details[(games_details['FGM'].isna()==True)&(games_details['TEAM_CITY'].isin(rank_top5_2014['TEAM']))].head(2000)

games[games['GAME_ID']==41900222]

games_details.groupby('GAME_ID')['GAME_ID'].count()

games_details ๊ฒฐ์ธก์น˜ -> ๊ฐ ๊ฒŒ์ž„ ๋‹น ์ฝ”์น˜ ์ •๋ณด

games_details MIN๊ฐ’(๋›ด ์‹œ๊ฐ„)์ด ๋งˆ์ด๋„ˆ์Šค์ธ ์„ ์ˆ˜๋“ค ์กด์žฌ (์•ฝ 20row??) -> ์ด์ƒ์น˜??(์ด์ƒ์น˜ ใ…‡ใ…‡)

๊ฒฝ๊ธฐ ๋›ด ์‹œ๊ฐ„์ด 0์ด์ƒ์ธ ์„ ์ˆ˜๋“ค๋งŒ ์ถ”์ถœํ•ด์•ผํ• ๋“ฏ

์ „์ฒ˜๋ฆฌ ๋นจ๋ฆฌ ํ•˜๊ณ  ํ…Œ์ด๋ธ” ํ•ฉ์ณ์„œ ๊ณ„์‚ฐํ•ด๋ณด๊ณ  2์ฐจ ์Šคํƒฏ(?)๋„ ๊ณ„์‚ฐํ•ด๋ณด๊ณ ,,,

  • ์ƒ์œ„ 5๊ฐœ ํŒ€๊ณผ ์šฐ๋ฆฌ ํŒ€ ๋น„๊ต
    • games_details ํ…Œ์ด๋ธ”์—์„œ ๊ฐ ๊ฒŒ์ž„๋‹น ํŒ€๋ณ„ ๊ธฐ๋ก(?)๊ฐ’์„ ๋”ํ•˜๊ณ 
    • game_id์™€ ๊ฒฐํ•ฉํ•˜์—ฌ ๋‹ค์‹œ ํŒ€๋ณ„๋กœ ๋ชจ๋“  ๊ฒŒ์ž„์˜ ๊ธฐ๋ก์„ ๋”ํ•œ ๋’ค ์ „์ฒด ๊ฒฝ๊ธฐ ์ˆ˜๋กœ ๋‚˜๋ˆˆ๋‹ค(?)

games_details ํ…Œ์ด๋ธ” ์ „์ฒ˜๋ฆฌ

games_details = pd.read_csv('data/games_details.csv')

#games_details['MIN'] = (pd.to_timedelta("00:" + games_details['MIN'].astype(str), errors='coerce').dt.total_seconds() / 60).round(2)

games_details['MIN'].isna().sum()

games_details.head()

games_details.shape

games_details ํ…Œ์ด๋ธ”์—์„œ ํ•„์š”ํ•œ ์ปฌ๋Ÿผ๋งŒ ์ถ”์ถœ

games_details = games_details.drop(['TEAM_ABBREVIATION','NICKNAME','COMMENT','FG3_PCT','FT_PCT'],axis=1)

2014~2022 ์‹œ์ฆŒ๋งŒ ์ถ”์ถœํ•œ GAME_ID ํ…Œ์ด๋ธ”์—์„œ GAME_ID๋ฅผ ์ค‘๋ณต์ œ๊ฑฐํ•˜์—ฌ ๋ฆฌ์ŠคํŠธ๋กœ ๊ฐ€์ ธ์˜ด

game_ids = games['GAME_ID'].unique().tolist()
len(game_ids)

games_details ํ…Œ์ด๋ธ”์—์„œ ์‹œ์ฆŒ 2014~2022์— ํ•ด๋‹นํ•˜๋Š” GAME_ID๋งŒ์„ ํ•„ํ„ฐ๋ง

games_details = games_details[games_details['GAME_ID'].isin(game_ids)]
print(games_details.shape)
display(games_details.head())

games_details.isna().sum()

์ฝ”์น˜๋„ ๋ชจ๋‘ ์ œ๊ฑฐ (๊ธฐ๋ก์—์„œ ๊ฒฐ์ธก์น˜๊ฐ€ ์žˆ๋Š” ํ–‰์€ ๋ชจ๋‘ ์ฝ”์น˜์ธ ๊ฑธ ํ™•์ธ!)

games_details = games_details.dropna(subset=['FGM'])

games_details.shape

games_details.isna().sum()

MIN ์ปฌ๋Ÿผ ์ „์ฒ˜๋ฆฌ

์ด๊ฑธ ์šฐ์งœ์ง€....................

games_details['MIN'][422]

์ง€ํ”ผํ‹ฐ ๊ฐ€๋ผ์‚ฌ๋Œ€.... ๊ฒฐ๊ตญ ๊ตด๋ณตํ–ˆ๋‹ค....

games_details['MIN'] = (pd.to_timedelta("00:" + games_details['MIN'].astype(str), errors='coerce').dt.total_seconds() / 60).round(2)

games_details['MIN']

min_list = []
for i in games_details['MIN']:

# ':'์ด ์žˆ๋Š” ๊ฒฝ์šฐ -> split
if ':' in i:
    x = i.split(':')
    m = int(float(x[0]))
    s = int(float(x[1]))
else:
    # ':'์ด ์—†๋Š” ๊ฒฝ์šฐ -> ์ดˆ๋Š” 0
    m = int(float(i))
    s = 0
min_list.append([m,s])

games_details['MIN'] = min_list
games_details

games_details['MIN']

#games_details['MIN'] = 1pd.DataFrame(total_sec)
games_details['MIN'].isna().sum()

games_details[games_details['MIN'].isna()].index

games_details['MIN'][422]

๊ฐ ํŒ€์˜ ์ด ๊ฒฝ๊ธฐ ์ˆ˜

games_details.groupby('TEAM_ID')['GAME_ID'].nunique()

games_details.groupby(['TEAM_ID',''])['GAME_ID'].nunique()

์ผ๋‹จ ํŒ€์›๋“ค๋ผ๋ฆฌ ์ „์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ์‹์„ ํ†ต์ผํ•˜๊ณ ,
๋™์ผํ•œ ์ฝ”๋“œ๋ฅผ ๊ณต์œ ํ•˜๊ธฐ

games_details ๋ฐ์ดํ„ฐ๋ฅผ GAME_ID, TEAM_ID ๊ธฐ์ค€์œผ๋กœ ๊ทธ๋ฃนํ™”

team_details = games_details.groupby(['GAME_ID','TEAM_ID','TEAM_CITY'])[['FGM','FGA','FG3M','FG3A','FTM','FTA','OREB','DREB','REB', 'AST', 'STL', 'BLK', 'TO','PF','PTS']].sum().reset_index()

team_details

0๊ฐœ์˜ ๋Œ“๊ธ€