datetime is the Python standard library module for working with dates and times together.
With the datetime module you can handle data precisely at the level of years, months, days, hours, minutes, and seconds.
import datetime as dt
# Define a date
my_date = dt.date(2020, 3, 22)
print(my_date) # 2020-03-22
print(type(my_date)) # <class 'datetime.date'>
# Date + time
my_datetime = dt.datetime(2020, 3, 22, 8, 20, 50)
print(my_datetime) # 2020-03-22 08:20:50
print(my_datetime.hour) # 8
print(my_datetime.minute) # 20
Key concepts
- date → holds only a date
- datetime → holds both a date and a time
- Individual components are accessible through attributes such as .year, .month, .day, .hour, and .minute
# datetime → string
str(my_datetime) # '2020-03-22 08:20:50'
# string → datetime
converted = dt.datetime.strptime('2020-03-22', '%Y-%m-%d')
print(converted)
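The conversion in both directions can be controlled with an explicit format string. The sketch below assumes a format of `'%Y/%m/%d %H:%M'`, chosen only for illustration; `strftime` and `strptime` are the standard `datetime` methods.

```python
import datetime as dt

moment = dt.datetime(2020, 3, 22, 8, 20, 50)

# datetime -> string using an explicit format (seconds are intentionally left out)
text = moment.strftime('%Y/%m/%d %H:%M')

# string -> datetime using the same format; the missing seconds come back as 0
parsed = dt.datetime.strptime(text, '%Y/%m/%d %H:%M')

print(text)    # 2020/03/22 08:20
print(parsed)  # 2020-03-22 08:20:00
```

The format passed to `strptime` has to match the text exactly, otherwise a `ValueError` is raised.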
import calendar
print(calendar.month(2021, 3))
import pandas as pd
dates = pd.Series(['2020/03/22', '2020-08-25', 'March 22nd, 2020'])
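pd.to_datetime(dates, format='mixed')  # pandas 2.0+: parse each element with its own format
Strings written in different formats are all converted to a single, consistent datetime64 dtype. In pandas 2.0 and later this requires format='mixed'; older versions inferred the format of each element automatically.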
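When the input format is known, passing it explicitly is usually safer than relying on inference. The following is a minimal sketch, assuming a hypothetical series `raw` that contains one invalid entry; `errors='coerce'` turns values that cannot be parsed into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical input: two valid ISO dates and one invalid string
raw = pd.Series(['2020-03-22', '2020-08-25', 'not a date'])

# Parse with an explicit format; unparseable values become NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed.isna().sum())  # 1
print(parsed.iloc[0])       # 2020-03-22 00:00:00
```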
# Define a Timestamp
ts = pd.Timestamp(2020, 3, 22, 10)
print(ts)
# Compute the difference between two dates
day_1 = pd.Timestamp(1998, 3, 22)
day_2 = pd.Timestamp(2021, 3, 22)
print(day_2 - day_1) # 8401 days 00:00:00
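Subtracting two `Timestamp` values produces a `pd.Timedelta`. As a small sketch reusing the same two dates, the result can be inspected through its `.days` attribute and added back to the start date:

```python
import pandas as pd

day_1 = pd.Timestamp(1998, 3, 22)
day_2 = pd.Timestamp(2021, 3, 22)

gap = day_2 - day_1  # a pd.Timedelta

print(type(gap).__name__)                           # Timedelta
print(gap.days)                                     # 8401
print(day_1 + pd.Timedelta(days=gap.days) == day_2) # True
```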
Creating a DatetimeIndex
dates_list = [
    dt.date(2020, 3, 22),
    dt.date(2020, 4, 22),
    dt.date(2020, 5, 22),
]
date_index = pd.DatetimeIndex(dates_list)
sales = [50000, 65000, 72000]
sales_series = pd.Series(data=sales, index=date_index)
print(sales_series)
The data is now in a form that supports time series analysis.
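One thing a `DatetimeIndex` makes possible is selecting rows with date strings. The sketch below rebuilds the same three-row sales series and selects by month and by date range; the variable names `april` and `spring` are illustrative only.

```python
import datetime as dt
import pandas as pd

date_index = pd.DatetimeIndex([
    dt.date(2020, 3, 22),
    dt.date(2020, 4, 22),
    dt.date(2020, 5, 22),
])
sales_series = pd.Series([50000, 65000, 72000], index=date_index)

# A partial date string selects every row that falls in that month
april = sales_series.loc['2020-04']

# A string slice selects a date range; both endpoints are inclusive
spring = sales_series.loc['2020-03-22':'2020-04-30']

print(april.sum())  # 65000
print(len(spring))  # 2
```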
my_days = pd.date_range(start='2020-01-01', end='2020-04-01', freq='D')
print(len(my_days)) # 92 days
Main freq options
- 'D': daily
- 'M': month end
- 'B': business days only (weekdays)
- 'W': weekly
- 'Q': quarterly
business_days = pd.date_range('2020-01-01', '2020-04-01', freq='B')
print(business_days)
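To get a feel for how the `freq` argument changes the result, the following sketch counts the timestamps generated over the same three-month span for a few aliases. Note that `'B'` only skips weekends; it does not know about public holidays, and `'W'` defaults to weeks ending on Sunday.

```python
import pandas as pd

start, end = '2020-01-01', '2020-04-01'

# Count how many timestamps each frequency alias generates over the same span
counts = {freq: len(pd.date_range(start, end, freq=freq)) for freq in ['D', 'B', 'W']}

print(counts)  # {'D': 92, 'B': 66, 'W': 13}
```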
avocado_df = pd.read_csv('avocado.csv')
avocado_df['Date'] = pd.to_datetime(avocado_df['Date'])
avocado_df.set_index('Date', inplace=True)
avocado_df.info()
Example output
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 18249 entries
Columns: [AveragePrice, Total Volume, type, region]
# Sort by date first so that slicing and truncate() behave predictably
avocado_df.sort_index(inplace=True)
# Single date
avocado_df.loc['2018-01-21']
# Filter by date range
avocado_df.loc['2015-01-04':'2015-01-25']
# Extract only a specific period
trimmed = avocado_df.truncate(before='2017-01-01', after='2018-02-01')
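Because `avocado.csv` may not be at hand, here is a minimal sketch of the same sort-then-truncate pattern on a small, made-up price series (the dates and values are hypothetical). `truncate` expects a sorted index, which is why `sort_index` comes first.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data
idx = pd.to_datetime(['2018-03-01', '2016-12-25', '2017-06-15', '2018-01-31'])
prices = pd.Series([1.4, 1.1, 1.3, 1.2], index=idx)

# Sort chronologically, then keep only rows between the two bounds (inclusive)
ordered = prices.sort_index()
trimmed = ordered.truncate(before='2017-01-01', after='2018-02-01')

print(list(trimmed.index.strftime('%Y-%m-%d')))  # ['2017-06-15', '2018-01-31']
```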
avocado_df.index = avocado_df.index + pd.DateOffset(months=12, days=30)
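→ Shifts the entire time series forward by a fixed period (here 12 months and 30 days).
→ It can be shifted back afterwards with - pd.DateOffset(...), although month arithmetic near month ends may not restore every date exactly.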
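The shift and its reversal can be sketched on a tiny index. The two dates below are hypothetical and were picked so that the round trip is exact; for dates near a month end this is not guaranteed, because month arithmetic clips to the last valid day.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2015-01-04', '2015-01-11'])
offset = pd.DateOffset(months=12, days=30)

# Shift forward: 12 months are applied first, then 30 days
shifted = idx + offset

# Shift back with the negated offset
restored = shifted - offset

print(list(shifted.strftime('%Y-%m-%d')))  # ['2016-02-03', '2016-02-10']
print(restored.equals(idx))                # True
```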
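| Rule | Meaning | Example |
|---|---|---|
| 'A' | Yearly | .resample('A').mean() |
| 'Q' | Quarterly | .resample('Q').mean() |
| 'M' | Monthly | .resample('M').mean() |
| 'W' | Weekly | .resample('W').mean() |

In pandas 2.2 and later, 'A', 'Q', and 'M' are deprecated in favor of 'YE', 'QE', and 'ME'.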
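The effect of a resampling rule is easiest to see on data whose answer is known in advance. The sketch below uses a made-up daily series that simply counts up from 1, and it assumes the `'MS'` (month start) alias instead of the month-end alias, only because `'MS'` is spelled the same way across pandas versions.

```python
import pandas as pd

# Hypothetical daily series: 1, 2, 3, ... over Jan-Mar 2020 (91 days)
days = pd.date_range('2020-01-01', '2020-03-31', freq='D')
values = pd.Series(range(1, len(days) + 1), index=days, dtype='float64')

# One row per month, labelled by the first day of the month
monthly_mean = values.resample('MS').mean()

# One row per week (weeks end on Sunday by default)
weekly_max = values.resample('W').max()

print(monthly_mean.tolist())  # [16.0, 46.0, 76.0]
print(weekly_max.iloc[0])     # 5.0
```

Each group is reduced to a single value, so the result has one row per period rather than one row per day.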
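# Yearly average price
# (select the numeric column first; mean() fails on text columns such as type and region)
avocado_df['AveragePrice'].resample('A').mean()
# Quarterly maximum price
avocado_df['AveragePrice'].resample('Q').max()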
low_price = avocado_df['AveragePrice'].where(avocado_df['AveragePrice'] < 1.2)
# Number of rows per month with AveragePrice below 1.5
(avocado_df['AveragePrice'] < 1.5).resample('M').sum()
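Both patterns rely on a boolean mask: `where` keeps the matching values and replaces the rest with `NaN`, while resampling the mask itself and summing counts the `True` entries per period. A minimal sketch on a hypothetical five-row price series, using the `'MS'` (month start) alias as an assumption:

```python
import pandas as pd

# Hypothetical weekly prices
idx = pd.to_datetime(['2020-01-05', '2020-01-12', '2020-01-19', '2020-02-02', '2020-02-09'])
price = pd.Series([1.6, 1.4, 1.2, 1.7, 1.3], index=idx)

# Keep values below 1.5; everything else becomes NaN
masked = price.where(price < 1.5)

# True counts as 1, so summing the mask counts matching rows per month
cheap_per_month = (price < 1.5).resample('MS').sum()

print(masked.count())            # 3
print(cheap_per_month.tolist())  # [2, 1]
```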
avocado_df.reset_index(inplace=True)
avocado_df['Day'] = avocado_df['Date'].dt.day
avocado_df['Month'] = avocado_df['Date'].dt.month
avocado_df['Year'] = avocado_df['Date'].dt.year
avocado_df.set_index('Date', inplace=True)
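As an alternative sketch, the same components can be read directly from a `DatetimeIndex`, which avoids the `reset_index` and `set_index` round trip. The small frame `df` below is hypothetical and stands in for `avocado_df`.

```python
import pandas as pd

# Hypothetical stand-in for avocado_df, already indexed by date
df = pd.DataFrame(
    {'AveragePrice': [1.33, 1.35, 0.93]},
    index=pd.to_datetime(['2015-12-27', '2016-01-03', '2017-03-19']),
)
df.index.name = 'Date'

# A DatetimeIndex exposes .year, .month and .day directly (no .dt needed)
df['Day'] = df.index.day
df['Month'] = df.index.month
df['Year'] = df.index.year

print(df['Year'].tolist())   # [2015, 2016, 2017]
print(df['Month'].tolist())  # [12, 1, 3]
```

The `.dt` accessor is only needed when the dates live in a regular column rather than in the index.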
# Select the numeric column before resampling so text columns are not averaged
avocado_df['AveragePrice'].resample('M').mean().plot(
    figsize=(10, 5),
    marker='o',
    color='r',
    title='Monthly average avocado price trend'
)
avocado_df['AveragePrice'].resample('Q').mean().plot(
    figsize=(10, 5),
    marker='o',
    color='r',
    title='Quarterly average avocado price'
)
avocado_df['AveragePrice'].resample('A').mean().plot(
    figsize=(10, 5),
    marker='o',
    color='b',
    title='Yearly average avocado price'
)
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(10,7))
sns.violinplot(
    x='type',
    y='AveragePrice',
    data=avocado_df,
    palette='Set2'
)
Result:
The plot makes it easy to see that organic avocados have a higher average price than conventional ones.
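A visual impression like this can be backed up with a numeric summary. The sketch below uses a tiny made-up frame (the prices are hypothetical, not taken from the avocado data) to show the `groupby` pattern that would produce the per-type mean and median.

```python
import pandas as pd

# Hypothetical sample; the real analysis would use avocado_df instead
sample = pd.DataFrame({
    'type': ['conventional', 'conventional', 'organic', 'organic'],
    'AveragePrice': [1.0, 1.2, 1.6, 1.8],
})

# Mean and median price per avocado type
summary = sample.groupby('type')['AveragePrice'].agg(['mean', 'median'])

print(summary)
print(summary.loc['organic', 'mean'] > summary.loc['conventional', 'mean'])  # True
```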
plt.figure(figsize=(13,6))
sns.histplot(avocado_df['AveragePrice'], kde=True, color='steelblue')
sns.catplot(
    x='AveragePrice',
    y='region',
    hue='Year',
    data=avocado_df[avocado_df['type'] == 'conventional'],
    height=10
)
→ Price differences by region, year, and type are visible at a glance.
The price structure of major cities such as San Francisco and Chicago can be compared visually.
avocado_df['AveragePrice'].resample('W').mean().plot(
    figsize=(10, 5),
    marker='o',
    color='r',
    title='Weekly average price trend'
)
sns.catplot(
    x='AveragePrice',
    y='region',
    hue='Year',
    data=avocado_df[avocado_df['type'] == 'organic'],
    height=10
)
| Section | What you learn | Key code |
|---|---|---|
| 1 | Python datetime basics | datetime.date(), .datetime() |
| 2 | Pandas Timestamp / DatetimeIndex | pd.to_datetime, pd.DatetimeIndex |
| 3 | Date ranges and frequencies | pd.date_range(freq='M') |
| 4 | Loading real data | pd.read_csv, .set_index('Date') |
| 5 | Aggregation with resample | .resample('Q').mean() |
| 6 | Splitting out date components | .dt.day, .dt.month, .dt.year |
| 7 | Visualization with Matplotlib | .plot(marker='o') |
| 8 | Advanced visualization with Seaborn | sns.violinplot, sns.catplot |
Summary