Time Series Data (Stock Price Prediction - 2)

안동균 · December 12, 2024


Making the data stationary

1. Conventional data transformations

2. Applying time series decomposition

Log transform

Use this when the variance grows over time.

ts_log = np.log(ts2)
plt.plot(ts_log)

augmented_dickey_fuller_test(ts_log)

The p-value dropped by more than half.
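To see why the log transform helps when the variance grows, here is a minimal sketch on synthetic data. The series `ts_demo` and the helper `spread_ratio` are assumptions made up for illustration; they are not the post's `ts2` or any of its helpers.

```python
import numpy as np
import pandas as pd

# Synthetic series (an assumption, not the post's data): exponential growth
# with noise proportional to the level, so the spread widens over time.
rng = np.random.default_rng(0)
t = np.arange(240)
level = 100 * np.exp(0.02 * t)
ts_demo = pd.Series(level * (1 + 0.05 * rng.standard_normal(t.size)))

def spread_ratio(series):
    """Ratio of late to early variability after removing a local mean."""
    detrended = (series - series.rolling(window=12, center=True).mean()).dropna()
    half = len(detrended) // 2
    return detrended.iloc[half:].std() / detrended.iloc[:half].std()

raw_ratio = spread_ratio(ts_demo)          # well above 1: variance grows
log_ratio = spread_ratio(np.log(ts_demo))  # close to 1: variance is roughly constant
print(f"raw: {raw_ratio:.2f}, log: {log_ratio:.2f}")
```

A ratio near 1 after the transform means the spread is about the same in both halves of the series, which is what the log transform is meant to achieve.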

Removing the moving average

The goal is to remove the trend.

Trend: the change in the mean over time.

moving_avg = ts_log.rolling(window=12).mean()  # compute the moving average
plt.plot(ts_log)
plt.plot(moving_avg, color='red')

The trend is currently increasing.

Preprocess using the moving average.

ts_log_moving_avg = ts_log - moving_avg  # subtract the trend estimate
ts_log_moving_avg.dropna(inplace=True)
ts_log_moving_avg.head(15)

rolling(window=12) leaves the first 11 values as NaN, so these missing values must be dropped.
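The missing values come from the rolling window itself. A minimal sketch on a toy series (the names `values`, `moving_avg_demo`, and `detrended_demo` are illustrative assumptions):

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1.0, 25.0))  # 24 toy observations: 1, 2, ..., 24

# With the default min_periods, a window of 12 needs 12 observations,
# so the first 11 positions have no moving average yet.
moving_avg_demo = values.rolling(window=12).mean()
n_missing = int(moving_avg_demo.isna().sum())

# Subtracting propagates the NaNs, which is why dropna() is needed afterwards.
detrended_demo = (values - moving_avg_demo).dropna()

print(n_missing, len(detrended_demo))
```

For this perfectly linear series the detrended values are constant, which shows that subtracting the moving average removes a linear trend.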

plot_rolling_statistics(ts_log_moving_avg)

augmented_dickey_fuller_test(ts_log_moving_avg)

With a p-value of about 0.02, the series is now stationary.

However, it is only stationary when the moving-average window is set to 12.

moving_avg_6 = ts_log.rolling(window=6).mean()
ts_log_moving_avg_6 = ts_log - moving_avg_6
ts_log_moving_avg_6.dropna(inplace=True)

With a different window, such as 6, the p-value rises and the series is no longer stationary.

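Offsetting seasonality

Seasonality: periodic variation in the time series whose pattern the trend does not capture.

Differencing resolves this.

Differencing: the current value minus the previous value.

ts_log_moving_avg_shift = ts_log_moving_avg.shift(1)  # value at the previous time step
ts_log_moving_avg_diff = ts_log_moving_avg - ts_log_moving_avg_shift  # current - previous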

ts_log_moving_avg_diff.dropna(inplace=True)
plot_rolling_statistics(ts_log_moving_avg_diff)

augmented_dickey_fuller_test(ts_log_moving_avg_diff)

The p-value has dropped markedly.
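The definition of differencing can be checked on a tiny example. In pandas, `shift(1)` moves values forward so that each row sees its predecessor, whereas `shift(-1)` would pull the next value back. The series `s` is a made-up illustration.

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 15.0, 14.0])

manual = s - s.shift(1)  # current value minus the previous value
builtin = s.diff()       # pandas shorthand for the same first-order difference

# The first element has no predecessor, so it is NaN in both results.
print(manual.tolist())
```

Because the first row is always NaN, `dropna()` is needed before passing the differenced series to a stationarity test.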

Time series decomposition

statsmodels provides a function that separates the Trend and Seasonality contained in a time series.

from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log)

trend = decomposition.trend # trend (change in the mean over time)
seasonal = decomposition.seasonal # seasonality (periodic variation not captured by the trend)
residual = decomposition.resid # log-transformed original - trend - seasonality

plt.rcParams["figure.figsize"] = (11,6)
plt.subplot(411)
plt.plot(ts_log, label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()

Residuals: the data that remains after removing the Trend and the Seasonality.

plt.rcParams["figure.figsize"] = (13,6)
plot_rolling_statistics(residual)

residual.dropna(inplace=True)
augmented_dickey_fuller_test(residual)

The test returns an overwhelmingly low p-value.
