[Time-series] KETI / Solar and Wind Power Generation Forecasting Model Development

Park jong ho · August 26, 2022

Overview

Energy AI competition hosted by KETI, 22.07.21–22.08.21.
Renewable generation forecasting program: generators forecast their renewable output one day in advance, and receive a settlement payment if the day-of output falls within a set error rate.

Data:
Solar/wind generation at 10-minute intervals, provided by Korea Western Power Co.
Weather observation / weather forecast .csv files (temperature, wind speed, wind direction, humidity)
Objective:
Develop a model that predicts next-day hourly generation from data available before 17:00
Evaluation:
Normalized Mean Absolute Error (NMAE) >> MAE, normalized by capacity, computed only over entries at or above the nominal value
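
Written out, the metric (as implemented in the NMAE code later in this post; C denotes the plant's rated capacity) is:

$$\mathrm{NMAE} = \frac{100}{|S|} \sum_{t \in S} \frac{|y_t - \hat{y}_t|}{C}, \qquad S = \{\, t \mid y_t \ge 0.1\,C \,\}$$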

Data format

solar_weather.csv

지점 (station id) | 지점명 (station name) | Date | temperature | windspeed | winddirection | humidity

solar_forecast.csv

Forecast time | forecast | temperature | windspeed | winddirection | humidity
2022-08-25 14:00:00 | 7 | --- | --- | --- | ---
2022-08-25 14:00:00 | 10 | --- | --- | --- | ---
2022-08-25 14:00:00 | 13 | --- | --- | --- | ---

solar_power.csv

datetime | target
2022-08-25 00:00:10 | ---
2022-08-25 00:00:20 | ---

Data visualizations cannot be shared due to a data confidentiality agreement.

Solar Power

Analyze

Analyzing the hourly means of solar generation produced the following insights.

  • Daily periodicity
    As expected from domain knowledge, generation is only recorded between sunrise and sunset.
  • Seasonal periodicity
    In summer the generating window is long and the daily peak high; in winter the window is short and the peak low.
  • Weather conditions
    Based on domain knowledge, generation should come out lower on overcast days.
  • Solar altitude
    The higher the sun's angle above the ground, the stronger the solar irradiance.
  • Solar azimuth
    The daily swing in solar azimuth follows an annual cycle.
  • Dew point
    Published research reports a correlation between dew point and solar output; it can be approximated from temperature and humidity.

Data Preprocess

## solar_forecast > weather forecast issued at a given time for n hours ahead
## solar_weather  > actually observed weather
## solar_power    > solar generation at 10-minute intervals
import datetime

import numpy as np
import pandas as pd

solar_forecast = pd.read_csv('/content/solar_forecast_weather.csv')
solar_weather = pd.read_csv('/content/weather_solar_actual.csv')
solar_power = pd.read_csv('/content/solar_power_2204.csv')

Solar generation has a strong daily period, so any date judged to contain missing values is dropped in its entirety before training.
Solar generation reads 0 before sunrise and after sunset, so a 0 reading during daylight hours is treated as a missing value.
Dates with an abnormal count of zero-generation entries are filtered out.

## Generation preprocessing

solar_power = solar_power.dropna()

idx_outlier = solar_power[solar_power['target'] > 20000].index
solar_power = solar_power.drop(idx_outlier)

solar_power['datetime'] = pd.to_datetime(solar_power['datetime'])
solar_power['date'] = solar_power['datetime'].dt.date
# Filter outlier dates by how many non-zero generation entries each date has
# A day in 10-minute steps has 144 entries; normal dates show 51-83 non-zero entries
non_zero_group = solar_power.groupby('date').apply(lambda x: sum(x.target != 0))
#non_zero_group.quantile(0.05) == 42
normal_date = non_zero_group.apply(lambda x: 48<x<83)
unique_date = solar_power['date'].unique()


normal_list =[]

for idx, d in enumerate(normal_date):
  if d:
    normal_list.append(unique_date[idx])
# Drop additional outlier dates found by inspection
normal_list.remove(datetime.date(2021,7,12))

normal_idx = solar_power['date'].isin(normal_list)
solar_power = solar_power.loc[normal_idx]
## Generation: 10-minute intervals > hourly mean

solar_power.set_index(pd.to_datetime(solar_power['datetime']), inplace=True)
solar_power_hourmean = solar_power.resample(rule='1H').mean()
solar_power_hourmean = solar_power_hourmean.reset_index()
solar_power_hourmean = solar_power_hourmean.dropna()
## Weather observation preprocessing

solar_weather = solar_weather.drop(columns = ['지점'],axis=1)
solar_weather = solar_weather.drop(columns =['지점명'], axis=1)

solar_weather.columns =['Date', 'temperature','windspeed','winddirection','humidity']

solar_weather['datetime'] = pd.to_datetime(solar_weather['Date'])
solar_weather['date'] = solar_weather['datetime'].dt.date
# Keep only the dates used in the generation data
normal_idx = solar_weather['date'].isin(normal_list)
solar_weather = solar_weather.loc[normal_idx]
## Feature add/drop (month, season, con_hour, con_date)

train_data = solar_weather.copy()
# Drop the wind direction feature
train_data = train_data.drop(columns =['winddirection','Date'])

hour = train_data['datetime'].dt.hour
month = train_data['datetime'].dt.month
month_featured = train_data['datetime'].dt.month - 1
day = train_data['datetime'].dt.day /31
date = month_featured + day
# Season category feature
seasons = ['1','2','2','2','3','3','3','2','2','2','1','1']
month_to_season = dict(zip(range(1,13),seasons))
# Month feature
train_data['month'] = month
train_data['season'] = train_data['month'].map(month_to_season)
# Continuous hour feature > encodes daily periodicity
# Continuous date feature > encodes annual periodicity
train_data['con_hour'] = -24*np.cos(2*np.pi*(hour/24))
train_data['con_date'] = -np.cos(2*np.pi*(date/12))

train_data.reset_index(drop=True, inplace=True)
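
A quick standalone check of the con_hour encoding above (same formula, toy input):

```python
import numpy as np

# con_hour maps the hour of day onto a cosine wave so that hours just
# before and after midnight end up numerically close, unlike the raw hour.
hours = np.arange(24)
con_hour = -24 * np.cos(2 * np.pi * (hours / 24))

print(round(con_hour[0], 6))                      # -24.0 (midnight, trough)
print(round(con_hour[12], 6))                     # 24.0 (noon, peak)
print(round(abs(con_hour[1] - con_hour[23]), 6))  # 0.0 (01:00 and 23:00 match)
```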
## Forecast preprocessing
# Use the 14:00 forecast, the latest one available before 17:00
# From the 14:00 base, keep only next-day 00:00-24:00 targets (horizons 10-33 h)
cond1 = solar_forecast['Forecast time'].str.endswith('14:00:00')
cond2 = solar_forecast['forecast'] >=10
cond3 = solar_forecast['forecast'] <= 33
solar_forecast = solar_forecast.loc[cond1&cond2&cond3]
solar_forecast = solar_forecast.reset_index(drop=True)
solar_forecast['Forecast time'] = pd.to_datetime(solar_forecast['Forecast time'])
solar_forecast['forecasted_datetime'] = solar_forecast['Forecast time'] + solar_forecast['forecast'].map(lambda x: pd.DateOffset(hours=x))
solar_forecast.drop(columns =['Forecast time'], inplace=True)
# Interpolate the 3-hourly forecasts to 1-hour intervals
datetime_from = solar_forecast['forecasted_datetime'].values[0]
datetime_to = solar_forecast['forecasted_datetime'].values[-1]
val_data = pd.DataFrame(pd.date_range(start=datetime_from, end=datetime_to, freq='H'), columns = ['forecasted_datetime'])
val_data = pd.merge(solar_forecast, val_data, on='forecasted_datetime', how='right')
val_data['temperature'] = val_data['temperature'].interpolate()
val_data['humidity'] = val_data['humidity'].interpolate()
val_data['windspeed'] = val_data['windspeed'].interpolate()
val_data['date'] = val_data['forecasted_datetime'].dt.date
# Feature creation (month, season, con_hour, con_date)
val_data['month'] = val_data['forecasted_datetime'].dt.month
val_data['season'] = val_data['month'].map(month_to_season)
val_data['con_hour'] = -24*np.cos(2*np.pi*(val_data['forecasted_datetime'].dt.hour/24))
val_data['con_date'] = -np.cos(2*np.pi*((val_data['forecasted_datetime'].dt.month-1) + (val_data['forecasted_datetime'].dt.day /31))/12)
# Build test_data
test_data = val_data.copy()
normal_idx = val_data['date'].isin(normal_list)
val_data = val_data.loc[normal_idx]
val_data.drop(columns = ['winddirection', 'forecast'],inplace=True)
val_data = val_data[['forecasted_datetime','temperature','windspeed','humidity','month','season','con_hour','con_date']]
val_data = val_data.rename(columns={'forecasted_datetime': 'datetime'})
val_data.reset_index(drop=True, inplace=True)

test_data.drop(columns = ['winddirection', 'forecast'],inplace=True)
test_data = test_data[['forecasted_datetime','temperature','windspeed','humidity','month','season','con_hour','con_date']]
test_data = test_data.rename(columns={'forecasted_datetime': 'datetime'})
test_data.reset_index(drop=True, inplace=True)
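
The reindex-and-interpolate pattern above (hourly `date_range`, right `merge`, then `interpolate`) can be seen on a toy 3-hourly series; the timestamps and temperatures here are made up for illustration:

```python
import pandas as pd

# 3-hourly toy forecast values
fcst = pd.DataFrame({
    'forecasted_datetime': pd.to_datetime(
        ['2022-08-25 00:00', '2022-08-25 03:00', '2022-08-25 06:00']),
    'temperature': [20.0, 23.0, 26.0],
})

# Right-merge onto a full hourly index, leaving NaN gaps between forecasts
hourly = pd.DataFrame(
    pd.date_range('2022-08-25 00:00', '2022-08-25 06:00', freq='H'),
    columns=['forecasted_datetime'])
merged = pd.merge(fcst, hourly, on='forecasted_datetime', how='right')

# Linear interpolation fills the in-between hours
merged['temperature'] = merged['temperature'].interpolate()
print(merged['temperature'].tolist())  # [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
```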
## Prediction target: 22.05-22.06 (starts from 22.04.28 to supply the 72-step history)

test1 = test_data['datetime'] >'2022-04-27 23:00:00'
test2 = test_data['datetime'] < '2022-07-01 00:00:00'

test_data = test_data.loc[test1 & test2]
test_data.reset_index(drop=True, inplace=True)
## Feature add (dew_point)

c = 243.12
b = 17.62
gamma = (b * (train_data['temperature']) / (c + (train_data['temperature']))) + np.log(train_data['humidity'] / 100)
dp = ( c * gamma) / (b - gamma)
train_data['dew_point'] = dp

gamma = (b * (val_data['temperature']) / (c + (val_data['temperature']))) + np.log(val_data['humidity'] / 100)
dp = ( c * gamma) / (b - gamma)
val_data['dew_point'] = dp

gamma = (b * (test_data['temperature']) / (c + (test_data['temperature']))) + np.log(test_data['humidity'] / 100)
dp = ( c * gamma) / (b - gamma)
test_data['dew_point'] = dp
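
For reference, the three blocks above are the Magnus dew-point approximation with b = 17.62 and c = 243.12 °C:

$$\gamma = \frac{b\,T}{c + T} + \ln\!\left(\frac{RH}{100}\right), \qquad T_{dp} = \frac{c\,\gamma}{b - \gamma}$$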

Adding weather forecast data (sky state, precipitation probability)

Uses the ultra-short-term forecast from the KMA public open-data service.
Sky state: {clear: 1, mostly cloudy: 3, overcast: 4}
Precipitation probability: 0-100 %

Raw Data format

format: day | hour | forecast | value
forecast interval: 3-hourly (before 21.06) / hourly (after 21.06)

Each refresh of day is delimited by a Start: Date row.
The forecast interval differs before and after June 2021 (3-hourly vs. hourly).

## KMA raw-data preprocessing

def forecast_preprocess(fcst_df, st_month, st_year):
  # Rows whose 'hour' is NaN are the 'Start: Date' separators between monthly blocks
  month_rows = []
  month_rows.extend(fcst_df[fcst_df['hour'].isna()].index)
  month_rows.append(fcst_df.shape[0]+1)

  month_data = []
  for i in range(len(month_rows)-1):
    month_data.append(fcst_df.loc[month_rows[i]+1:month_rows[i+1]-1])

  month = st_month
  year = st_year
  out = pd.DataFrame()

  for i,df in enumerate(month_data):
    month += 1
    if month > 12:
      month = month % 12
      year += 1
    date = f'{year}-{month}-'+df[' format: day']+' '+(df['hour'].astype(int)//100).astype(str) + ':00'
    date = pd.to_datetime(date) + pd.DateOffset(hours=9)  # shift timestamps by 9 hours
    df['datetime'] = date
    month_sky = df[['datetime','forecast','value']]
    out = pd.concat([out, month_sky])

  return out

## Interpolating the 3-hourly data (before 21.06)
def fcst_3step_interpolate(df, method):
  df = df[df['forecast']==4]
  df['forecasted_datetime'] = pd.to_datetime(df['datetime']) + df['forecast'].map(lambda x: pd.DateOffset(hours=x))
  df = df[['forecasted_datetime','value']]
  datetime_from = df['forecasted_datetime'].values[0]
  datetime_to = df['forecasted_datetime'].values[-1]
  interpolate = pd.DataFrame(pd.date_range(start=datetime_from, end=datetime_to, freq='H'), columns = ['forecasted_datetime'])
  interpolate = pd.merge(df, interpolate, on='forecasted_datetime', how='right')
  interpolate['value'] = interpolate['value'].interpolate(method)
  interpolate.reset_index(drop=True, inplace=True)
  return interpolate
  
## Forecast values after 21.06
def fcst_1step(df):
  df = df[df['forecast'].isin([6,7,8])]
  df['forecasted_datetime'] = pd.to_datetime(df['datetime']) + df['forecast'].map(lambda x: pd.DateOffset(hours=x))
  df = df[['forecasted_datetime','value']]
  df.reset_index(drop=True, inplace=True)
  return df

## Forecast values for 22.05/22.06
def test_set(df):
  df['datetime'] = df['datetime'].astype(str)
  cond1 = df['datetime'].str.endswith('14:00:00')
  cond2 = df['forecast'] >= 10
  cond3 = df['forecast'] <= 33
  test_df = df.loc[cond1&cond2&cond3]
  test_df['datetime'] = pd.to_datetime(test_df['datetime'])
  test_df['forecasted_datetime'] = test_df['datetime'] + test_df['forecast'].map(lambda x: pd.DateOffset(hours=x))
  test_df.drop(columns=['datetime'],inplace=True)
  # The 72-step CNN model is used, so include data from 3 days earlier
  cond4 = test_df['forecasted_datetime'] >= '2022-04-28 00:00:00'
  cond5 = test_df['forecasted_datetime'] < '2022-07-01 00:00:00'
  test_df = test_df.loc[cond4 & cond5]
  test_df = test_df[['forecasted_datetime','value']]
  test_df.reset_index(drop=True, inplace=True)
  return test_df
## sky_state: nearest interpolation / rain_prob: linear interpolation

sky_state1 = fcst_3step_interpolate(sky_state, 'nearest')
rain_prob1 = fcst_3step_interpolate(rain_prob,'linear')
sky_state2 = fcst_1step(sky_state2)
rain_prob2 = fcst_1step(rain_prob2)

sky_state1 = sky_state1[sky_state1['forecasted_datetime'] < '2021-06-30 08:00:00']
sky_state2 = sky_state2[sky_state2['forecasted_datetime'] < '2022-05-01 00:00:00']
sky_concat = pd.concat([sky_state1, sky_state2])
sky_concat.reset_index(drop=True, inplace=True)
rain_prob1 = rain_prob1[rain_prob1['forecasted_datetime'] < '2021-07-01 17:00:00']
rain_prob2 = rain_prob2[rain_prob2['forecasted_datetime'] < '2022-05-01 00:00:00']
rain_concat = pd.concat([rain_prob1, rain_prob2])
rain_concat.reset_index(drop=True, inplace=True)


solar_ss_test = test_set(sky_state2)
solar_rp_test = test_set(rain_prob2)
## Feature add (sky_state, rain_probability)
# Keep only the dates in use
solar_ss['date'] = solar_ss['forecasted_datetime'].dt.date
index = solar_ss['date'].isin(normal_list)
solar_ss_st = solar_ss.loc[index]
solar_ss_st.drop(columns='date',inplace=True)
solar_ss_st.reset_index(drop=True, inplace=True)

solar_rp['date'] = solar_rp['forecasted_datetime'].dt.date
index = solar_rp['date'].isin(normal_list)
solar_rp_st = solar_rp.loc[index]
solar_rp_st.drop(columns='date',inplace=True)
solar_rp_st.reset_index(drop=True, inplace=True)

train_data['ss']= solar_ss_st['value']
train_data['rp']= solar_rp_st['value']
val_data['ss']= solar_ss_st['value']
val_data['rp'] = solar_rp_st['value']
test_data['ss'] = test_solar_ss['value']
test_data['rp'] = test_solar_rp['value']

Adding solar geometry data (altitude, azimuth)
Uses the solar data API provided by the Korea Astronomy and Space Science Institute.

## Loading solar data via the API

import matplotlib.pyplot as plt
import pandas as pd
import urllib
import urllib.request
import json
import xmltodict

from tqdm import tqdm

def return_solar_info(date):
    key = '<personal access key issued when requesting data access>'
    url = 'http://apis.data.go.kr/B090041/openapi/service/SrAltudeInfoService/getLCSrAltudeInfo'
    queryParams = '?' + urllib.parse.urlencode(
        {
            urllib.parse.quote_plus('ServiceKey') : key, 
            urllib.parse.quote_plus('latitude') : '37.535911',
            urllib.parse.quote_plus('longitude') : '126.602342',
            urllib.parse.quote_plus('locdate') : date, 
            urllib.parse.quote_plus('dnYn') : 'Y', 
        }
    )

    response = urllib.request.urlopen(url + queryParams).read()
    dict_type = xmltodict.parse(response)
    return dict_type['response']['body']['items']

def angle_to_float(value):
    value = value.split('˚')
    if value[0][0] == '-':
      value = 0
    else:
      value = int(value[0]) + int(value[1].split('´')[0])*0.01
    return value
#Ex) date: 2022-08-25 00:00:00 > 20220825
date = date.apply(lambda x: x[:10].replace('-','')).unique()
solar_infos =[]
for date in tqdm(date):
  solar_infos.append(return_solar_info(date)['item'])
# Meridian altitude, plus altitude/azimuth at 09, 12, 15, and 18 h
angle_columns=['altitudeMeridian','altitude_09','altitude_12','altitude_15','altitude_18','azimuth_09','azimuth_12','azimuth_15','azimuth_18']

date_solars ={}
for i,d in enumerate(solar_infos):
  tmp_solars=[]
  for col in angle_columns:
    tmp_solars.append(angle_to_float(d[col]))
  date_solars[d['locdate']]= tmp_solars
# Save to solar_info.csv
date_solars_df=pd.DataFrame(date_solars.values(), columns=angle_columns)
date_solars_df['date'] = date_solars.keys()
date_solars_df.to_csv('solar_info.csv',index=False)
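
A standalone sanity check of `angle_to_float` (the input strings are hypothetical examples of the API's degree/minute format; note minutes are scaled by 0.01 here, not converted to fractional degrees):

```python
# Same parser as above: '<deg>˚ <min>´...' -> deg + min * 0.01,
# with negative altitudes (sun below the horizon) clamped to 0
def angle_to_float(value):
    value = value.split('˚')
    if value[0][0] == '-':
      value = 0
    else:
      value = int(value[0]) + int(value[1].split('´')[0]) * 0.01
    return value

print(round(angle_to_float('27˚ 30´'), 2))  # 27.3
print(angle_to_float('-05˚ 12´'))           # 0
```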
train_data['altitude'] = 0
val_data['altitude'] = 0
test_data['altitude'] = 0

## Store hourly interpolated values computed from the 3-hour-interval data

solar_info['date'] = pd.to_datetime(solar_info['date'], format='%Y%m%d')
altitude_09 = solar_info['altitude_09']
altitude_12 = solar_info['altitude_12']
altitude_15 = solar_info['altitude_15']
altitude_18 = solar_info['altitude_18']
altitude_10 = altitude_09 + (altitude_12-altitude_09)*1/3
altitude_11 = altitude_09 + (altitude_12-altitude_09)*2/3
altitude_13 = altitude_12 + (altitude_15-altitude_12)*1/3
altitude_14 = altitude_12 + (altitude_15-altitude_12)*2/3
altitude_16 = altitude_15 + (altitude_18-altitude_15)*1/3
altitude_17 = altitude_15 + (altitude_18-altitude_15)*2/3
# Add the altitude feature (equivalent to a per-hour if/elif chain with separate counters)
altitude_by_hour = {
    '09': altitude_09, '10': altitude_10, '11': altitude_11,
    '12': altitude_12, '13': altitude_13, '14': altitude_14,
    '15': altitude_15, '16': altitude_16, '17': altitude_17,
    '18': altitude_18,
}
counters = {h: 0 for h in altitude_by_hour}

for i, dt in enumerate(train_data['datetime']):
  hh = str(dt)[-8:-6]  # 'HH' from '... HH:00:00'
  if hh in altitude_by_hour:
    train_data.loc[i, 'altitude'] = altitude_by_hour[hh][counters[hh]]
    counters[hh] += 1
## Daily azimuth-swing feature

train_data['date'] = pd.to_datetime(train_data['date'])
date = solar_info['date']

date_azimuth ={}
for i,dt in enumerate(date):
  date_azimuth[dt] = solar_info['azimuth_diff'].iloc[i]

train_data['azimuth'] = train_data['date'].map(date_azimuth)
# These are astronomical values, not forecasts, so the same values apply to val_data
val_data['altitude'] = train_data['altitude']
val_data['azimuth'] = train_data['azimuth']
## (Optional) one-hot encode the category feature
train_data = pd.get_dummies(train_data, columns = ['ss'])
val_data = pd.get_dummies(val_data, columns = ['ss'])

Model

  • TabNet
    With little data and few features, TabNet performed poorly.
  • Tree models (LightGBM, XGBoost)
    Even tree models with Optuna-tuned hyperparameters came out around 15% error.
  • CNN-LSTM
    Trained on 24-step sequences, but never outperformed the tree models.
  • Conv1d
    Trained on the 3 days (73 steps) of data preceding each prediction point; reached an error of about 10%.

Train

  • Training on forecast values
    Since the final test predictions rely on weather forecasts, training on forecast values was tried, but the error increased. > The correlation between forecast values and observed weather is inconsistent.
  • Training on observed values
# Using TensorFlow
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Dense, Flatten, Conv1D, BatchNormalization, Activation, Dropout, LSTM
from tensorflow.keras import Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
## Build input windows from the previous 72 steps

def seq_gen(df, target, seq_length):
  if target is None:
    X=[]
    
    for i in range(0, len(df) - seq_length):
      x= df[i:i+seq_length+1]

      X.append(x)
    return np.array(X)
  
  else:
    X =[]
    Y =[]

    for i in range(0, len(df) - seq_length):
      x = df[i:i+ seq_length+1]
      y = target[i + seq_length]

      X.append(x)
      Y.append(y)

    return np.array(X), np.array(Y)
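
A standalone shape check of `seq_gen` on dummy data (each window is seq_length + 1 steps long, matching the model input shape (73, 13) used below; the target is aligned with the window's last step):

```python
import numpy as np

def seq_gen(df, target, seq_length):
    # Same logic as above: sliding windows of seq_length + 1 rows,
    # with the target taken at the window's final step
    if target is None:
        X = []
        for i in range(0, len(df) - seq_length):
            X.append(df[i:i + seq_length + 1])
        return np.array(X)
    X, Y = [], []
    for i in range(0, len(df) - seq_length):
        X.append(df[i:i + seq_length + 1])
        Y.append(target[i + seq_length])
    return np.array(X), np.array(Y)

dummy_x = np.arange(100 * 13).reshape(100, 13)  # 100 hourly rows, 13 features
dummy_y = np.arange(100)

X, Y = seq_gen(dummy_x, dummy_y, 72)
print(X.shape, Y.shape)  # (28, 73, 13) (28,)
print(Y[0])              # 72 -> the label at the last step of the first window
```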
# After several trials, the season feature was dropped
train_data.drop(columns=['season'],inplace=True)
val_data.drop(columns=['season'],inplace=True)
test_data.drop(columns=['season'],inplace=True)

train_y = solar_power_hourmean['target'].to_numpy()
# Validate using forecast values for 21.05-21.08
cond1 = val_data['datetime'] > '2021-04-27 23:00:00'
cond2 = val_data['datetime'] < '2021-09-01 00:00:00'
cond3 = solar_power_hourmean['datetime'] >'2021-04-27 23:00:00'
cond4 = solar_power_hourmean['datetime'] < '2021-09-01 00:00:00'

val_data =  val_data.loc[cond1 & cond2].drop(columns=['datetime'])
val_data.reset_index(drop=True, inplace=True)
val_y = solar_power_hourmean['target'].loc[cond3 & cond4].to_numpy()


train_x, train_y = seq_gen(train_data, train_y,72)
val_x, val_y = seq_gen(val_data, val_y, 72)
test_x = seq_gen(test_data, None, 72)
## Evaluation metric

def NMAE(true, pred):

  absolute_error = np.abs(true - pred)
  absolute_error /= capacity

  target_idx = np.where(true >= capacity*0.1)

  nmae = 100*absolute_error[target_idx].mean()
  
  return nmae

def custom_metric(true, pred):
  score = tf.py_function(func=NMAE, inp=[true, pred], Tout=tf.float32, name='nmae')
  return score
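
A small numeric check of the metric (toy arrays; capacity = 1100 as set for the solar plant below):

```python
import numpy as np

capacity = 1100  # rated capacity; the nominal threshold is 10% of this

def NMAE(true, pred):
    # Same metric as above: capacity-normalized MAE over scored hours only
    absolute_error = np.abs(true - pred) / capacity
    target_idx = np.where(true >= capacity * 0.1)
    return 100 * absolute_error[target_idx].mean()

true = np.array([0.0, 50.0, 550.0, 1100.0])  # first two fall below 110 -> not scored
pred = np.array([10.0, 80.0, 440.0, 1100.0])
print(NMAE(true, pred))  # (110/1100 + 0/1100) / 2 * 100 = 5.0
```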
# Causal padding ensures only data before the target step feeds each output
# Keep the model shallow to limit overfitting to the weather observations

def regression_dilated_cnn(Model_input):
    x = Conv1D(8, 3, padding='causal')(Model_input)
    x = BatchNormalization()(x)
    x = Activation(activation='elu')(x)
    x = Dropout(0.5)(x)
    
    x = Conv1D(16, 3, padding='causal')(x)
    x = BatchNormalization()(x)
    x = Activation(activation='elu')(x)
    x = Dropout(0.5)(x)

    x = Conv1D(32, 3, padding='causal',dilation_rate=2)(x)
    x = BatchNormalization()(x)
    x = Activation(activation='elu')(x)
    x = Dropout(0.5)(x)
    
    x = Conv1D(64, 3, padding='causal', dilation_rate=4)(x)
    x = BatchNormalization()(x)
    x = Activation(activation='elu')(x)
    x = Dropout(0.5)(x)

    x = Flatten()(x)
    x  = Dense(32)(x)
    x = Activation(activation='elu')(x)
    x = Dense(1)(x)

    output = Model(Model_input, x, name='regression_dilated_cnn')

    return output
capacity = 1100
# If the monitored metric does not improve for 7 epochs, reduce the LR by the given factor
reLR = ReduceLROnPlateau(monitor ='val_custom_metric', patience=7, factor = 0.5)
es = EarlyStopping(monitor='val_custom_metric', patience = 30, restore_best_weights=True)

model_inputs = keras.Input(shape=(73,13))
model = regression_dilated_cnn(model_inputs)
model.compile(loss = 'mae', optimizer = 'RMSprop', metrics=[custom_metric])
history = model.fit(train_x, train_y , epochs =200, validation_data = (val_x, val_y), 
                    batch_size = 64, callbacks =[reLR, es])

Produces an error of roughly 9-10%.

Wind Power

Analyze

  • Formula
    From turbine characteristics, output power scales with flow rate, (speed)^2, and (diameter)^3.
  • Windspeed^2
    Gives additional weight to wind speed.
  • Wind direction
    Converts the numeric wind direction into a category feature.
  • Daily periodicity
    Unclear.
  • Wind vector
    Wind speed and direction show the strongest correlation with output, so the two are combined into a wind vector.
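
The wind-vector idea can be sketched as decomposing (speed, direction) into two orthogonal components; this is a minimal illustration, not the post's exact `angle_to_cos`/`angle_to_sin` transform, which uses shifted angles:

```python
import numpy as np

def wind_to_vec(speed, direction_deg):
    # Decompose (speed, direction) into x/y components so that directions
    # like 359 deg and 1 deg end up close in feature space
    theta = np.deg2rad(direction_deg)
    return speed * np.sin(theta), speed * np.cos(theta)

x, y = wind_to_vec(10.0, 90.0)
print(round(x, 6), round(y, 6))  # 10.0 0.0
```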

Data Preprocess

The preprocessing scheme is nearly identical to the solar-power one.

wind_forecast = pd.read_csv('/content/wind_forecast_weather.csv')
wind_weather = pd.read_csv('/content/weather_wind_actual.csv')
wind_power = pd.read_csv('/content/wind_2204.csv')

Visualizing wind_power suggested that other factors outweigh time-of-day periodicity.
Unlike solar_power, rows with missing generation or wind speed are simply removed.

## Generation and weather-observation preprocessing

wind_power = wind_power.dropna()
wind_power['datetime'] = pd.to_datetime(wind_power['datetime'])
wind_power['target'] = wind_power['target'].apply(lambda x: 0.0 if x < 0.0 else x)
idx_outlier = wind_power[wind_power['target'] > 20000].index
wind_power = wind_power.drop(idx_outlier)
wind_power.reset_index(drop=True, inplace=True)

wind_weather = wind_weather.drop(columns = ['지점'],axis=1)
wind_weather = wind_weather.drop(columns =['지점명'], axis=1)
wind_weather.columns =['datetime', 'temperature','windspeed','winddirection','humidity']
wind_weather.set_index(pd.to_datetime(wind_weather['datetime']), inplace=True)
wind_weather = wind_weather.resample(rule='1H').mean()
wind_weather = wind_weather.fillna(0)
wind_weather.reset_index(inplace=True)

power_zero = wind_power[wind_power['target'] == 0].index
wind_zero = wind_weather[wind_weather['windspeed']==0].index

unique_datetime = pd.to_datetime(wind_power['datetime'].unique())
power_error_datetime = np.array(wind_power.iloc[power_zero]['datetime'])  # 10-minute resolution
wind_error_datetime = np.array(wind_weather.iloc[wind_zero]['datetime'])  # hourly resolution

target_datetime = np.setdiff1d(unique_datetime,power_error_datetime)
target_idx = wind_power['datetime'].isin(target_datetime)
wind_power = wind_power.loc[target_idx]
wind_power.reset_index(drop=True, inplace=True)

wind_power.set_index(pd.to_datetime(wind_power['datetime']), inplace=True)
wind_power_hourmean = wind_power.resample(rule='1H').mean()
wind_power_hourmean = wind_power_hourmean.dropna()
wind_power_hourmean = wind_power_hourmean.reset_index()

target_idx = ~wind_power_hourmean['datetime'].isin(wind_error_datetime)
wind_power_hourmean = wind_power_hourmean.loc[target_idx]

wind_weather['datetime'] = pd.to_datetime(wind_weather['datetime'])
target_idx = wind_weather['datetime'].isin(wind_power_hourmean['datetime'])
wind_weather = wind_weather.loc[target_idx]
wind_weather.reset_index(drop=True, inplace=True)
## Feature transform functions

def angle_to_dir(x):
  if x >= 22.5 and x < 67.5:
      return '1'
  elif x >= 67.5 and x < 112.5:
      return '2'
  elif x >= 112.5 and x < 157.5:
      return '3'
  elif x >= 157.5 and x < 202.5:
      return '4'
  elif x >= 202.5 and x < 247.5:
      return '5'
  elif x >= 247.5 and x < 292.5:
      return '6'
  elif x >= 292.5 and x < 337.5:
      return '7'
  elif x >= 337.5 or x < 22.5:
      return '0'

def angle_to_cos(x):
    return np.cos(np.pi/180*(x+90))
    
def angle_to_sin(x):
    return np.sin(np.pi/180*(x-90))
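
A compact, behavior-equivalent check of the 8-sector bucketing above (sector 0 is centered on north):

```python
def angle_to_dir_compact(x):
    # Equivalent to the if/elif chain above for 0 <= x < 360
    if x >= 337.5 or x < 22.5:
        return '0'
    return str(int((x - 22.5) // 45) + 1)

print(angle_to_dir_compact(90.0))   # '2' (east)
print(angle_to_dir_compact(350.0))  # '0' (north sector)
```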
## Feature creation

train_data = wind_weather.copy()

hour = train_data['datetime'].dt.hour
month = train_data['datetime'].dt.month
month_featured = month - 1
day = train_data['datetime'].dt.day /31
date = month_featured + day
seasons = ['4','4','4','1','1','1','2','2','2','3','3','3']
month_to_season = dict(zip(range(1,13),seasons))
# Wind-direction category feature
train_data['cat_winddirection'] = train_data['winddirection'].apply(lambda x: angle_to_dir(x))
# Wind vector features
train_data['windxvec'] = train_data['winddirection'].apply(lambda x: angle_to_cos(x)) * train_data['windspeed']
train_data['windyvec'] = train_data['winddirection'].apply(lambda x: angle_to_sin(x)) * train_data['windspeed']
# Month feature
train_data['month'] = month
# Season feature
train_data['season'] = train_data['month'].map(month_to_season)
# Continuous hour feature
train_data['con_hour'] = np.cos(2*np.pi*(hour/24))
# Continuous date feature
train_data['con_date'] = np.cos(2*np.pi*(date/12))
train_data = train_data.drop(columns =['datetime'])
# Squared wind speed, to give wind speed more weight
train_data['windspeed^2'] = train_data['windspeed'] * train_data['windspeed']

train_data.reset_index(drop=True, inplace=True)
## Build validation and test data

cond1 = wind_forecast['Forecast time'].str.endswith('14:00:00')
cond2 = wind_forecast['forecast'] >=10 
cond3 = wind_forecast['forecast'] <= 33
wind_forecast = wind_forecast.loc[cond1&cond2&cond3]
wind_forecast = wind_forecast.reset_index(drop=True)
wind_forecast['Forecast time'] = pd.to_datetime(wind_forecast['Forecast time'])
wind_forecast['forecasted_datetime'] = wind_forecast['Forecast time'] + wind_forecast['forecast'].map(lambda x: pd.DateOffset(hours=x))
wind_forecast.drop(columns =['Forecast time'], inplace=True)
datetime_from = wind_forecast['forecasted_datetime'].values[0]
datetime_to = wind_forecast['forecasted_datetime'].values[-1]
val_data = pd.DataFrame(pd.date_range(start=datetime_from, end=datetime_to, freq='H'), columns = ['forecasted_datetime'])
val_data = pd.merge(wind_forecast, val_data, on='forecasted_datetime', how='right')
val_data['temperature'] = val_data['temperature'].interpolate()
val_data['humidity'] = val_data['humidity'].interpolate()
val_data['windspeed'] = val_data['windspeed'].interpolate()
val_data['winddirection'] = val_data['winddirection'].interpolate('nearest')
val_data['cat_winddirection'] = val_data['winddirection'].apply(lambda x: angle_to_dir(x))
val_data['windxvec'] = val_data['winddirection'].apply(lambda x: angle_to_cos(x)) * val_data['windspeed']
val_data['windyvec'] = val_data['winddirection'].apply(lambda x: angle_to_sin(x)) * val_data['windspeed']
val_data['month'] = val_data['forecasted_datetime'].dt.month
val_data['season'] = val_data['month'].map(month_to_season)
val_data['con_hour'] = np.cos(2*np.pi*(val_data['forecasted_datetime'].dt.hour/24))
val_data['con_date'] = np.cos(2*np.pi*((val_data['forecasted_datetime'].dt.month-1) + (val_data['forecasted_datetime'].dt.day /31))/12)
val_data['windspeed^2'] = val_data['windspeed'] * val_data['windspeed']

test_data = val_data.copy()
normal_idx = val_data['forecasted_datetime'].isin(wind_power_hourmean['datetime'])
val_data = val_data.loc[normal_idx]
val_data.drop(columns = ['forecast'],inplace=True)
val_data = val_data[['forecasted_datetime','temperature','windspeed','humidity','winddirection','cat_winddirection','windxvec','windyvec','month','season','con_hour','con_date','windspeed^2']]
val_data = val_data.rename(columns={'forecasted_datetime': 'datetime'})
val_data.reset_index(drop=True, inplace=True)
## Select the target period

test1 = test_data['datetime'] >'2022-04-30 23:00:00'
test2 = test_data['datetime'] < '2022-07-01 00:00:00'
test_data = test_data.loc[test1 & test2]
test_data.reset_index(drop=True, inplace=True)
index = wind_ss['forecasted_datetime'].isin(wind_power_hourmean['datetime'])
wind_ss_st = wind_ss.loc[index]
wind_ss_st.reset_index(drop=True, inplace=True)

index = wind_rp['forecasted_datetime'].isin(wind_power_hourmean['datetime'])
wind_rp_st = wind_rp.loc[index]
wind_rp_st.reset_index(drop=True, inplace=True)

train_data['ss'] = wind_ss_st['value']
train_data['rp'] = wind_rp_st['value']
val_data['ss'] = wind_ss_st['value']
val_data['rp'] = wind_rp_st['value']

Model

  • TabNet
    As with solar power, too few features and samples.
  • Conv1d
    Unlike solar power, the weak periodicity kept performance stuck at a plateau.
  • Tree model
    A LightGBM model with hyperparameters tuned via Optuna.

Train

import lightgbm as lgb

train_data.drop(columns=['datetime'],inplace=True)
test_data.drop(columns=['datetime'],inplace=True)

train_x = train_data.values
train_y = wind_power_hourmean['target'].to_numpy()

cond1 = val_data['datetime'] > '2021-04-30 23:00:00'
cond2 = val_data['datetime'] < '2021-09-01 00:00:00'
cond3 = wind_power_hourmean['datetime'] >'2021-04-30 23:00:00'
cond4 = wind_power_hourmean['datetime'] < '2021-09-01 00:00:00'

val_data =  val_data.loc[cond1 & cond2].drop(columns=['datetime'])
val_data.reset_index(drop=True, inplace=True)
val_x = val_data.values
val_y = wind_power_hourmean['target'].loc[cond3 & cond4].to_numpy()

train_dataset = lgb.Dataset(train_x, train_y)
val_dataset = lgb.Dataset(val_x, val_y)
import optuna
def objective(trial):
  param ={
    'objective': 'regression',
    'metric': 'mae',
    'learning_rate': trial.suggest_loguniform('learning_rate', 1e-6,1e-2),
    'max_depth': trial.suggest_int('max_depth',3,15),
    'num_leaves': trial.suggest_int('num_leaves',30,100),
    'min_data_in_leaf': trial.suggest_int('min_data_in_leaf',100,500),
    'max_bin': trial.suggest_int('max_bin',100,255),
    'bagging_fraction':trial.suggest_uniform('bagging_fraction',0.5,0.9),
    'feature_fraction': trial.suggest_uniform('feature_fraction',0.5,0.9),
    'seed': 42
  }

  model_lgb = lgb.train(param, train_set=train_dataset, num_boost_round= 10000, valid_sets=[train_dataset, val_dataset],
                        valid_names=['train','valid'], verbose_eval=500, early_stopping_rounds=100)
  y_pred = model_lgb.predict(val_x)

  return NMAE(val_y, y_pred)
## Optuna
# Trains with hyperparameters sampled at random from the given ranges
# Runs n_trials trials while optimizing
capacity = 16000
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=30)
study.best_trial.params

Produces an error of roughly 9-10%.
