Week 6: Deep Learning (3)

정지원 · March 25, 2024

AIVLE School Review

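Performance Management

As a model grows more complex, it starts to learn spurious patterns as well.

Spurious patterns
- Quirks that exist only in the training data
- Do not hold for the population as a whole
- The cause of overfitting
The goal of modeling is to use the patterns found in the training data to predict the population as a whole.
A model that has captured genuine patterns also predicts other data well.
=> appropriate complexity
Ways to find the appropriate complexity
- Adjust the number of epochs and the learning_rate
- Adjust the number of hidden layers and nodes
- Early Stopping
- Regularization: L1, L2
- Dropout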
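
Dropout appears in the list above but is not demonstrated in these notes. As a rough illustration of the idea, the sketch below implements "inverted" dropout in plain Python: during training each activation is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), while at inference the input passes through unchanged. The function name `dropout` and the sample values are assumptions made for this example; this is not Keras code.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time nothing is dropped.
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Zero each unit with probability `rate`; scale the survivors so the
    # expected value of every activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


hidden = [0.2, 1.5, 0.7, 3.0]          # hypothetical hidden-layer outputs
print(dropout(hidden, rate=0.5, rng=random.Random(0)))
print(dropout(hidden, rate=0.5, training=False))
```

In Keras the same idea is available as the `Dropout(rate)` layer, placed between the `Dense` layers of a model.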

Overfitting and How to Prevent It

Early stopping

Stops training early in order to prevent overfitting.
Too many training iterations (epochs) can lead to overfitting, though not always.

from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor = 'val_loss', min_delta = 0, patience=0)

model.fit(x_train, y_train, epochs=100, validation_split=0.2,
          callbacks=[es])

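EarlyStopping options
- monitor: the metric to watch (default: val_loss)
- min_delta: the minimum amount by which the loss must drop below its best value so far to count as an improvement
- patience: how many epochs without improvement to tolerate before stopping (default: 0)
- callbacks: an argument of fit() rather than of EarlyStopping; lists the tasks that step in at epoch boundaries while training is in progress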
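
How `min_delta` and `patience` interact is easier to see on a list of numbers. The sketch below is a simplified re-implementation of the stopping rule in plain Python, written for these notes; it is not the Keras source, and `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # The drop is larger than min_delta: record it and reset the counter.
            best = loss
            wait = 0
            continue
        wait += 1
        if wait >= patience:
            return epoch
    return None  # every epoch ran without triggering a stop


losses = [0.90, 0.70, 0.60, 0.61, 0.605, 0.62, 0.50]  # hypothetical val_loss per epoch
print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses, patience=5))
```

With `patience=2` the run stops at epoch index 4, so the later improvement to 0.50 is never reached. With `patience=5` training survives the plateau and sees it. A patience that is too small can therefore cut training short.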

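In practice

from keras.backend import clear_session
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Declare the model
clear_session()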

model2 = Sequential( [Dense(128, input_shape = (nfeatures,), activation= 'relu'),
                      Dense(64, activation= 'relu'),
                      Dense(32, activation= 'relu'),
                      Dense(1, activation= 'sigmoid')] )
model2.compile(optimizer= Adam(learning_rate = 0.001), loss='binary_crossentropy')

# EarlyStopping settings ------------
min_de = 0.001                  # val_loss must drop by more than this to count as an improvement
pat = 20                        # how long to wait (up to 20 epochs without improvement)

# Monitor val_loss
es = EarlyStopping(monitor = 'val_loss', min_delta = min_de, patience = pat)
# --------------------------------

# Train
hist = model2.fit(x_train, y_train, epochs = 100, validation_split=0.2,
                  callbacks = [es]).history
dl_history_plot(hist)           # plotting helper; not defined in these notes
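
The `validation_split=0.2` argument in the `fit` call is what produces the `val_loss` that EarlyStopping monitors. According to the Keras documentation, the held-out part is taken from the last samples of the data, before any shuffling. The sketch below mimics that slicing on a plain list; `split_last_fraction` is a hypothetical helper written for this example.

```python
import math


def split_last_fraction(samples, validation_split):
    """Split a sequence into (train, val), taking val from the end."""
    split_at = int(math.floor(len(samples) * (1.0 - validation_split)))
    return samples[:split_at], samples[split_at:]


rows = list(range(10))                 # stand-in for 10 training samples
train_rows, val_rows = split_last_fraction(rows, 0.2)
print(train_rows)                      # first 8 samples
print(val_rows)                        # last 2 samples
```

If the rows are ordered (by class or by time, for example), the last 20% may not be representative, so it can be worth shuffling the data before calling `fit`.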

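Weight Regularization

A term added to the loss function that keeps the model's weights from growing large or shapes the distribution of the weights.
It keeps the model from fitting the training data too closely, so that it also performs well on new data.
- L1 regularization: drives some weights to 0, which has the effect of feature selection
  - commonly used strength: 0.0001 ~ 0.1
- L2 regularization: limits the magnitude of the weights, pushing all of them toward small values
  - commonly used strength: 0.001 ~ 0.5
Specified as an option inside each hidden layer.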

from keras.regularizers import l1

# Free memory
clear_session()

# Sequential model
model4 = Sequential( [Dense(128, input_shape = (nfeatures,), activation= 'relu',
                            kernel_regularizer = l1(0.01)), # weight regularization
                      Dense(64, activation= 'relu',
                            kernel_regularizer = l1(0.01)), # weight regularization
                      Dense(32, activation= 'relu',
                            kernel_regularizer = l1(0.01)), # weight regularization
                      Dense(1, activation= 'sigmoid')] )

# Compile
model4.compile(optimizer= Adam(learning_rate = 0.001), loss='binary_crossentropy')
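
To make the "term added to the loss function" concrete, the sketch below computes the L1 and L2 penalties for a handful of weights in plain Python. The L1 penalty is the strength times the sum of absolute weights, and the L2 penalty is the strength times the sum of squared weights. The weight values and the `data_loss` figure are invented for illustration.

```python
def l1_penalty(weights, strength):
    """L1 penalty: strength * sum(|w|)."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """L2 penalty: strength * sum(w^2)."""
    return strength * sum(w * w for w in weights)


weights = [0.5, -1.0, 2.0]   # hypothetical layer weights
data_loss = 0.30             # hypothetical binary_crossentropy value

# The optimizer minimizes the data loss plus the penalty.
print(data_loss + l1_penalty(weights, 0.01))
print(data_loss + l2_penalty(weights, 0.01))
```

Because the L2 penalty squares each weight, the single large weight (2.0) dominates it, which is why L2 mostly shrinks large weights. The L1 penalty pushes every nonzero weight toward 0 with the same force, which is what allows some of them to reach exactly 0.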

Saving a model

model.save('filename.h5')

Loading a model

from keras.models import load_model
model2 = load_model('filename.h5')

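Saving intermediate checkpoints

keras.callbacks.ModelCheckpoint

Can save the model at every epoch
cp_path: the path and file name to save the model under
monitor: the metric to judge by, here validation loss (val_loss)
save_best_only=True: save only when performance improves on the previous checkpoints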

from keras.callbacks import ModelCheckpoint

cp_path = '/content/{epoch:03d}.h5'
mcp = ModelCheckpoint(cp_path, monitor='val_loss', verbose = 1, save_best_only=True)

# Train
hist = model1.fit(x_train, y_train, epochs = 50, validation_split=.2, callbacks=[mcp]).history
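
The `{epoch:03d}` placeholder in `cp_path` and the effect of `save_best_only=True` can be traced without training anything. The sketch below is a simplified stand-in for the callback, not the Keras implementation. It numbers files from 1, which is how Keras is understood to fill in `{epoch}`; treat that detail, the helper name `saved_checkpoints`, and the loss values as assumptions.

```python
def saved_checkpoints(val_losses, path_template, save_best_only=True):
    """List the file names that would be written over a training run."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses):
        if save_best_only and loss >= best:
            continue  # no improvement over the best checkpoint so far: skip
        best = min(best, loss)
        # Fill the {epoch:03d} placeholder with a 1-based, zero-padded number.
        saved.append(path_template.format(epoch=epoch + 1))
    return saved


losses = [0.90, 0.80, 0.85, 0.70]      # hypothetical val_loss per epoch
print(saved_checkpoints(losses, '/content/{epoch:03d}.h5'))
print(saved_checkpoints(losses, '/content/{epoch:03d}.h5', save_best_only=False))
```

In this run the third epoch is skipped because 0.85 is worse than the best value so far (0.80), so the highest-numbered file is also the best model.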

Functional API

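Unlike Sequential, the model can be split into separate branches.
Multiple inputs and multiple outputs are supported.
The data must also be split into the separate inputs ahead of time, during preprocessing.

Existing preprocessing steps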

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Split the data: x, y
target = 'Sales'
x = data.drop(target, axis=1)
y = data.loc[:, target]

# One-hot encode the categorical columns (dummy variables)
cat_cols = ['ShelveLoc', 'Education', 'US', 'Urban']
x = pd.get_dummies(x, columns = cat_cols, drop_first = True)

# Split the data: train, val
x_train, x_val, y_train, y_val = train_test_split(x, y,
                                                  test_size=.2,
                                                  random_state = 20)

# Scaling
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train) # returns a NumPy array, not a DataFrame
x_val = scaler.transform(x_val) # also a NumPy array

# Convert back to DataFrames so columns can be selected by name
x_train = pd.DataFrame(x_train, columns=x.columns)
x_val = pd.DataFrame(x_val, columns=x.columns)
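
The scaling step fits the scaler on the training set only and then reuses the same minimum and maximum for the validation set. The plain-Python sketch below shows that arithmetic for a single column; the helper names and numbers are made up for illustration, and it assumes the column is not constant.

```python
def fit_min_max(values):
    """Learn the column minimum and maximum from the training data."""
    return min(values), max(values)


def transform_min_max(values, lo, hi):
    """Apply (v - min) / (max - min); assumes hi > lo."""
    return [(v - lo) / (hi - lo) for v in values]


train_col = [2.0, 4.0, 6.0]   # hypothetical training column
val_col = [3.0, 8.0]          # hypothetical validation column

lo, hi = fit_min_max(train_col)              # fit on train only
print(transform_min_max(train_col, lo, hi))  # always within [0, 1]
print(transform_min_max(val_col, lo, hi))    # may fall outside [0, 1]
```

The validation value 8.0 scales to 1.5 because it exceeds the training maximum. That is expected: refitting the scaler on the validation data would leak information from it into preprocessing.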

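Splitting the inputs

Split the features into input 1 and input 2
They are merged again inside the model with concatenate

# Input 1 (placeholder column names; the rest of the list is elided)
in_col = ['var1', 'var2', 'var3', 'var4', 'var5', 'var6', ...]
x_train1 = x_train[in_col]
x_val1 = x_val[in_col]

# Input 2: every remaining column
x_train2 = x_train.drop(in_col, axis = 1)
x_val2 = x_val.drop(in_col, axis = 1)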

Multi-input modeling

from keras.models import Model
from keras.layers import Input, Dense, concatenate
from sklearn.metrics import mean_squared_error, mean_absolute_error

nfeatures1 = x_train1.shape[1]
nfeatures2 = x_train2.shape[1]

# Inputs
input_1 = Input(shape=(nfeatures1, ), name= 'input_1')
input_2 = Input(shape=(nfeatures2, ), name= 'input_2')

# One hidden layer for each input
hl_1 = Dense(10, activation='relu')(input_1)
hl_2 = Dense(20, activation='relu')(input_2)

# Merge the two hidden layers
cbl = concatenate([hl_1, hl_2])

# Additional hidden layer
hl2 = Dense(8, activation='relu')(cbl)


# Output layer
output = Dense(1)(hl2)

# Declare the model
model = Model(inputs = [input_1, input_2], outputs= output)

model.summary()

# Compile
model.compile(optimizer=Adam(learning_rate = 0.01), loss = 'mse')

# Train (multiple inputs)
hist = model.fit([x_train1, x_train2], y_train, epochs=50, validation_split=.2).history

# Predict (multiple inputs)
pred = model.predict([x_val1, x_val2])
print(mean_squared_error(y_val, pred, squared = False))  # RMSE; newer scikit-learn versions replace squared=False with root_mean_squared_error
print(mean_absolute_error(y_val, pred))  # MAE
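
One way to check that the two branches are wired together as intended is to predict the parameter counts that `model.summary()` should report. A Dense layer has (inputs + 1) × units parameters (weights plus one bias per unit), and `concatenate` itself adds none; it only joins the 10 and 20 outputs into 30. The sketch below does this arithmetic for the architecture above. The feature counts 7 and 4 are hypothetical, since the notes do not state them.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units


def multi_input_params(nfeatures1, nfeatures2):
    """Total trainable parameters of the two-branch model above."""
    branch1 = dense_params(nfeatures1, 10)   # Dense(10) on input_1
    branch2 = dense_params(nfeatures2, 20)   # Dense(20) on input_2
    merged = dense_params(10 + 20, 8)        # Dense(8) on the concatenated 30 values
    out = dense_params(8, 1)                 # Dense(1) output
    return branch1 + branch2 + merged + out


# Hypothetical split: 7 columns in input 1, 4 columns in input 2.
print(multi_input_params(7, 4))
```

If the total printed by `model.summary()` differs from this arithmetic for the actual `nfeatures1` and `nfeatures2`, a layer is probably connected to the wrong tensor.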