03-04) Modeling-skills

slow_starter · August 5, 2025

A summary of the modeling skills that matter when building machine learning
and deep learning models (deep learning in particular).

01. Underfitting and Overfitting

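Underfitting : the model is too simple to capture the pattern in the data, so it performs poorly even on the training data
Overfitting : the model fits the training data too closely (noise included), so it fails to perform well on new data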

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Generate data: a sine function with added noise
np.random.seed(0)
X = np.linspace(0, 1, 15) # 15 evenly spaced inputs between 0 and 1
y_true = np.sin(2 * np.pi * X) # ground-truth function sin(2πx)
y = y_true + np.random.normal(0, 0.2, X.shape) # add Gaussian noise (std 0.2)

X = X.reshape(-1, 1)  # reshape to (15, 1), the 2D shape scikit-learn expects

# 2. Run three experiments with different model complexity
degrees = [1, 3, 12]  # underfit / reasonable fit / overfit
predictions = []

# Dense grid of test inputs (used to draw the curves)
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test_true = np.sin(2 * np.pi * X_test)

for d in degrees:
    # Build polynomial features of degree d
    poly = PolynomialFeatures(degree=d)
    X_poly = poly.fit_transform(X)
    X_test_poly = poly.transform(X_test)

    # Fit a linear regression on the polynomial features
    model = LinearRegression()
    model.fit(X_poly, y)

    # Store the predictions
    y_pred = model.predict(X_test_poly)
    predictions.append((d, y_pred, model))

# 3. Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for i, (d, y_pred, model) in enumerate(predictions):
    axes[i].scatter(X, y, color='black', label='Training Data')                 # training data
    axes[i].plot(X_test, y_test_true, label='True Function', linestyle='--')   # true function
    axes[i].plot(X_test, y_pred, label=f'Predicted (degree={d})')              # model prediction
    axes[i].set_title(f'Degree {d}\nMSE: {mean_squared_error(y_test_true, y_pred):.3f}')
    axes[i].legend()

plt.tight_layout()
plt.show()
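
The plots show the difference visually, but the same thing can be read off the numbers: compare the error on the training points with the error on points the model never saw. Below is a minimal sketch of that check, not part of the original code. The helper name `train_eval_mse`, the variables `X_eval` / `y_eval`, and the use of `make_pipeline` are my own assumptions for illustration; the data setup mirrors the example above (noisy sin(2πx), 15 training points), though the noise values differ because a different random generator is used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same setup as above: 15 noisy samples of sin(2πx)
rng = np.random.default_rng(0)
X_train = np.linspace(0, 1, 15).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.2, 15)

# Noise-free evaluation grid (hypothetical names, for illustration only)
X_eval = np.linspace(0, 1, 100).reshape(-1, 1)
y_eval = np.sin(2 * np.pi * X_eval).ravel()


def train_eval_mse(degree):
    """Fit a polynomial regression and return (training MSE, evaluation MSE)."""
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    eval_mse = mean_squared_error(y_eval, model.predict(X_eval))
    return train_mse, eval_mse


results = {d: train_eval_mse(d) for d in (1, 3, 12)}
for d, (train_mse, eval_mse) in results.items():
    print(f"degree={d:2d}  train MSE={train_mse:.4f}  eval MSE={eval_mse:.4f}")
```

How to read the output: if both errors are high, the model is underfitting (degree 1). The training error keeps dropping as the degree grows, because a higher-degree polynomial can always fit the same points at least as well. A training error that is very low while the evaluation error is clearly higher is the usual sign of overfitting.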