AI/DL/ML (9) - Regularization & Standardization

xsldihsl·2024년 5월 8일

AI DL L1 L2 Lasso ML Regularization Ridge

AI/ML/DL

목록 보기

9/25

Excessive features
Regularization
Standardization

1. Excessive features

더 많은 특성을 만들기 위해 degree = 5 로 변경해보자. 참고로 테스트 세트는 자체적으로 fit method 를 호출 및 변환하는게 아닌, 항상 훈련 세트에서 먼저 학습한 객체를 사용해서 transform 하는 것을 권장한다. 그렇게 해야 consistent 하게 dataset 을 변환 가능하기 때문이다.

poly = PolynomialFeatures(degree=5, include_bias=False)

poly.fit(train_input)
train_poly = poly.transform(train_input)
test_poly = poly.transform(test_input)
print(train_poly.shape) # (42, 55)

그렇다면 train_poly 에 존재하는 특성들이 무려 55 개로 증가하게 된다.

lr.fit(train_poly, train_target)
print(lr.score(train_poly, train_target)) # 0.9999999986076573
print(lr.score(test_poly, test_target)) # -850.1876166821395

각각의 score 를 비교한다면 train_poly 에 대해서는 거의 완벽에 가까운 확률로 정답을 맞추지만 test_poly 에 대한 점수는 음수값으로 떨어지며 매우 큰 차이를 보인다. 즉, 우리의 모델이 overfitted 된 경우이다. 왜 그럴까?

"맞춰야 할 대상보다 사용할 도구가 훨씬 더 많은 것"
우리의 훈련 세트에는 총 42개의 샘플들이 존재한다. 42개의 특성을 각각의 샘플과 매칭시켜도 충분할텐데 특성을 55개씩이나 사용하다보니 과대적합이 발생하게 된다.

2. Regularization

이처럼 극도로 과대적합된 모델을 완화할 수 있는 대표적인 방법으로 "Regularization" 이 있다.

A set of methods for reducing overfitting in machine learning models. Typically, regularization trades a marginal decrease in training accuracy for an increase in generalizability.

선형 회귀는 가중치 (i.e., coefficient) 를 이용하여 각 점들을 지나게 된다. Regularization 은 이런 가중치가 높을수록 그에 상응하는 penalty 를 주어 가중치 값을 줄이도록 한다. 결과적으로 가중치가 줄어든 우리의 모델은 좀 더 일반화되어 부드러운 곡선을 그릴 것이다.

Regularization 은 L1 과 L2 로 나뉠 수 있다. 이 둘 모두 loss function 에 penalty 를 더하는 machine learning methods 인데, 어떤 penalty term 을 사용하는지에 따라 구분된다.

L1 regularization (Lasso) adds absolute value of the coefficient as a penalty term.

L2 regularization (Ridge) adds square value of the coefficient as a penalty term.

L1 은 가중치의 절댓값만큼 penalty 를 더해주는 반면, L2 는 가중치의 제곱을 penalty 로 주게 된다. 특히 linear regression 에서의 L1 을 "Lasso" , L2 를 "Ridge" 라고 부른다. 이 둘에 관한 내용은 다음 글에서 다루도록 하겠다.

3. Standardization

이전에 k-nearest neighbors 를 다루며 하나의 특성이 다른 하나에 끌려가는걸 방지하기 위해 우리는 z-score 를 구하여 표준화를 해주었다. 하지만 Linear regression 에서는 표준화가 불필요하였는데 이는 class 특성상 scale 의 영향을 받지 않고 수치적 방법으로 계산하는 알고리즘으로 구현되었기 때문이다.

그렇다면 "규제"를 적용하게 되어도 여전히 표준화가 필요하지 않을까?
아니다. 규제가 가중치를 기반으로 적용되는 것이므로 데이터의 스케일이 중요하다.

Sci-kit learn 의 transformer 중 하나인 StandardScaler 가 표준화를 쉽게 처리해 줄 수 있다.

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
ss.fit(train_poly)
train_scaled = ss.transform(train_poly)
test_scaled = ss.transform(test_poly)

A typical advice when using common regularization techniques such as Lasso or Ridge is to standardize the predictors before fitting the model since the penalty is a function of the size of the coefficients, which depends on the scale of the data.

참고 링크: L1 과 L2 regularization 에 대한 설명
https://www.simplilearn.com/tutorials/machine-learning-tutorial/regularization-in-machine-learning

xsldihsl

이전 포스트

AI/ML/DL (8) - Multiple regression & Feature engineering

다음 포스트

AI/DL/ML (9) - Regularization & Standardization

AI/ML/DL

Contents

1. Excessive features

2. Regularization

3. Standardization

AI/ML/DL (8) - Multiple regression & Feature engineering

AI/ML/DL (10) - L1 regularization

0개의 댓글