AIC-Based Variable Elimination

이정훈 · April 11, 2026

1. AIC (Akaike Information Criterion)

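A metric that weighs a model's complexity against how well it fits the data.
- $k$ : number of parameters (variables) in the model → penalty for complexity
- $L$ : maximized likelihood of the model (Maximum Likelihood) → how well the model explains the data (goodness of fit)
$AIC = 2k - 2\ln(L)$
For a regression with Gaussian errors, the maximized likelihood is determined by the residual sum of squares: $-2\ln(L) = n\ln(RSS/n) + \text{const}$, where $n$ is the number of observations and $RSS/n = RMSE^2$. So for a fixed $k$, AIC rises and falls together with RMSE (monotonically, not in strict proportion).
In short, the smaller the AIC, the better the model.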
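To make the formula concrete, here is a minimal sketch that evaluates $AIC = 2k - 2\ln(L)$ for a regression, assuming Gaussian errors so that the maximized log-likelihood depends on the data only through the residual sum of squares. The function name `gaussian_aic` and the toy residuals are illustrative and not part of the original post.

```python
import math

def gaussian_aic(residuals, k):
    """AIC = 2k - 2*ln(L) for a regression with Gaussian errors.

    With the error variance set to its maximum-likelihood estimate RSS/n,
    ln(L) = -n/2 * (ln(2*pi) + ln(RSS/n) + 1).
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    log_likelihood = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_likelihood

# Toy residuals, made up for illustration
loose_fit = [1.0, -1.0, 1.0, -1.0]
tight_fit = [0.5, -0.5, 0.5, -0.5]

aic_loose = gaussian_aic(loose_fit, k=2)
aic_tight = gaussian_aic(tight_fit, k=2)   # smaller residuals, same k
aic_extra = gaussian_aic(tight_fit, k=3)   # same residuals, one more parameter
```

With the same `k`, the tighter fit gets the lower AIC; with the same residuals, each extra parameter adds exactly 2. An added variable therefore only pays off if it shrinks the residuals enough to offset that penalty.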

2. How to Do Variable Elimination

Backward Elimination

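Fit the full model with every variable, then remove variables one at a time, each time dropping the variable whose removal lowers AIC the most, until no removal lowers AIC.
Because all variables start in the model together, a variable that is only useful in combination with others is judged in that context.
Relatively slow, since it starts by fitting the largest models.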

Code

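import statsmodels.api as sm

def backward_elimination(df, target_col, feature_cols):
    current_cols = list(feature_cols)  # work on a copy of the candidate list

    # Baseline AIC: the full model with every feature
    X = sm.add_constant(df[current_cols])
    best_aic = sm.OLS(df[target_col], X).fit().aic

    # Keep at least one feature; the intercept-only model is not evaluated here
    while len(current_cols) > 1:
        worst_feature = None
        best_new_aic = best_aic

        # Try dropping each remaining feature in turn
        for feature in current_cols:
            test_cols = [c for c in current_cols if c != feature]
            X = sm.add_constant(df[test_cols])
            model = sm.OLS(df[target_col], X).fit()

            if model.aic < best_new_aic:
                best_new_aic = model.aic
                worst_feature = feature

        if worst_feature is not None:
            # Dropping worst_feature lowers AIC the most, so remove it
            current_cols.remove(worst_feature)
            best_aic = best_new_aic
        else:
            break  # no removal improves AIC

    return current_cols, best_aic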

Forward Selection

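Start from an empty model with no variables and add, one at a time, the variable that lowers AIC the most.
Early iterations are fast because the models are small.
Once a variable has been added, it is never removed.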

import pandas as pd
import statsmodels.api as sm

def forward_selection(df, target_col, feature_cols):
    selected_cols = []
    remaining_cols = list(feature_cols)  # pool of features not yet selected

    # Baseline AIC: intercept-only model (index matches df so OLS accepts it)
    X = pd.DataFrame({"const": 1.0}, index=df.index)
    best_aic = sm.OLS(df[target_col], X).fit().aic

    while remaining_cols:
        best_feature = None
        best_new_aic = best_aic  # compare against the current AIC

        for feature in remaining_cols:  # try each unselected feature
            X = sm.add_constant(df[selected_cols + [feature]])
            model = sm.OLS(df[target_col], X).fit()

            if model.aic < best_new_aic:
                best_new_aic = model.aic
                best_feature = feature  # best candidate so far

        if best_feature is not None:  # same pattern as worst_feature in backward elimination
            selected_cols.append(best_feature)
            remaining_cols.remove(best_feature)  # take it out of the pool
            best_aic = best_new_aic
        else:
            break  # stop when no addition improves AIC

    return selected_cols

Stepwise Selection

A hybrid of Backward Elimination and Forward Selection.
Each time a variable is added (forward), the variables already in the model are re-evaluated and removed if dropping them lowers AIC (backward).

import statsmodels.api as sm
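The add-then-recheck loop described in this section can be sketched as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: it uses NumPy's least-squares solver in place of statsmodels so that it runs on its own, it counts $k$ as the intercept plus the slopes, and `ols_aic`, `stepwise_selection`, and the synthetic data are hypothetical names and values chosen for illustration.

```python
import numpy as np

def ols_aic(X, y, cols):
    """AIC of an OLS fit of y on an intercept plus the columns listed in cols."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # intercept + slopes (assumed parameter count)
    # Gaussian log-likelihood at the MLE, plugged into AIC = 2k - 2 ln(L)
    return 2 * k + n * (np.log(2 * np.pi) + np.log(rss / n) + 1)

def stepwise_selection(X, y):
    selected = []
    remaining = list(range(X.shape[1]))
    best_aic = ols_aic(X, y, selected)  # start from the intercept-only model

    while True:
        improved = False

        # Forward step: add the variable that lowers AIC the most, if any
        if remaining:
            aic, col = min((ols_aic(X, y, selected + [c]), c) for c in remaining)
            if aic < best_aic:
                selected.append(col)
                remaining.remove(col)
                best_aic, improved = aic, True

        # Backward step: re-check selected variables, drop while AIC improves
        while selected:
            aic, col = min(
                (ols_aic(X, y, [s for s in selected if s != c]), c)
                for c in selected
            )
            if aic >= best_aic:
                break
            selected.remove(col)
            remaining.append(col)
            best_aic, improved = aic, True

        if not improved:
            break  # neither adding nor removing lowers AIC

    return selected, best_aic

# Synthetic data: only columns 0 and 2 actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected, final_aic = stepwise_selection(X, y)
```

Every accepted move strictly lowers AIC, so the loop cannot revisit a variable set and always terminates. To match the rest of the post, `ols_aic` could be swapped for `sm.OLS(...).fit().aic` on a DataFrame.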
