[ML] 특성 중요도 Permutation feature importance

youngchae·2022년 8월 28일

sklearn 머신러닝

ML

목록 보기

2/5

Permutation feature importance

: 모델에 가장 영향을 많이 미치는 feature를 선택하는 방법

feature 그 자체로 내재적 예측 값을 반영하는 것이 아니라 이 feature가 특정 모델에 얼마나 중요한지를 반영
특정 feature들이 값을 완전히 변조했을 때(원래 있는 feature를 조작) 모델 성능이 얼마나 저하되는지를 기준으로 feature 중요도 산정 → 많이 떨어지는 feature의 중요도 높다고 판단
검증 데이터에 특정 피처들을 반복적으로 변조한 뒤 해당 feature의 중요도를 평균적으로 산정
feature importance를 구하는 것보다 시간이 오래 걸림

Permutation Importance의 장점

모델을 재학습 시킬 필요가 없음
정확도 면에서 상대적으로 높은 타당성을 가짐

Permutation Importance 예제

import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(lgbm_model, scoring = "neg_mean_squared_error", random_state = 42).fit(x_val, y_val)
eli5.show_weights(perm, top = 80, feature_names = x_val.columns.tolist())

RFECV with Permutation Importance

RFECV와 Permuation Importance를 연결하여 사용 가능

from eli5 import show_weights
from eli5.sklearn import PermutationImportance
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

estimator = SVR(kernel="linear")
selector = RFECV(
    PermutationImportance(estimator,  scoring='neg_mean_squared_error', n_iter=10, random_state=42, cv=KFold(n_splits=3)),
    cv=KFold(n_splits=3),
    scoring='neg_mean_squared_error',
    step=1
)
selector = selector.fit(X, y)
selector.ranking_

show_weights(selector.estimator_)

feature importance가 feature selection의 기준이 될 수 없는 이유

feature importance는 트리 구조를 만들기 위한 불순도(impurity)가 중요 기준(트리를 분할하는 데 얼마나 기여 하였는지) → 구하고자 하는 값과 관련이 없어도 feature importance는 높아질 수 있음
feature importance의 number형의 높은 cardinality feature에 편향되어있음

[참고]

youngchae

we_need_to_talk_about_ds

이전 포스트

[ML] 재귀적 특성 제거 RFE

다음 포스트

[ML] 특성 중요도 Permutation feature importance

ML

Permutation feature importance

Permutation Importance의 장점

Permutation Importance 예제

RFECV with Permutation Importance

feature importance가 feature selection의 기준이 될 수 없는 이유

[ML] 재귀적 특성 제거 RFE

[ML] 머신러닝 평가지표 - 회귀, 분류, 클러스터링

0개의 댓글

관련 채용 정보