Feature Selection Using ML

JJong · April 15, 2025

feature selection machine learning


Feature Selection Using Machine Learning

Main machine-learning-based techniques

Decision Tree-based Feature Importance
Random Forest Importance
Gradient Boosted Trees Importance
Recursive Feature Elimination with Cross-Validation

Python Library

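from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Lasso  # L1-regularized regression (section 4)
from sklearn.feature_selection import RFECV  # RFE with cross-validation (section 5)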

1. Decision Tree based

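When a decision tree is built, each variable's feature importance is assessed by how much it contributes to the node splits.

Procedure

Train a decision tree.

For each feature, sum the impurity decrease produced by the splits on that feature across all nodes.
Example: when training a survival-prediction model on the Titanic dataset, variables such as 'sex' or 'passenger class' may receive high importance.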
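
The two steps above can be sketched with scikit-learn roughly as follows. This is only a minimal illustration: the synthetic dataset from `make_classification`, the `max_depth=4` setting, and the `feature_<i>` names are assumptions made for the example, not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): 6 features, 2 informative.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    random_state=0,
)

# Step 1: train a decision tree.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X, y)

# Step 2: feature_importances_ is the total impurity decrease attributed
# to each feature, normalized so that the values sum to 1.
importances = tree.feature_importances_

# Print features from most to least important.
for idx in np.argsort(importances)[::-1]:
    print(f"feature_{idx}: {importances[idx]:.3f}")
```

Features that the tree never splits on get an importance of exactly 0, which makes them natural candidates for removal.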

2. Random Forest based

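This method averages the feature importances of the individual decision trees inside a Random Forest model to obtain the overall feature importance.

Procedure

Train a Random Forest.

Sum the importances from each tree and take the average.
Example: in a house-price prediction model, features such as 'area', 'location', and 'number of rooms' may be rated as important.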
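
As a rough sketch of the averaging step, the snippet below compares the forest's own `feature_importances_` with a manual average over its trees. The synthetic dataset and the hyperparameters are assumptions for illustration; a classifier is used here because that is what the post imports, but a house-price model would use `RandomForestRegressor` in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): 6 features, 3 informative.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0,
    random_state=0,
)

# Step 1: train a Random Forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Step 2: average the per-tree importances by hand.
per_tree = np.array([t.feature_importances_ for t in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

# The forest-level attribute should agree with the manual average.
print("forest :", np.round(forest.feature_importances_, 3))
print("manual :", np.round(manual_mean, 3))
```

The per-tree spread (for example `per_tree.std(axis=0)`) is also worth a look, since a feature with a high mean but a very large spread is a less reliable pick.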

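3. Gradient Boosted Trees based

In the GBT algorithm, the feature importances of the individual trees are combined to compute the overall importance.

Procedure

Train a GBT model.

Sum the feature importances from each tree.
Example: in a customer-churn prediction model, features such as 'most recent purchase date', 'total purchase amount', and 'platform used' may be regarded as important.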
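
One possible way to turn GBT importances into an actual selection is scikit-learn's `SelectFromModel`. The sketch below is an assumption-laden example: the synthetic data and the `threshold="median"` rule (keep features whose importance is at least the median) are choices made here for illustration, not something the original post specifies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (an assumption for illustration): 6 features, 3 informative.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0,
    random_state=0,
)

# Train the GBT model inside the selector and keep the features whose
# importance is at least the median importance.
selector = SelectFromModel(
    GradientBoostingClassifier(random_state=0), threshold="median"
)
selector.fit(X, y)

importances = selector.estimator_.feature_importances_  # fitted GBT importances
mask = selector.get_support()        # boolean mask of the kept features
X_selected = selector.transform(X)   # data reduced to the kept columns

print("importances:", importances.round(3))
print("kept columns:", mask.nonzero()[0])
```

A fixed numeric threshold or a top-k cut would work as well; the median rule is just a simple default when there is no prior idea of how many features to keep.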

4. Feature Selection Using Regularization (L1/L2)

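Regression with regularization (L1/L2). L1 regularization can shrink a feature's coefficient all the way to 0, whereas L2 only shrinks coefficients toward 0.

Procedure

Train a regression model that includes L1/L2 regularization.

The regularization penalty is applied to each feature's coefficient.
Example: in a multivariate regression problem, using L1 regularization drives the coefficients of some variables to 0, so the model can be trained with those variables excluded.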
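
A minimal sketch of L1-based selection using `Lasso` is shown below. The synthetic regression data and the value `alpha=1.0` are assumptions for the example; in practice the penalty strength would normally be tuned (for instance with `LassoCV`), and features should be put on a comparable scale first.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
# make_regression draws features from a standard normal distribution, so no
# extra scaling is applied in this sketch.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0,
    random_state=0,
)

# Train a regression model with an L1 penalty.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Features whose coefficient was driven to exactly 0 can be dropped.
selected = np.flatnonzero(lasso.coef_ != 0)
dropped = np.flatnonzero(lasso.coef_ == 0)

print("coefficients:", lasso.coef_.round(2))
print("selected feature indices:", selected)
print("dropped feature indices:", dropped)
```

Swapping `Lasso` for `Ridge` (L2) in the same script would leave small but nonzero coefficients on the uninformative features, which is why L1 is the usual choice when the goal is selection.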

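5. Recursive Feature Elimination with Cross-Validation (RFECV)

This method repeatedly removes features based on the model's importances while using cross-validation to evaluate model performance.

Procedure

Train the model with all features.

Remove features in order of importance, least important first.

Evaluate model performance with cross-validation.

Choose the optimal number of features.
Example: in spam-mail classification, RFE with cross-validation can pick out, from hundreds of word features, only the words most important for predicting spam, and the model can then be trained on those.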
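
The four steps above map fairly directly onto scikit-learn's `RFECV`. The following is a small sketch under stated assumptions: the synthetic dataset stands in for real word features, and the choice of `LogisticRegression`, 5-fold stratified CV, and accuracy scoring are illustrative picks rather than anything prescribed by the post.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic data (an assumption for illustration): 8 features, 3 informative.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    random_state=0,
)

# RFECV starts from all features, removes the weakest one per round
# (step=1), and scores every feature count with cross-validation.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
)
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("selected mask:", selector.support_)
print("ranking (1 = selected):", selector.ranking_)

# Reduce the data to the selected features for the final model.
X_selected = selector.transform(X)
```

Any estimator that exposes `coef_` or `feature_importances_` can be plugged in, so the tree-based models from the earlier sections would work here too, at a higher training cost.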
