Applying Grid Search While Using a Pipeline

Joo · May 27, 2024

MLDL 101


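from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# X, y: wine features and labels, assumed to be loaded beforehand (not shown in this post)
estimators = [('scaler', StandardScaler()),  # data preprocessing (scaling)
              ('clf', DecisionTreeClassifier(random_state=13))]  # model

pipe = Pipeline(estimators)  # build the pipeline object

# With a pipeline, each hyperparameter name gets the step name and a double underscore ('clf__') as a prefix
param_grid = [{'clf__max_depth': [2, 4, 7, 10]}]

GridSearch = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=5)
GridSearch.fit(X, y)

print(GridSearch.best_score_)
print(GridSearch.best_params_)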

Result:

0.6888004974240539
{'clf__max_depth': 2}
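The `clf__` prefix is how a pipeline routes a hyperparameter to the step named `clf`. The sketch below is a self-contained version of the code above, under one assumption: since the post's wine `X`, `y` are not shown, it uses synthetic data from `make_classification`. It checks which parameter names the pipeline exposes and shows that an unprefixed name is rejected. `pipe_demo`, `search`, and `unprefixed_rejected` are illustrative names, not from the original.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine X, y used in the post (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

pipe_demo = Pipeline([('scaler', StandardScaler()),
                      ('clf', DecisionTreeClassifier(random_state=13))])

# get_params() exposes every step's parameters as '<step name>__<parameter>'
param_names = pipe_demo.get_params().keys()
print('clf__max_depth' in param_names)     # True
print('scaler__with_mean' in param_names)  # True

# Without the prefix, the pipeline cannot tell which step the parameter belongs to
try:
    pipe_demo.set_params(max_depth=4)
    unprefixed_rejected = False
except ValueError:
    unprefixed_rejected = True
print(unprefixed_rejected)  # True

# Same search as in the post, on the synthetic data
search = GridSearchCV(pipe_demo, param_grid={'clf__max_depth': [2, 4, 7, 10]}, cv=5)
search.fit(X_demo, y_demo)
print(search.best_params_)  # a dict keyed by 'clf__max_depth'
```

The best depth found here says nothing about the wine data; the point is only the naming scheme, which is the same one `GridSearchCV` relies on when it calls `set_params` on each candidate.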

Tree visualization

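from graphviz import Source
from sklearn.tree import export_graphviz

# # Basic tree visualization
# Source(export_graphviz(wine_tree, feature_names=X_train.columns,  # estimator = the fitted tree instance itself
#                        class_names=['W', 'R'],
#                        rounded=True, filled=True))

# # Tree visualization with a pipeline
# # (pipe must be fitted first, e.g. pipe.fit(X, y); GridSearchCV fits clones of pipe, not pipe itself)
# Source(export_graphviz(pipe['clf'], feature_names=X.columns,  # estimator = the model assigned to the 'clf' step
#                        class_names=['W', 'R'],
#                        rounded=True, filled=True))

# Tree visualization with a pipeline + grid search
# estimator = the tree instance in the final 'clf' step of the best model found by the grid search
# (with the default refit=True, best_estimator_ is refit on all of X, y)
# class_names are matched to the labels in ascending order (see .classes_)
Source(export_graphviz(GridSearch.best_estimator_['clf'], feature_names=X.columns,
                       class_names=['W', 'R'],
                       rounded=True, filled=True))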
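One detail worth keeping in mind when reading the rendered tree: the tree inside the pipeline is trained on the `StandardScaler` output, so the split thresholds in the image are in standardized units, not in the original feature units. The sketch below, again on synthetic data with hypothetical feature names `f0` to `f3` (assumptions), pulls the fitted tree out of `best_estimator_`, prints it with `export_text` (which needs no Graphviz install), and maps each threshold back to raw units using the fitted scaler's `mean_` and `scale_`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data and hypothetical feature names (assumptions)
X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)
feature_names = ['f0', 'f1', 'f2', 'f3']

search = GridSearchCV(
    Pipeline([('scaler', StandardScaler()),
              ('clf', DecisionTreeClassifier(random_state=13))]),
    param_grid={'clf__max_depth': [2, 4]}, cv=5)
search.fit(X_demo, y_demo)

# String indexing and named_steps return the same fitted step object
best_tree = search.best_estimator_['clf']
scaler = search.best_estimator_['scaler']
print(best_tree is search.best_estimator_.named_steps['clf'])  # True

# Text rendering of the same tree; thresholds are in standardized (z-score) units
print(export_text(best_tree, feature_names=feature_names))

# StandardScaler computes z = (x - mean_) / scale_, so a threshold z maps back to x = z * scale_ + mean_
tree_struct = best_tree.tree_
internal = tree_struct.feature >= 0  # leaf nodes carry a negative feature index
feat = tree_struct.feature[internal]
raw_thresholds = tree_struct.threshold[internal] * scaler.scale_[feat] + scaler.mean_[feat]
for f, z, raw in zip(feat, tree_struct.threshold[internal], raw_thresholds):
    print(f'{feature_names[f]}: {z:.3f} (scaled) -> {raw:.3f} (raw)')
```

Because the scaling is a per-feature affine map, the tree makes the same splits either way; only the printed threshold values change.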

Checking model performance for each hyperparameter combination as a table

import pandas as pd

score_df = pd.DataFrame(GridSearch.cv_results_)
score_df[['params', 'rank_test_score', 'mean_test_score', 'std_test_score']]

Result:

   params                   rank_test_score   mean_test_score   std_test_score
0  {'clf__max_depth': 2}    1                 0.688800          0.071799
1  {'clf__max_depth': 4}    2                 0.663565          0.083905
2  {'clf__max_depth': 7}    3                 0.653408          0.086993
3  {'clf__max_depth': 10}   4                 0.644016          0.076915
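Besides the `params` column of dicts, `cv_results_` also holds one `param_<key>` column per grid key (here `param_clf__max_depth`), which is easier to sort or filter. A minimal sketch, again assuming synthetic data in place of the wine set: it sorts the table by rank and confirms that row `best_index_` is the candidate reported by `best_params_` and `best_score_`. `table` and `best_row` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

search = GridSearchCV(
    Pipeline([('scaler', StandardScaler()),
              ('clf', DecisionTreeClassifier(random_state=13))]),
    param_grid={'clf__max_depth': [2, 4, 7, 10]}, cv=5)
search.fit(X_demo, y_demo)

score_df = pd.DataFrame(search.cv_results_)

# One row per candidate; sort by rank (stable sort keeps tied candidates in grid order)
table = (score_df[['param_clf__max_depth', 'rank_test_score',
                   'mean_test_score', 'std_test_score']]
         .sort_values('rank_test_score', kind='stable'))
print(table)

# Row best_index_ of cv_results_ is the candidate behind best_params_ / best_score_
best_row = score_df.loc[search.best_index_]
print(best_row['params'] == search.best_params_)          # True
print(best_row['mean_test_score'] == search.best_score_)  # True
```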
