The scikit-learn model_selection module provides functions and classes for splitting datasets, cross-validation, and hyperparameter tuning. They are summarized below.
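As a quick illustration of the first two tasks, here is a minimal sketch using train_test_split and cross_val_score. The iris dataset, the LogisticRegression model, and the split sizes are assumptions chosen only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Example data (an assumption for this sketch): 150 samples, 4 features.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows as a test set; stratify keeps the class ratios.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# 5-fold cross-validation on the training portion only.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, train_x, train_y, cv=5, scoring='accuracy')
print(scores.mean())
```

The test set stays untouched during cross-validation, so it can still serve as a final check after tuning.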
GridSearchCV parameters:
estimator: estimator object. This is assumed to implement the scikit-learn estimator interface. Either the estimator needs to provide a score function, or scoring must be passed.
param_grid: dict or list of dictionaries. Dictionary with parameter names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
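To see how a list of dictionaries expands into candidates, ParameterGrid from the same module can enumerate them. This is only a sketch; the parameter names and values below are made up for the illustration.

```python
from sklearn.model_selection import ParameterGrid

# Two separate grids; each dictionary is expanded on its own.
param_grid = [
    # 3 values of C x 1 kernel = 3 combinations
    {'C': [0.1, 1, 10], 'kernel': ['linear']},
    # 2 values of C x 1 kernel x 2 values of gamma = 4 combinations
    {'C': [1, 10], 'kernel': ['rbf'], 'gamma': ['scale', 'auto']},
]

candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 3 + 4 = 7
for params in candidates:
    print(params)
```

Because the dictionaries are expanded independently, gamma is never combined with the linear kernel, which keeps the search smaller than one big grid would be.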
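In a Pipeline, a step's parameters are addressed with a double underscore (__) between the step name and the parameter name, for example regressor__C.

from sklearn.model_selection import KFold, GridSearchCV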
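from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Both steps are placeholders; the grid below fills them in.
# The last step is named 'regressor', although every candidate here is a classifier.
pipe = Pipeline([('preprocessing', None), ('regressor', None)])
# Preprocessing candidates; None skips the step.
pre_list = [StandardScaler(), MinMaxScaler(), None]
hyperparam_grid = [
    # classification
    # LogisticRegression
    {'regressor': [LogisticRegression()], 'preprocessing': pre_list,
     'regressor__C': [0.0001, 0.001, 0.01, 0.1, 1, 10]},
    # DecisionTree
    {'regressor': [DecisionTreeClassifier()], 'preprocessing': pre_list,
     'regressor__max_depth': [3, 5, 7, 11], 'regressor__min_samples_split': [2, 3, 5],
     'regressor__min_samples_leaf': [1, 5, 8]},
    # RandomForest
    {'regressor': [RandomForestClassifier()], 'preprocessing': pre_list,
     'regressor__max_depth': [5, 6, 7, 8, 9], 'regressor__min_samples_split': [3, 4, 5],
     'regressor__min_samples_leaf': [1, 2]},
    # Support Vector Classifier
    {'regressor': [SVC()], 'preprocessing': pre_list,
     'regressor__C': [0.1, 1, 3, 5, 10], 'regressor__kernel': ['poly', 'rbf', 'sigmoid'],
     'regressor__gamma': ['scale', 'auto']},
    # Gradient Boosting Classifier
    {'regressor': [GradientBoostingClassifier()], 'preprocessing': pre_list,
     'regressor__learning_rate': [0.001, 0.01, 0.1, 1, 3, 5],
     'regressor__n_estimators': [30, 50, 100, 200]},
    # Gaussian Naive Bayes
    {'regressor': [GaussianNB()], 'preprocessing': pre_list},
]
# 7-fold cross-validation with shuffling; random_state makes the folds reproducible.
kfold = KFold(n_splits=7, shuffle=True, random_state=1)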
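# train_x, train_y, and test_x are assumed to be prepared beforehand.
grid = GridSearchCV(pipe, hyperparam_grid, scoring='accuracy', refit=True, cv=kfold)
grid.fit(train_x, train_y)
# With refit=True, best_estimator_ is the best pipeline refitted on all of the training data.
estimator = grid.best_estimator_
result_list = estimator.predict(test_x)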
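The search above takes a while and depends on data that is not shown, so here is a smaller self-contained sketch of the same pattern. The iris dataset, the reduced grid, the 5 folds, and the use of 'passthrough' to skip preprocessing are all assumptions made for the example, not the setup used above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Initial steps are placeholders; the grid swaps in the real candidates.
pipe = Pipeline([('preprocessing', StandardScaler()), ('regressor', SVC())])

param_grid = [
    # 2 scalers x 3 values of C = 6 candidates
    {'regressor': [SVC()],
     'preprocessing': [StandardScaler(), MinMaxScaler()],
     'regressor__C': [0.1, 1, 10]},
    # no scaling x 2 depths = 2 candidates
    {'regressor': [DecisionTreeClassifier(random_state=1)],
     'preprocessing': ['passthrough'],
     'regressor__max_depth': [3, 5]},
]

kfold = KFold(n_splits=5, shuffle=True, random_state=1)
grid = GridSearchCV(pipe, param_grid, scoring='accuracy', refit=True, cv=kfold)
grid.fit(train_x, train_y)

print(grid.best_params_)            # winning step objects and step__param values
print(grid.best_score_)             # mean cross-validated accuracy of the winner
print(grid.score(test_x, test_y))   # accuracy of the refitted pipeline on held-out data
```

After fitting, grid.cv_results_ holds one entry per candidate, which is handy for comparing the models that did not win.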