Let's use a Pipeline to write everything from preprocessing to label prediction in compact code.
import pandas as pd
red_url = ('https://raw.githubusercontent.com/PinkWink/'
           'ML_tutorial/master/dataset/winequality-red.csv')
white_url = ('https://raw.githubusercontent.com/PinkWink/'
             'ML_tutorial/master/dataset/winequality-white.csv')
red_wine = pd.read_csv(red_url, sep = ';')
white_wine = pd.read_csv(white_url, sep = ';')
# Label each row by wine color: 1 = red, 0 = white
red_wine['color'] = 1
white_wine['color'] = 0
wine = pd.concat([red_wine, white_wine])
X = wine.drop(['color'], axis = 1)
y = wine['color']
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
estimators = [
('scaler', StandardScaler()),
('clf', DecisionTreeClassifier())
]
pipe = Pipeline(estimators)
pipe.set_params(clf__max_depth = 2)
pipe.set_params(clf__random_state = 13)
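The `clf__max_depth` name above follows the `<step name>__<parameter>` convention that scikit-learn pipelines use. As a small sketch on a separate, hypothetical `param_demo_pipe` (not the `pipe` object from this post), several parameters can be set in one `set_params` call and read back either through `get_params()` or through `named_steps`:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical demo pipeline with the same step names as in the post
param_demo_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier()),
])

# Set several step parameters at once using <step>__<param> keys
param_demo_pipe.set_params(clf__max_depth=2, clf__random_state=13)

# Read the value back in two equivalent ways
depth_via_params = param_demo_pipe.get_params()['clf__max_depth']
depth_via_step = param_demo_pipe.named_steps['clf'].max_depth
print(depth_via_params, depth_via_step)
```

Reading the value back like this is a quick way to confirm that a parameter reached the intended step before fitting.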
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13, stratify=y)
# Scaling and training run in sequence with a single fit call
pipe.fit(X_train, y_train)
from sklearn.metrics import accuracy_score
# predict can then be applied directly to the raw (unscaled) features
y_pred_tr = pipe.predict(X_train)
y_pred_test = pipe.predict(X_test)
print('Train Acc: ', accuracy_score(y_train, y_pred_tr))
print('Test Acc: ', accuracy_score(y_test, y_pred_test))
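To see what the pipeline saves us from writing, here is a rough sketch that compares it with doing the scaling and fitting by hand. To keep it runnable without a download, it uses a synthetic dataset from `make_classification` as a stand-in for the wine data (an assumption for illustration only), and all the `*_demo` / `Xd_*` names are made up for this example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the wine dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=12,
                                     random_state=13)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=13, stratify=y_demo)

# Manual version: fit the scaler on the training split only,
# then reuse the fitted scaler to transform the test split
scaler = StandardScaler()
Xd_train_sc = scaler.fit_transform(Xd_train)
Xd_test_sc = scaler.transform(Xd_test)
clf = DecisionTreeClassifier(max_depth=2, random_state=13)
clf.fit(Xd_train_sc, yd_train)
manual_pred = clf.predict(Xd_test_sc)

# Pipeline version: the same steps behind a single fit/predict
check_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', DecisionTreeClassifier(max_depth=2, random_state=13)),
])
check_pipe.fit(Xd_train, yd_train)
pipe_pred = check_pipe.predict(Xd_test)

# Both routes should give the same predictions
same = bool(np.array_equal(manual_pred, pipe_pred))
print('Predictions identical:', same)
```

In the manual version it is easy to call `fit_transform` on the test split by mistake; with the pipeline, `predict` only applies the scaler that was already fitted on the training data.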