(ML) Wine Classifier

지며리 · February 3, 2023


Let's classify wines using DecisionTreeClassifier.

1. EDA

import pandas as pd

# Adjacent string literals are joined, so no stray whitespace ends up in the URL
red_url = ('https://raw.githubusercontent.com/PinkWink/'
           'ML_tutorial/master/dataset/winequality-red.csv')
white_url = ('https://raw.githubusercontent.com/PinkWink/'
             'ML_tutorial/master/dataset/winequality-white.csv')

red_wine = pd.read_csv(red_url, sep = ';')
white_wine = pd.read_csv(white_url, sep = ';')

red_wine['color'] = 1
white_wine['color'] = 0

wine = pd.concat([red_wine, white_wine])
wine.info()

import matplotlib.pyplot as plt

plt.hist((wine[wine['color'] == 0]['quality'],
          wine[wine['color'] == 1]['quality']), histtype='bar')
plt.show()


2. Red wine vs. white wine classifier

x = wine.drop(['color'], axis = 1)
y = wine['color']

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Scaling
SS = StandardScaler()
SS.fit(x)
X_ss = SS.transform(x)
X_ss_pd = pd.DataFrame(X_ss, columns = x.columns)

X_train, X_test, y_train, y_test = train_test_split(X_ss_pd, y, test_size=0.2,
                                                    random_state=13)

wine_tree = DecisionTreeClassifier(max_depth = 2, random_state = 13)
wine_tree.fit(X_train, y_train)

y_pred_tr = wine_tree.predict(X_train)
y_pred_test = wine_tree.predict(X_test)

print('Train Acc: ', accuracy_score(y_train, y_pred_tr))
print('Test Acc: ', accuracy_score(y_test, y_pred_test))
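
The scaling step above fits StandardScaler on every row before the split, so the test rows contribute to the scaler's mean and variance. A common variant fits the scaler on the training split only. Below is a minimal sketch of that variant using a scikit-learn pipeline. It runs on synthetic stand-in data rather than the wine dataset, and the names X_demo, y_demo, and pipe are hypothetical, introduced only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(13)

# Hypothetical stand-in for the wine features: two numeric columns on
# different scales, with a binary label that depends on the second column.
X_demo = rng.normal(loc=[5.0, 50.0], scale=[1.0, 10.0], size=(500, 2))
y_demo = (X_demo[:, 1] > 50.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=13)

# The pipeline fits the scaler inside fit(), so only X_tr shapes its statistics.
pipe = make_pipeline(StandardScaler(),
                     DecisionTreeClassifier(max_depth=2, random_state=13))
pipe.fit(X_tr, y_tr)

scaler = pipe.named_steps['standardscaler']
print('Scaler mean (train rows only): ', scaler.mean_)
print('Test Acc: ', pipe.score(X_te, y_te))
```

For a decision tree the difference is usually small, since its splits depend on the ordering of feature values rather than their scale, but the pattern carries over to scale-sensitive models.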


3. Quality classifier

wine['taste'] = [1 if grade > 5  else 0 for grade in wine['quality']]

# (Important!!) quality must also be dropped: taste is derived from it, so leaving it in leaks the label
X = wine.drop(['taste', 'quality'], axis = 1)
y = wine['taste']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=13)

wine_tree = DecisionTreeClassifier(max_depth= 2, random_state = 13)
wine_tree.fit(X_train, y_train)

y_pred_tr = wine_tree.predict(X_train)
y_pred_test = wine_tree.predict(X_test)

print('Train Acc: ', accuracy_score(y_train, y_pred_tr))
print('Test Acc: ', accuracy_score(y_test, y_pred_test))

  • After converting a label into a new value, make sure the original label column is not left among the features.
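
To see why this matters, here is a small sketch of the leak on synthetic data (not the wine dataset). The demo frame, its noise column, and the holdout_accuracy helper are all hypothetical names made up for this illustration. The taste label is derived from quality exactly as above, so a tree that can see quality only needs one split to recover the label.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(13)
n = 1000

# Synthetic stand-in data: 'noise' carries no information about the label.
demo = pd.DataFrame({
    'noise': rng.normal(size=n),
    'quality': rng.integers(3, 10, size=n),  # integer grades from 3 to 9
})
demo['taste'] = (demo['quality'] > 5).astype(int)  # same rule as in the post


def holdout_accuracy(features):
    """Train a depth-2 tree on the given columns and return test accuracy."""
    X = demo[features]
    y = demo['taste']
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=13)
    tree = DecisionTreeClassifier(max_depth=2, random_state=13)
    tree.fit(X_tr, y_tr)
    return accuracy_score(y_te, tree.predict(X_te))


acc_leaky = holdout_accuracy(['noise', 'quality'])  # original label left in
acc_clean = holdout_accuracy(['noise'])             # original label dropped

print('Test Acc with quality: ', acc_leaky)
print('Test Acc without quality: ', acc_clean)
```

With quality among the features the tree simply splits between grades 5 and 6 and scores perfectly, which says nothing about how well the other features predict taste.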