Decision Tree Example: adult

김지윤 · August 7, 2023

Scikit-learn series (post 9/11)
import pandas as pd
import numpy as np
import seaborn as sns

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.linear_model import LogisticRegression, SGDClassifier

from sklearn.metrics import r2_score, mean_squared_error
from sklearn.metrics import RocCurveDisplay, roc_auc_score

from sklearn.model_selection import KFold

from sklearn.neighbors import KNeighborsClassifier
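# adult.data has no header row, so the columns get integer labels 0-14.
# skipinitialspace=True strips the space that follows each comma in the raw file.
data = pd.read_csv('C:/Users/ddi05/Class_05/adult/adult.data', header=None, skipinitialspace=True)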

data.head()

Listing of attributes (the number in parentheses is the integer column label, because the file is read with header=None):

  • target (14) : <=50K, >50K.
  • age (0) : continuous.
  • workclass (1) : Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
  • fnlwgt (2) : continuous.
  • education (3) : Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
  • education-num (4) : continuous.
  • marital-status (5) : Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
  • occupation (6) : Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
  • relationship (7) : Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
  • race (8) : White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
  • sex (9) : Female, Male.
  • capital-gain (10) : continuous.
  • capital-loss (11) : continuous.
  • hours-per-week (12) : continuous.
  • native-country (13) : United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
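The numbered listing above can also be applied as column names. This is an addition, not part of the original workflow: `ADULT_COLUMNS` and `name_columns` are hypothetical names, and `income` is an assumed label for the unnamed target column (14). The rest of the post keeps the integer labels, so the helper returns a renamed copy instead of modifying `data`.

```python
import pandas as pd

# Hypothetical helper: column names in the index order of the listing above.
# 'income' is an assumed name for the target column (14).
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def name_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a header-less adult DataFrame with readable column names."""
    if df.shape[1] != len(ADULT_COLUMNS):
        raise ValueError(f"expected {len(ADULT_COLUMNS)} columns, got {df.shape[1]}")
    out = df.copy()
    out.columns = ADULT_COLUMNS
    return out


# Usage with a one-row toy frame shaped like adult.data (integer column labels 0..14)
row = [39, "State-gov", 77516, "Bachelors", 13, "Never-married", "Adm-clerical",
       "Not-in-family", "White", "Male", 2174, 0, 40, "United-States", "<=50K"]
toy = pd.DataFrame([row])
named = name_columns(toy)
print(named[["age", "education", "income"]])
```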

1. Label Encoding

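# One LabelEncoder is refit per column; each fit_transform overwrites the previous mapping
encoder = LabelEncoder()
cat_cols = [1, 3, 5, 6, 7, 8, 9, 13, 14]  # categorical columns, including the target (14)

for i in cat_cols:
    data[i] = encoder.fit_transform(data[i])

data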
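`LabelEncoder` assigns integer codes in sorted order of the unique values (its `classes_` attribute), and because the loop above reuses one encoder object, only the mapping of the last column it was fit on (14) survives. A minimal sketch, assuming you want to decode the codes later: keep one fitted encoder per column. The `encoders` dict and the toy frame are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with two categorical columns, keyed by integer labels like adult.data
toy = pd.DataFrame({
    3: ["HS-grad", "Bachelors", "Masters", "Bachelors"],
    14: ["<=50K", ">50K", ">50K", "<=50K"],
})

encoders = {}  # hypothetical: one fitted LabelEncoder per column
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# classes_ is sorted, so each integer code is a position in this array
print(encoders[3].classes_)                     # ['Bachelors' 'HS-grad' 'Masters']
print(toy[3].tolist())                          # [1, 0, 2, 0]
print(encoders[14].inverse_transform([0, 1]))   # ['<=50K' '>50K']
```

The last line also confirms the target coding used in this post: 0 is `<=50K` and 1 is `>50K`.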

2. MinMaxScaling

# Rescale the numeric columns age (0) and education-num (4) to [0, 1]
scaler = MinMaxScaler()
num_cols = [0, 4]

data[num_cols] = scaler.fit_transform(data[num_cols])

data
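MinMaxScaler rescales each column to [0, 1] with x' = (x - min) / (max - min), using the min and max it saw during `fit`. The sketch below checks that formula by hand on made-up values and, as an assumption beyond the post's code, shows the leakage-free variant that fits the scaler on the training split only. Decision trees are insensitive to this kind of monotonic rescaling, so scaling mainly matters for the logistic regression fitted later.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up values: column 0 plays the role of age, column 1 of education-num
X_toy = np.array([[20, 9], [35, 13], [50, 10], [65, 16], [40, 14], [28, 11]], dtype=float)

# Manual min-max scaling, column by column: x' = (x - min) / (max - min)
col_min, col_max = X_toy.min(axis=0), X_toy.max(axis=0)
manual = (X_toy - col_min) / (col_max - col_min)

scaled = MinMaxScaler().fit_transform(X_toy)
print(np.allclose(manual, scaled))  # True

# Leakage-free variant (an assumption, not the post's code): learn min/max from training rows only
Xt_train, Xt_test = train_test_split(X_toy, test_size=0.33, random_state=0)
toy_scaler = MinMaxScaler().fit(Xt_train)
Xt_train_s = toy_scaler.transform(Xt_train)
Xt_test_s = toy_scaler.transform(Xt_test)  # test values may fall outside [0, 1]
```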

3. model fit _ DecisionTreeClassifier

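import matplotlib.pyplot as plt

y = data[14]                     # target: 0 = <=50K, 1 = >50K after label encoding
X = data[[0, 1, 3, 4, 7, 8, 9]]  # age, workclass, education, education-num, relationship, race, sex

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=10)

dt = DecisionTreeClassifier(max_depth=3)
dt.fit(X_train, y_train)
pred1 = dt.predict(X_test)

print('train score', dt.score(X_train, y_train))
print('test score', dt.score(X_test, y_test))

# Label the nodes with feature and class names so the tree is readable
plt.figure(figsize=(20, 8))
plot_tree(dt, feature_names=['age', 'workclass', 'education', 'education-num', 'relationship', 'race', 'sex'],
          class_names=['<=50K', '>50K'], filled=True)
plt.show()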
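DecisionTreeClassifier uses Gini impurity by default (`criterion='gini'`): for a node with class proportions p_k, G = 1 - Σ p_k², and each split is chosen to lower the weighted impurity of the two children as much as possible. The sketch below computes G by hand on a hypothetical one-feature dataset and compares it with the values scikit-learn stores in the fitted tree's `tree_` arrays; the `gini` helper is illustrative, not part of the post.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity 1 - sum(p_k ** 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


# Toy data: one feature, six samples of class 0 and four of class 1
X_toy = np.arange(1, 11).reshape(-1, 1)
y_toy = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])

root = gini(y_toy)  # 1 - (0.6**2 + 0.4**2) = 0.48, up to float rounding

toy_tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_toy, y_toy)
t = toy_tree.tree_
print(root, t.impurity[0])  # both are the root-node impurity

# Weighted impurity of the two children created by the chosen split
left, right = t.children_left[0], t.children_right[0]
n = t.n_node_samples
weighted = (n[left] * t.impurity[left] + n[right] * t.impurity[right]) / n[0]
print(t.threshold[0], weighted)  # split point and the (lower) weighted child impurity
```

Here the best cut is between 5 and 6: the left child is pure, and the right child holds four 1s and one 0.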

4. RocCurve

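# A ROC curve needs a continuous score, so use the predicted probability of class 1 (>50K)
proba1 = dt.predict_proba(X_test)[:, 1]
RocCurveDisplay.from_predictions(y_test, proba1)
print('test AUC', roc_auc_score(y_test, proba1))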
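A ROC curve is traced by sweeping a decision threshold over a continuous score, so it should be given scores such as `predict_proba(...)[:, 1]` rather than hard 0/1 predictions; hard labels leave a single operating point between (0, 0) and (1, 1). The sketch below shows the difference with `sklearn.metrics.roc_curve` and `roc_auc_score` on made-up labels and scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up ground truth and model scores (e.g. predict_proba(...)[:, 1])
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.20, 0.30, 0.45, 0.40, 0.60, 0.55, 0.70, 0.80, 0.95])
hard = (scores >= 0.5).astype(int)  # what predict() would return at a 0.5 threshold

fpr_s, tpr_s, _ = roc_curve(y_true, scores)
fpr_h, tpr_h, _ = roc_curve(y_true, hard)

print(len(fpr_s), len(fpr_h))  # the hard-label curve has only 3 points
print(roc_auc_score(y_true, scores), roc_auc_score(y_true, hard))
```

`RocCurveDisplay.from_estimator(dt, X_test, y_test)` is another way to get the probability-based curve directly from a fitted classifier.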

5. prediction

pd.DataFrame({'index' : X_test.index, 'pred' : pred1})
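The prediction table is easier to read with the true label next to each prediction. A minimal sketch on made-up values: `pd.crosstab` counts each (actual, predicted) pair, which is the confusion matrix, and the share of correct rows is the accuracy that `.score()` reports. The `results` frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up test-set index, true labels and predictions (0 = <=50K, 1 = >50K)
toy_index = [101, 205, 317, 422, 530, 618]
y_true_toy = pd.Series([0, 1, 0, 1, 0, 0], index=toy_index)
pred_toy = np.array([0, 1, 1, 0, 0, 0])

results = pd.DataFrame({"actual": y_true_toy.values, "pred": pred_toy}, index=y_true_toy.index)
results["correct"] = results["actual"] == results["pred"]

# Rows are actual classes, columns are predicted classes (a confusion matrix)
table = pd.crosstab(results["actual"], results["pred"])
accuracy = results["correct"].mean()  # same quantity that .score() reports
print(table)
print(accuracy)
```

With the post's own variables, the same frame would be `pd.DataFrame({'actual': y_test.values, 'pred': pred1}, index=X_test.index)`.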


6. Fitting several prediction models: DecisionTree, LogisticRegression, GaussianNB

models = [DecisionTreeClassifier(max_depth=3), LogisticRegression(solver='liblinear'), GaussianNB()]
model_names = ['DecisionTree', 'logistic_reg', 'Naive Bayes']

# Fit each model on the same split and compare train/test accuracy
for model, model_name in zip(models, model_names):
    model.fit(X_train, y_train)

    print('--------------', model_name, '---------------')
    print('Train score', model.score(X_train, y_train))
    print('Test score', model.score(X_test, y_test))
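A single train/test split can make the ranking of these models depend on which rows happened to land in the test set. `KFold` is imported at the top of the post but not used; one way to put it to work, offered as an assumption rather than the author's method, is `cross_val_score` over 5 shuffled folds. The sketch uses synthetic data from `make_classification` as a stand-in for the adult features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the adult features (7 columns, like X in this post)
X_toy, y_toy = make_classification(n_samples=500, n_features=7, n_informative=4, random_state=10)

toy_models = {
    "DecisionTree": DecisionTreeClassifier(max_depth=3),
    "logistic_reg": LogisticRegression(solver="liblinear"),
    "Naive Bayes": GaussianNB(),
}
cv = KFold(n_splits=5, shuffle=True, random_state=10)

cv_results = {}
for name, model in toy_models.items():
    # cross_val_score fits a fresh clone of the model on each training fold
    scores = cross_val_score(model, X_toy, y_toy, cv=cv)  # accuracy by default for classifiers
    cv_results[name] = scores
    print(f"{name:>12}: mean {scores.mean():.3f}, std {scores.std():.3f}")
```

The standard deviation across folds gives a rough sense of how much of the gap between models is split-to-split noise. For an imbalanced target like this one, `StratifiedKFold` is a common alternative to plain `KFold`.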


Data analysis / Data scientist / AI deep learning
