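Decision tree: an analysis method that represents decision rules as a tree structure, uses them to split the whole dataset into a few subgroups, and makes predictions from those subgroups.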
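How a single split into subgroups might be chosen can be sketched in plain Python. This is a minimal illustration, assuming Gini impurity as the split criterion (the default `criterion='gini'` of scikit-learn's `DecisionTreeClassifier`) and one numeric feature; the helpers `gini` and `best_split` and the toy data are hypothetical. A real tree repeats this search over every feature and then recurses into each subgroup.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try thresholds halfway between sorted distinct values and return
    (threshold, weighted_gini) for the split with the lowest weighted impurity."""
    distinct = sorted(set(values))
    n = len(labels)
    best = (None, gini(labels))  # no split at all
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: one numeric feature (e.g. age) and binary labels.
ages = [5, 12, 20, 35, 50, 65]
survived = [1, 1, 1, 0, 0, 0]
print(best_split(ages, survived))
```

Here the threshold 27.5 separates the two classes perfectly, so both subgroups are pure and the weighted impurity drops to 0.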

sklearn.tree.DecisionTreeClassifier (classification)
sklearn.tree.DecisionTreeRegressor (regression)
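A minimal sketch of the two classes side by side, on hypothetical toy data: the classifier's leaves predict the majority class of their samples, while the regressor's leaves (with the default squared-error criterion) predict the mean target value of their samples.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: one feature, four samples.
X = [[0], [1], [2], [3]]

# Classifier: each leaf outputs the majority class of its samples.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, [0, 0, 1, 1])
print(clf.predict([[0.5], [2.5]]))

# Regressor: each leaf outputs the mean target of its samples.
reg = DecisionTreeRegressor(max_depth=1, random_state=0)
reg.fit(X, [1.0, 1.0, 3.0, 3.0])
print(reg.predict([[0], [3]]))
```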
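import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Assumes titaninc_df (the Titanic training data) is already loaded as a pandas DataFrame
X_features = ['Pclass','Sex','Age']
# Pclass: LabelEncoder
# Sex: LabelEncoder
# Age: replace missing values with the mean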
le = LabelEncoder()
titaninc_df['Sex'] = le.fit_transform(titaninc_df['Sex'])
le2 = LabelEncoder()
titaninc_df['Pclass'] = le2.fit_transform(titaninc_df['Pclass'])
age_mean = titaninc_df['Age'].mean()
titaninc_df['Age'] = titaninc_df['Age'].fillna(age_mean)
X = titaninc_df[X_features]
y = titaninc_df['Survived']
model_dt = DecisionTreeClassifier()
model_dt.fit(X,y)
plt.figure(figsize = (10,5))
plot_tree(model_dt, feature_names=X_features, class_names=['Not Survived','Survived'], filled= True)
plt.show()
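When the plotted tree gets too large to read, the learned rules can also be printed as text with `sklearn.tree.export_text`. The sketch below uses a hypothetical, already-encoded toy DataFrame in the same layout as `X_features` instead of the Titanic data, with `Sex` encoded 0/1 the way `LabelEncoder` maps `'female'`/`'male'`.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy rows: Pclass and Age do not separate the classes, Sex does.
toy_df = pd.DataFrame({
    'Pclass': [0, 0, 1, 1, 2, 2],
    'Sex': [0, 1, 0, 1, 0, 1],
    'Age': [22.0, 22.0, 30.0, 30.0, 40.0, 40.0],
    'Survived': [1, 0, 1, 0, 1, 0],
})
features = ['Pclass', 'Sex', 'Age']

toy_tree = DecisionTreeClassifier(random_state=0)
toy_tree.fit(toy_df[features], toy_df['Survived'])

# export_text renders the fitted tree as indented if/else rules.
rules = export_text(toy_tree, feature_names=features)
print(rules)
```

Because `Sex` alone splits the toy rows into pure groups, the printed tree has a single rule on `Sex` at the root.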

# Limit the tree to depth 5 to curb overfitting and keep the plot readable
model_dt = DecisionTreeClassifier(max_depth = 5)
model_dt.fit(X,y)
plt.figure(figsize = (10,5))
plot_tree(model_dt, feature_names=X_features, class_names=['Not Survived','Survived'], filled= True)
plt.show()
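One way to check what `max_depth` changes is to compare the size and cross-validated accuracy of an unrestricted tree and a depth-limited one. This is a sketch on synthetic data from `make_classification` (a hypothetical stand-in for the Titanic features); which tree scores higher depends on the data, so no particular result is implied.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical), not the Titanic DataFrame.
X_syn, y_syn = make_classification(n_samples=300, n_features=3, n_informative=3,
                                   n_redundant=0, random_state=0)

full_tree = DecisionTreeClassifier(random_state=0).fit(X_syn, y_syn)
pruned_tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_syn, y_syn)

for name, model in [('full', full_tree), ('max_depth=5', pruned_tree)]:
    # get_depth / get_n_leaves report the size of the fitted tree;
    # cross_val_score refits a clone of the model on each of 5 folds.
    scores = cross_val_score(model, X_syn, y_syn, cv=5)
    print(f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
          f"CV accuracy={scores.mean():.3f}")
```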
