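Course information
- Course name: 경기미래기술학교 (Gyeonggi Future Technology School) AI training
- Course period: 2023.05.08 ~ 2023.10.31
- Today's curriculum: Machine learning (7/17 ~ 7/28)
- Instructors: 이현주, 이애리
- Lecture plan:
1. Machine learning
K-Means, Hierarchical Clustering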
import warnings
warnings.filterwarnings('ignore')  # suppress library warnings in the notebook output
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.rc('font', family='Malgun Gothic')  # font with Korean glyphs (ships with Windows)
plt.rc("axes", unicode_minus=False)  # draw the minus sign correctly with this font
plt.figure(figsize=(5,2))
file_path = 'game_usage.csv'
df_ori = pd.read_csv(file_path)
df = df_ori.copy()  # independent copy, so changes to df never reach df_ori
print(id(df), id(df_ori))  # two different objects
x = df['time spent']
y = df['game level']
plt.figure(figsize=(5,5))
plt.scatter(x, y)
import random
from sklearn.cluster import KMeans
def rand_color(num=1):
    col = list()
    for _ in range(num):
        col.append("#" + "".join([random.choice('0123456789ABCDEF') for _ in range(6)]))
    return col
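def kmeans_predict_plot(data, k):
    # Fit K-Means with k clusters and assign each row to a cluster
    model = KMeans(n_clusters = k)
    model.fit(data)
    labels = model.predict(data)
    # One random hex color per cluster
    colors = np.array(rand_color(k))
    print(colors)
    plt.figure(figsize=(5,5))
    plt.title(f'KMC, k = {k}')
    print(labels)
    # Index the color array by label so each point gets its cluster's color
    plt.scatter(data[:, 0], data[:, 1], color=colors[labels])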
game_data = np.column_stack((x, y))  # column_stack takes one sequence of arrays
kmeans_predict_plot(game_data, k=2)
kmeans_predict_plot(game_data, k=3)
kmeans_predict_plot(game_data, k=4)
kmeans_predict_plot(game_data, k=4)
plt.xlim(0, 1000)
plt.ylim(0, 1000)
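Since game_usage.csv is not reproduced here, the K-Means steps above can be tried on stand-in data. The following is a minimal sketch, assuming two synthetic, well-separated blobs; the names demo_data, demo_model, and demo_labels are hypothetical and do not come from the course material.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the game data (an assumption, not the real CSV):
# 20 points around (0, 0) and 20 points around (10, 10)
rng = np.random.default_rng(0)
demo_data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.5, size=(20, 2)),
])

# n_init and random_state are set explicitly so repeated runs give the same result
demo_model = KMeans(n_clusters=2, n_init=10, random_state=0)
demo_labels = demo_model.fit_predict(demo_data)

print(demo_labels)                  # one cluster index per row
print(demo_model.cluster_centers_)  # one centroid per cluster
print(demo_model.inertia_)          # sum of squared distances to the nearest centroid
```

Which blob receives label 0 and which receives label 1 is arbitrary; only the grouping is meaningful.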
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
n_data = scaler.fit_transform(game_data)
print(n_data[:, 0].min(), n_data[:, 0].max())
print(n_data[:, 1].min(), n_data[:, 1].max())
from sklearn.preprocessing import StandardScaler
std_scaler = StandardScaler()
n_data_std_scaled = std_scaler.fit_transform(game_data)
print(n_data_std_scaled[:, 0].std(), n_data_std_scaled[:, 1].std())
print(n_data_std_scaled[:, 0].mean().round(2), n_data_std_scaled[:, 1].mean().round(2))
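The standard deviation and mean printed above come out as 1 and 0 because StandardScaler computes a z-score per column. A minimal sketch of the same calculation by hand, again on a made-up array (sample_std is a hypothetical name):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values (illustration only)
sample_std = np.array([[10.0, 1.0],
                       [250.0, 3.0],
                       [900.0, 5.0]])

# Manual z-score: subtract the column mean, divide by the column standard deviation.
# NumPy's default std (ddof=0, population std) matches what StandardScaler uses.
manual_z = (sample_std - sample_std.mean(axis=0)) / sample_std.std(axis=0)

sk_z = StandardScaler().fit_transform(sample_std)

print(manual_z)
print(np.allclose(manual_z, sk_z))
```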
kmeans_predict_plot(n_data, k=4)
kmeans_predict_plot(n_data_std_scaled, k=5)
import scipy.cluster.hierarchy as sch
fig = plt.figure(figsize=(16,10))
sch.dendrogram(sch.linkage(n_data, method='ward'))
plt.title('Dendrogram')
plt.xlabel('game usage time')
plt.ylabel('distance')
plt.axhline(2, c='k', lw=0.5, ls='--')
plt.show()
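The horizontal line drawn on the dendrogram marks a distance at which the tree can be cut into flat clusters. One way to perform that cut in code is scipy's fcluster with criterion='distance'. The sketch below assumes two synthetic blobs and a threshold of 1.0 chosen for that toy data; it is not the threshold used on the game data.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Synthetic stand-in data (an assumption): two tight blobs in the unit square
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.05, size=(10, 2)),
    rng.normal(loc=(1.0, 1.0), scale=0.05, size=(10, 2)),
])

# Same linkage call as for the dendrogram
Z = sch.linkage(points, method='ward')

# Cut the tree: merges above distance t are undone, what remains are the clusters
cluster_ids = sch.fcluster(Z, t=1.0, criterion='distance')

print(cluster_ids)            # cluster ids start at 1
print(len(set(cluster_ids)))  # number of clusters below the cut
```

Lowering t yields more, smaller clusters; raising it merges them until a single cluster remains.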
from sklearn.cluster import AgglomerativeClustering
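# The 'affinity' argument was removed in recent scikit-learn releases;
# Ward linkage always uses Euclidean distance, which is the default.
hc = AgglomerativeClustering(n_clusters=4, linkage='ward')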
y_hc = hc.fit_predict(n_data)
plt.figure(figsize=(5,5))
plt.title('Agglomerative')
plt.scatter(n_data[:, 0], n_data[:, 1], c=y_hc, cmap='coolwarm')
plt.show()