Machine Learning - KNN Practice - Iris Flower Analysis ๐ŸŒธ

ํ™”์ดํ‹ฐ ยท2023๋…„ 12์›” 18์ผ

Iris Flower Study

Goals

  • ๋ถ“๊ฝƒ์˜ ๊ฝƒ์žŽ ๊ฐˆ์ด, ๊ฝƒ์žŽ ๋„ˆ๋น„, ๊ฝƒ๋ฐ›์นจ ๊ธธ์ด, ๊ฝƒ๋ฐ›์นจ ๋จธ๋น„ ํŠน์ง•์„ ํ™œ์šฉํ•ด 3๊ฐ€์ง€ ํ’ˆ์ข…์„ ๋ถ„๋ฅ˜ํ•ด๋ณด์ž

  • KNN๋ชจ๋ธ์˜ ์ด์›ƒ์˜ ์ˆซ์ž๋ฅผ ์กฐ์ ˆํ•ด๋ณด์ž

  • petal: ๊ฝƒ์žŽ/sepal: ๊ฝƒ๋ฐ›์นจ ๐ŸŒธ๐ŸŒบ

#์‚ฌ์ดํ‚ท๋Ÿฐ์—์„œ ์ œ๊ณตํ•˜๋Š” ๋ฐ์ดํ„ฐ ์„ธํŠธ ์ƒ์„ฑํ•˜๋Š” ๋ชจ๋“ˆ
from sklearn.datasets import load_iris
iris_data = load_iris()
iris_data #Bunch ํด๋ž˜์Šค ๊ฐ์ฒด(dictionary์™€ ์œ ์‚ฌํ•จ)
# iris_dataset์˜ ํ‚ค ๊ฐ’
iris_data.keys()
# DESCRํ‚ค์—๋Š” ๋ฐ์ดํƒ€์…‹์— ๋Œ€ํ•œ ์„ค๋ช…
print(iris_data['DESCR'])
# target_names: ์ •๋‹ต/์˜ˆ์ธกํ•œ๊ณ ์ž ํ•˜๋Š” ๋ถ“๊ฝƒ ํ’ˆ์ข…์˜ ์ด๋ฆ„์„ ๋ฌธ์ž์—ด ๋ฐฐ์—ด๋กœ ๊ฐ€์ž๊ณ ๋ฐ์ดํ„ฐ
print(iris_data['target_names'])
#feature_names: ๊ฐ ํŠน์„ฑ์„ ์„ค๋ช…ํ•˜๋Š” ๋ฌธ์ž์—ด
iris_data['feature_names']
# ๊ฝƒ์žŽ์˜ ๊ธธ์ด, ํ‘น, ๊ฝƒ๋ฐ›์นจ์˜ ๊ธธ์ด, ๋ฌต->2์ฐจ์›numpy๋ฐฐ์—ด์˜ ํ˜•ํƒœ
iris_data['data']
# target๊ฐ’ 1์ฐจ์› numpy๋ฐฐ์—ด
#0์€ setoda 1์€ versocolor , 2๋Š” virginica
iris_data['target']
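As a quick sanity check, the shapes and class counts of the Bunch arrays can be inspected with NumPy (a small sketch; the iris dataset has 150 samples, 4 features, and 3 balanced classes):

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# data is a 2-D array: 150 samples x 4 features
print(iris_data['data'].shape)    # (150, 4)

# target is a 1-D array of class labels 0/1/2
print(iris_data['target'].shape)  # (150,)

# each of the three species has exactly 50 samples
print(np.bincount(iris_data['target']))  # [50 50 50]
```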

Constructing the datasets

  • Separate the features (problems) from the labels (answers)
  • Split into a training set and a test set
    • Training set/data: used to fit the machine learning model
    • Test set/data: used to measure how well the model works
# library imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split  # shuffles and splits the data
# scikit-learn imports
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
plt.rcParams['font.family'] = 'Malgun Gothic'  # Korean-capable font for plots
#๋ฌธ์ž๋ฐ์ดํ„ฐ 2์ฐจ์› ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ƒ์„ฑ
#columns name ์„ค์žฅ
iris_df = pd.DataFrame(iris_data['data'],columns = ['sepal length (cm)','sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)'])
iris_df
#๋ฌธ์ œ์™€ ๋‹ต๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌ
x = iris_df.values
y = iris_data['target']
#ํ›ˆ๋ จ์„ธํŠธ์™€ ํ‰๊ฐ€์„ธํŠธ๋กœ ๋ถ„๋ฆฌ
#train_test_split:๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜๋ˆ„๊ธฐ ์ „์— ์œ ์‚ฌ ๋‚œ์ˆ˜ ์ƒ์„ฑ๊ธฐ๋ฅผ ์‚ฌ์šฉํ•ด
#๋ฐ์ดํ„ฐ์…‹์„ ๋ฌด์ž‘์œ„๋กœ ์„ž์–ด์ค€๋‹ค train: 70% test 30%
x_train,x_test,y_train,y_test=train_test_split(x,y,
                       test_size = 0.3,random_state = 65)
# random_state ๋ฉ”๊ฐœ๋ณ€์ˆ˜: ํ•จ์ˆ˜๋ฅผ ์‹คํ–‰ํ•ด๋„ ๊ฒฐ๊ณผ๊ฐ€ ๋˜‘๊ฐ™์ด ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค
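Because the iris classes are perfectly balanced, it can also be worth passing `stratify=y` to `train_test_split` so the 50/50/50 class ratio is preserved in both splits (a small sketch; with 150 samples and `test_size=0.3`, each class contributes 35 training and 15 test samples):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris['data'], iris['target']

# stratify=y keeps the 0/1/2 class ratio identical in both splits
x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, test_size=0.3, random_state=65, stratify=y)

print(np.bincount(y_tr))  # [35 35 35]
print(np.bincount(y_te))  # [15 15 15]
```

Without `stratify`, the random shuffle can leave one species slightly over- or under-represented in the 30% test set.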
# check the sizes of the splits
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
#model์ƒ์„ฑ
knn = KNeighborsClassifier(n_neighbors = 3)
#ํ›ˆ๋ จ/ํ•™์Šต
# ๋ชจ๋ธ๋ช….fit(ํ›ˆ๋ จ์šฉ ๋ฌธ์ œ,ํ›ˆ๋ จ์šฉ ๋‹ต)
knn.fit(x_train,y_train)
#์˜ˆ์ธก
#๋ชจ๋ธ๋ช….predict(ํ…Œ์ŠคํŠธ์šฉ ๋ฌธ์ œ)
pre =knn.predict(x_test)
pre
#ํ‰๊ฐ€ํ•˜๊ธฐ
metrics.accuracy_score(pre,y_test)
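To meet the second goal above (adjusting the number of neighbors), one way is to refit the model for several values of `n_neighbors` and compare the test accuracies (a small sketch reusing the same 70/30 split; the exact scores depend on `random_state`):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], test_size=0.3, random_state=65)

# train one model per k and record its accuracy on the test set
scores = {}
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    scores[k] = metrics.accuracy_score(y_test, knn.predict(x_test))

for k, acc in scores.items():
    print(f'k={k}: accuracy={acc:.3f}')
```

A very small k can overfit to noisy neighbors, while a very large k blurs the class boundaries, so scanning a range like this helps pick a reasonable middle value.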
Let's study hard! The best is yet to come! 💜

0๊ฐœ์˜ ๋Œ“๊ธ€