Machine Learning

단비 · April 7, 2025

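Machine learning

Machine learning is a branch of artificial intelligence concerned with developing algorithms that learn from data on their own. It is used across many fields, where it brings benefits such as improved work efficiency and personalized customer services.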

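The three elements of machine learning

Data, learning, model

Data - use clean, well-organized, high-quality data!
Learning - the process of finding patterns in the data
Model - the element that connects data and learning; it analyzes input values and produces output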

The machine learning process

Data collection -> data exploration and preprocessing -> data splitting -> model selection and training -> prediction and evaluation

Data splitting

The process of dividing the full dataset by purpose
The model learns from the train data and is then evaluated by predicting on the test data

Creating the DataFrame


import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
iris_data = iris.data  # X (independent variables) - the features that influence Y

iris_label = iris.target  # Y (dependent variable) - the iris species

iris_df = pd.DataFrame(data=iris_data, columns=iris.feature_names)
iris_df['label'] = iris.target
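
The process above lists "data exploration and preprocessing" before splitting, so here is a minimal sketch of what a quick look at the frame could be. The variable names `missing_per_column` and `label_counts` are my own for illustration, and the checks chosen (shape, first rows, missing values, class counts) are just one reasonable starting point, not a complete exploration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame so this snippet runs on its own
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['label'] = iris.target

print(iris_df.shape)   # number of rows and columns
print(iris_df.head())  # first five rows

# Count missing values per column and samples per class
missing_per_column = iris_df.isna().sum()
label_counts = iris_df['label'].value_counts().sort_index()
print(missing_per_column)
print(label_counts)
```

If the missing-value counts were not zero, or the classes were heavily skewed, that would be the point to handle it before splitting.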

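Splitting the data


x = iris_df.drop(['label'], axis=1)
y = iris_df['label']
print(x.shape)  # (150, 4) why 4 columns? -> because 4 columns determine Y!
print(y.shape)  # (150,)

"""
The data is usually split into training and evaluation sets at a 7:3 or 8:2 ratio.
random_state can be any integer; fixing it makes the split reproducible.

"""


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=777)
# If the class ratios in the split differ too much from the original data, pass stratify=y (often unnecessary for a balanced dataset like iris)



print(x_train.shape)   # (105, 4)
print(x_test.shape)    # (45, 4)
print(y_train.shape)   # (105,)
print(y_test.shape)    # (45,)
print(y_train.value_counts())  # samples per class in the training set
print(y_test.value_counts())   # samples per class in the test set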
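
To make the stratify option a little more concrete, here is a small sketch comparing the class counts of the test set with and without `stratify=y`. The names `plain_counts` and `strat_counts` are assumptions of mine, and the snippet reloads iris with `load_iris(return_X_y=True)` so that it runs on its own.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Plain random split: class counts in the test set depend on the shuffle
_, _, _, y_test_plain = train_test_split(
    X, y, test_size=0.3, random_state=777
)

# Stratified split: each class keeps the same share as in the full data
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=777, stratify=y
)

plain_counts = Counter(y_test_plain.tolist())
strat_counts = Counter(y_test_strat.tolist())
print(plain_counts)
print(strat_counts)
```

With stratification, the 45 test samples are divided evenly across the three species, while the plain split can come out slightly uneven.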

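Model selection and training

Decision tree -> learns by asking a series of questions about the data, like a game of Twenty Questions.


from sklearn.tree import DecisionTreeClassifier


model = DecisionTreeClassifier()  # instantiate the model


model.fit(x_train, y_train)  # pass x_train and y_train as arguments

"""
The training code is quite simple!
"""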
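
To see the "Twenty Questions" idea directly, the learned questions can be printed as text. This is only a sketch: it trains on the full iris data for brevity, and `max_depth=2` and `random_state=0` are settings I picked to keep the output short and repeatable, not values from the post.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short (max_depth is an assumption)
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(iris.data, iris.target)

# Each "feature <= threshold" line is one question the tree asks
rules = export_text(model, feature_names=iris.feature_names)
print(rules)
```

Each indented line is a yes/no question about one feature, and the `class:` lines are the answers the tree arrives at.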

Prediction and evaluation

Predict on the test data and measure performance with accuracy.

Running the prediction

# predict is the prediction method - what does the model need to take the exam? x_test! The answer key we hold is y_test!

pred = model.predict(x_test)

Measuring performance with accuracy

# compare the predictions (pred) with the correct answers (y_test)

from sklearn.metrics import accuracy_score

score = accuracy_score(y_test, pred)

print(score)  # e.g. 0.933

# the result changes with the random_state and test_size settings
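
As a rough illustration of how much the score can move with the split, the sketch below repeats the same train, predict, and evaluate steps for a few `random_state` values. The seed list and the `scores` dictionary are arbitrary choices of mine, and the tree's own `random_state=0` is fixed so that only the split changes.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for seed in [0, 1, 42, 777]:  # arbitrary seeds for illustration
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_tr, y_tr)
    scores[seed] = accuracy_score(y_te, model.predict(X_te))

# Accuracy per split seed
for seed, score in scores.items():
    print(seed, round(score, 3))
```

Because a single split gives only one sample of the accuracy, looking at several seeds gives a better feel for how stable the model is.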
