Machine Learning Workflow

시리 · September 11, 2023


1. Environment Setup

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings(action='ignore')  # hide warning messages in the output
# Render inline plots at higher resolution in Jupyter
%config InlineBackend.figure_format = 'retina'

# Read the data
path = 'path/to/data.csv'  # placeholder: location of your data file
data = pd.read_csv(path)
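The real file path above is only a placeholder, so here is a minimal self-contained sketch that reads a small hypothetical CSV from memory instead. The column names and values are invented for illustration. Note that empty fields come back as NaN, which is exactly what the missing-value step later has to handle.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real file at `path`
csv_text = """Temp,Pressure,Methane
20.1,1.01,3.2
21.4,,3.5
19.8,1.03,
"""

# pd.read_csv accepts any file-like object, so io.StringIO behaves like a file here
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)  # (3, 3)
print(data.isna().sum().sum())  # 2 empty fields were read as NaN
```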

2. Understanding the Data

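# Check basic information
data.head()      # first 5 rows
data.tail()      # last 5 rows
data.info()      # column dtypes and non-null counts
data.describe()  # summary statistics for numeric columns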
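To make the output of `describe()` concrete, here is a small sketch on a hypothetical two-column DataFrame. It shows which statistics are reported and that the `count` row excludes missing values, which makes it a quick way to spot NaNs before the data-preparation step.

```python
import pandas as pd

# Small hypothetical DataFrame for illustration; one Methane value is missing
data = pd.DataFrame({
    "Temp": [20.0, 22.0, 24.0, 26.0],
    "Methane": [3.0, 3.5, None, 4.5],
})

summary = data.describe()
# describe() reports these statistics for each numeric column
print(summary.index.tolist())
# count skips missing values, so a lower count hints at NaNs in that column
print(summary.loc["count"].tolist())  # [4.0, 3.0]
```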

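# Correlation analysis and heatmap visualization
# numeric_only=True skips non-numeric columns (pandas 2.x raises an error on them otherwise)
data.corr(numeric_only=True)

sns.heatmap(data.corr(numeric_only=True),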
            annot=True, 
            cmap='Blues', 
            fmt='.2f', 
            cbar=False, 
            square=True, 
            annot_kws={'size' : 8})
plt.show()
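By default `data.corr()` computes the Pearson correlation coefficient, i.e. the covariance of two columns divided by the product of their standard deviations. The sketch below checks this by hand on hypothetical data; the text column is there only to show that `numeric_only=True` drops it (this parameter needs pandas 1.5 or later).

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" rises almost linearly with "a"; "label" is non-numeric
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],
    "label": ["x", "y", "x", "y", "x"],
})

# numeric_only=True excludes the text column from the correlation matrix
corr = data.corr(numeric_only=True)

# Pearson r computed manually: sum of co-deviations over the root of the product of squared deviations
a = data["a"].to_numpy()
b = data["b"].to_numpy()
r_manual = np.sum((a - a.mean()) * (b - b.mean())) / np.sqrt(
    np.sum((a - a.mean()) ** 2) * np.sum((b - b.mean()) ** 2)
)
print(corr.loc["a", "b"], r_manual)
```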

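# Scatter plot of the most strongly correlated pair of variables
# ('xcol' and 'ycol' are placeholders for those column names)
plt.scatter(data['xcol'], data['ycol'])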
plt.show()
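Instead of reading the most correlated pair off the heatmap by eye, one option is to pick it programmatically. This is a sketch on hypothetical random data (the columns `x1`, `x2`, `x3` are made up, with `x3` built to follow `x1`): it takes absolute correlations, keeps only the upper triangle so the diagonal of 1.0s and duplicate pairs are ignored, and returns the pair with the largest value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x3 is constructed to track x1 closely, x2 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
data = pd.DataFrame({
    "x1": x1,
    "x2": rng.normal(size=100),
    "x3": 3 * x1 + rng.normal(scale=0.1, size=100),
})

# Absolute correlations, so strong negative relationships also count
corr = data.corr(numeric_only=True).abs()
# k=1 zeroes the diagonal and everything below it
upper = np.triu(corr.to_numpy(), k=1)
i, j = np.unravel_index(np.argmax(upper), upper.shape)
xcol, ycol = corr.columns[i], corr.columns[j]
print(xcol, ycol)
```

The resulting names can then replace the placeholders, e.g. `plt.scatter(data[xcol], data[ycol])`.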

3. Data Preparation

# Check missing values per column
data.isna().sum()

# Fill missing values: forward fill first, then backward fill any leading NaNs
data = data.ffill().bfill()
data.isna().sum()
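The reason for chaining both fills: forward fill copies the last valid value downward, so a NaN in the very first row has nothing to copy from. A backward fill afterwards covers that gap. A minimal sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical series with a leading NaN, a gap in the middle, and a trailing NaN
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# ffill alone leaves the leading NaN in place
forward = s.ffill()
# bfill afterwards fills the leading NaN from the next valid value
filled = s.ffill().bfill()
print(forward.tolist())
print(filled.tolist())  # [1.0, 1.0, 1.0, 1.0, 4.0, 4.0]
```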

# Drop unneeded variables
drop_cols = ['col1', 'col2']  # placeholder column names
data.drop(drop_cols, axis=1, inplace=True)

# Split into features (x) and target (y)
target = 'Methane'
x = data.drop(target, axis=1)
y = data.loc[:, target]
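A quick sketch of what the split produces, on a hypothetical three-row DataFrame that includes the `Methane` target: `x` keeps every other column as a DataFrame, while `y` is a single Series.

```python
import pandas as pd

# Hypothetical data with the target column 'Methane'
data = pd.DataFrame({
    "Temp": [20.1, 21.4, 19.8],
    "Pressure": [1.01, 1.02, 1.03],
    "Methane": [3.2, 3.5, 3.1],
})

target = "Methane"
x = data.drop(target, axis=1)  # features: every column except the target
y = data.loc[:, target]        # target: one column as a Series
print(x.shape, y.shape)  # (3, 2) (3,)
```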

# Split into training and test sets
from sklearn.model_selection import train_test_split

# Unless there is a specific reason not to, use a 7:3 train/test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
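To see what `test_size=0.3` and `random_state=1` do, here is a sketch on 10 hypothetical rows: 30% of the rows go to the test set, and fixing `random_state` makes the shuffle, and therefore the split, reproducible across runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, 2 features
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
print(len(x_train), len(x_test))  # 7 3

# Splitting again with the same random_state yields the same test rows
x_train2, x_test2, _, _ = train_test_split(x, y, test_size=0.3, random_state=1)
print(np.array_equal(x_test, x_test2))  # True
```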

4. Modeling

Step 1: Import

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

Step 2: Declare (create the model object)

model = LinearRegression()

Step 3: Train

model.fit(x_train, y_train)
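What `fit()` actually learns for `LinearRegression` is one weight per feature (`coef_`) plus a bias term (`intercept_`). A sketch on hypothetical noise-free data generated from y = 2·x1 − 3·x2 + 5 shows the model recovering those values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free training data following y = 2*x1 - 3*x2 + 5
rng = np.random.default_rng(42)
x_train = rng.uniform(0, 10, size=(50, 2))
y_train = 2 * x_train[:, 0] - 3 * x_train[:, 1] + 5

model = LinearRegression()
model.fit(x_train, y_train)

# Learned weights per feature and the bias term
print(model.coef_.round(3), round(model.intercept_, 3))
```

With real, noisy data the estimates will not match any formula exactly, but inspecting `coef_` is still a useful sanity check on the direction and size of each feature's effect.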

Step 4: Predict

y_pred = model.predict(x_test)

# Check only the first 10 actual values
print(y_test.values[:10])
# First 10 predicted values
print(y_pred[:10])

Step 5: Evaluate (mean absolute error)

print('MAE', mean_absolute_error(y_test, y_pred))
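MAE is simply the average of the absolute differences between actual and predicted values, expressed in the same units as the target. A sketch with hypothetical numbers confirms that the manual formula matches `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted values
y_test = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE = mean(|actual - predicted|) = (0.5 + 0 + 2 + 1) / 4
mae_manual = np.mean(np.abs(y_test - y_pred))
print(mae_manual, mean_absolute_error(y_test, y_pred))  # 0.875 0.875
```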

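# Visual comparison: predicted vs. actual values, with the mean of the actual values as a red reference line
y_mean = y_test.values.mean()
plt.plot(y_pred, label='Predicted')
plt.plot(y_test.values, label='Actual')
plt.axhline(y_mean, color='r', label='Mean of actual')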
plt.legend()
plt.show()
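The red mean line also suggests a simple numeric check (this comparison is an addition, not part of the original steps): a model is only useful if its MAE beats a baseline that predicts the mean for every sample. A sketch with hypothetical values, using the test-set mean to match the plot; in practice the training-set mean is the fairer baseline, since it does not peek at the test data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual values and model predictions
y_test = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_pred = np.array([10.5, 11.5, 14.5, 15.0, 18.5])

# Baseline: predict the mean (the red line) for every sample
y_mean = y_test.mean()
baseline_pred = np.full_like(y_test, y_mean)

model_mae = mean_absolute_error(y_test, y_pred)
baseline_mae = mean_absolute_error(y_test, baseline_pred)
print(model_mae, baseline_mae)
```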
