Task Type 2
[Price Prediction] Used Cars
- Target to predict (y): price
- Evaluation metric: RMSE (Root Mean Squared Error)
- Data: train.csv, test.csv
- Submission format: submit a result.csv file in the format below (numeric values)
pred
11000
20500
19610
...
11995
Submission notes
- Check the submission file by loading it with pd.read_csv('result.csv')
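To make the evaluation metric concrete, here is a minimal sketch that computes RMSE by hand on three made-up prices. The numbers are illustrative only and are not taken from train.csv, and `rmse_by_hand` is a hypothetical helper name.

```python
import math

def rmse_by_hand(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only (not from the task data).
toy_true = [10000, 20000, 30000]
toy_pred = [11000, 19000, 32000]

# Errors are -1000, 1000, -2000, so the squared errors are 1e6, 1e6, 4e6
# and their mean is 2e6.
toy_score = rmse_by_hand(toy_true, toy_pred)
print(round(toy_score, 2))  # 1414.21
```

Because the errors are squared before averaging, the single 2000 miss contributes more to the score than the two 1000 misses combined.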
import pandas as pd
train = pd.read_csv("")
test = pd.read_csv("")
train.shape, test.shape
display(train.head(3))
display(test.head(3))
train.info()
train.describe()
test.describe()
train.describe(include='O')
test.describe(include='O')
test['transmission'].value_counts()
train['price'].hist()
display(train.isnull().sum())
display(test.isnull().sum())
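# Separate the target column from the features.
y_train = train.pop("price")
# Keep only the numeric columns; the object-type columns are dropped in this baseline.
cols = ['year', 'mileage', 'tax', 'mpg', 'engineSize']
train = train[cols]
test = test[cols]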
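The baseline above uses numeric columns only. If a categorical column such as transmission were to be kept, one possible approach is one-hot encoding with `pd.get_dummies`. The sketch below is an assumption-laden illustration, not part of the original solution: the toy frames and their category values are made up, and names like `toy_train` are hypothetical.

```python
import pandas as pd

# Toy frames standing in for train/test; all values are made up.
toy_train = pd.DataFrame({
    "year": [2017, 2019, 2015],
    "transmission": ["Manual", "Automatic", "Manual"],
})
toy_test = pd.DataFrame({
    "year": [2018, 2016],
    "transmission": ["Semi-Auto", "Manual"],
})

# Encode both frames together so they end up with identical columns,
# even when a category appears in only one of them.
combined = pd.concat([toy_train, toy_test], ignore_index=True)
encoded = pd.get_dummies(combined, columns=["transmission"])

# Split back into the original train and test rows.
toy_train_enc = encoded.iloc[:len(toy_train)]
toy_test_enc = encoded.iloc[len(toy_train):].reset_index(drop=True)

print(list(toy_train_enc.columns))
print(toy_train_enc.shape, toy_test_enc.shape)
```

Encoding the two frames separately would give the test frame a column the training frame lacks, which a fitted model would reject at prediction time.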
from sklearn.model_selection import train_test_split
X_tr, X_val, y_tr, y_val = train_test_split(train, y_train, test_size=0.2, random_state=2022)
X_tr.shape, X_val.shape, y_tr.shape, y_val.shape
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(random_state=2022)  # fixed seed so the validation score is reproducible
rf.fit(X_tr, y_tr)
pred = rf.predict(X_val)
from sklearn.metrics import mean_squared_error
def rmse(y_true, y_pred):
    return mean_squared_error(y_true, y_pred)**0.5
rmse(y_val, pred)
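A validation RMSE is easier to interpret next to a trivial baseline. The sketch below, which uses made-up prices rather than the real data, scores a model that always predicts the mean training price. A trained regressor should land clearly below that number. `baseline_rmse` is a hypothetical helper, not something from the original notebook.

```python
import math
from statistics import mean

def baseline_rmse(y_train_values, y_val_values):
    """RMSE of always predicting the mean of the training target."""
    constant = mean(y_train_values)
    mse = mean([(y - constant) ** 2 for y in y_val_values])
    return math.sqrt(mse)

# Made-up prices for illustration only.
toy_y_tr = [10000, 12000, 14000]   # mean is 12000
toy_y_val = [9000, 15000]          # errors against the mean: -3000, 3000

toy_baseline = baseline_rmse(toy_y_tr, toy_y_val)
print(toy_baseline)  # 3000.0
```

In the notebook, the equivalent check would predict `y_tr.mean()` for every row of `X_val` and pass that to `rmse`.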
pred = rf.predict(test)
result = pd.DataFrame({
    'pred': pred
})
result.to_csv("result.csv", index=False)
pd.read_csv('result.csv')
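Beyond eyeballing the loaded file, the format requirements can be checked programmatically. The following is a minimal sketch: `check_submission` is a hypothetical helper, and the rule that the file needs one row per test row is an assumption, since the task statement only specifies a single numeric `pred` column. It reads from in-memory text so that it does not touch the real result.csv.

```python
import io
import pandas as pd

def check_submission(csv_text, expected_rows):
    """Return a list of problems found in a submission file's contents."""
    df = pd.read_csv(io.StringIO(csv_text))
    problems = []
    if list(df.columns) != ["pred"]:
        problems.append("columns should be exactly ['pred']")
    elif not pd.api.types.is_numeric_dtype(df["pred"]):
        problems.append("'pred' should be numeric")
    elif df["pred"].isnull().any():
        problems.append("'pred' contains missing values")
    # Assumption: one prediction per test row.
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    return problems

# Illustrative file contents only.
good = "pred\n11000\n20500\n19610\n"
bad = "index,pred\n0,11000\n1,20500\n"  # extra index column, one row short

print(check_submission(good, expected_rows=3))  # []
print(check_submission(bad, expected_rows=3))
```

The `bad` example mimics what happens when `index=False` is omitted from `to_csv`: an extra column appears and the file no longer matches the required format.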