Kaggle Challenge 09 - Your First Machine Learning Model

JongseokLee · August 11, 2021

Tutorial01

Import

import pandas as pd

melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 
melbourne_data.columns
Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')
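To try `pd.read_csv` and `.columns` without the Kaggle input directory, here is a minimal sketch that reads a tiny in-memory CSV. The suburbs and numbers are invented for illustration. Only the column names are borrowed from `melb_data.csv`.

```python
import io

import pandas as pd

# Hypothetical, made-up rows; only the column names come from melb_data.csv
csv_text = """Suburb,Rooms,Price,Distance,Landsize
SuburbA,2,500000,2.5,200
SuburbB,3,750000,4.0,350
SuburbC,4,900000,6.5,500
"""

# read_csv accepts any file-like object, so StringIO stands in for a file path
sample_data = pd.read_csv(io.StringIO(csv_text))

# .columns is an Index of column labels; list() makes it easy to print or search
column_names = list(sample_data.columns)
print(column_names)
print("Price" in sample_data.columns)
```

Scanning this list is how you spot the prediction target (`Price` here, `SalePrice` in the Iowa exercise).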

Step 1: Specify Prediction Target

Question

Print the list of columns in the dataset to find the name of the prediction target.

y = ____

# Check your answer
step_1.check()

Solution

y = home_data.SalePrice
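`home_data.SalePrice` uses attribute access. Bracket indexing returns the same Series. Here is a minimal sketch on a made-up three-row frame, where the values are hypothetical and only the column names follow the Iowa data:

```python
import pandas as pd

# Hypothetical stand-in for home_data; values are invented for illustration
home_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "SalePrice": [200000, 180000, 225000],
})

# Attribute access and bracket access select the same column as a Series
y_attr = home_data.SalePrice
y_bracket = home_data["SalePrice"]

print(type(y_attr).__name__)  # Series
print(y_attr.equals(y_bracket))  # True
```

Bracket access also works for column names that are not valid Python identifiers, which matters for features such as `1stFlrSF`.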

Step 2: Create X

Question

Now you will create a DataFrame called X holding the predictive features.
Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.
You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes): LotArea, YearBuilt, 1stFlrSF, 2ndFlrSF, FullBath, BedroomAbvGr, TotRmsAbvGrd
After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.

# Create the list of features below
feature_names = ___

# Select data corresponding to features in feature_names
X = ____

Solution

feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                 "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]

X = home_data[feature_names]
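Indexing a DataFrame with a list of names returns a new DataFrame that contains only those columns. Here is a sketch on a hypothetical three-row stand-in for `home_data`. The values are invented, and the column names are the seven features plus the target.

```python
import pandas as pd

# Hypothetical stand-in for home_data: seven feature columns plus the target
home_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "YearBuilt": [2003, 1976, 2001],
    "1stFlrSF": [850, 1250, 920],
    "2ndFlrSF": [850, 0, 860],
    "FullBath": [2, 2, 2],
    "BedroomAbvGr": [3, 3, 3],
    "TotRmsAbvGrd": [8, 6, 6],
    "SalePrice": [200000, 180000, 225000],
})

feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                 "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]

# A list of names returns a DataFrame whose columns follow the list order.
# Attribute access (home_data.1stFlrSF) would be a SyntaxError because the
# name starts with a digit, so brackets are required here.
X = home_data[feature_names]

print(X.shape)  # (3, 7)
print(list(X.columns) == feature_names)  # True
print("SalePrice" in X.columns)  # False: the target stays out of X
```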

Step 3: Specify and Fit Model

Question

Create a DecisionTreeRegressor and save it as iowa_model. Ensure you've done the relevant import from sklearn to run this command.
Then fit the model you just created using the data in X and y that you saved above.

# from _ import _
# Specify the model.
# For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = ____

# Fit the model

Solution

from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)
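The advice to set `random_state` can be checked directly. scikit-learn's decision tree randomly permutes the features at each split, and this can change which of several equally good splits is chosen. Fixing `random_state` makes the fit deterministic. The sketch below uses randomly generated synthetic data with hypothetical columns `a`, `b`, `c`, not the Iowa data.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 50 rows, 3 integer features, noisy linear target
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 100, size=(50, 3)), columns=["a", "b", "c"])
y = 3 * X["a"] + 2 * X["b"] + rng.normal(0, 5, size=50)

# Same random_state -> same feature permutations -> same fitted tree
model_1 = DecisionTreeRegressor(random_state=1).fit(X, y)
model_2 = DecisionTreeRegressor(random_state=1).fit(X, y)

same = np.array_equal(model_1.predict(X), model_2.predict(X))
print(same)  # True
```

`fit` returns the estimator itself, which is why the constructor and `fit` can be chained on one line.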

Step 4: Make Predictions

Question

predictions = ____
print(predictions)

Solution

predictions = iowa_model.predict(X)
print(predictions)
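A quick sanity check is to put the first few predictions next to the actual targets. The sketch below uses synthetic data, where the feature values and targets are randomly generated stand-ins for the Iowa columns. The model is scored on the same rows it was fit on, so the two columns will usually match very closely. That near-perfect match is why Tutorial02 moves on to validation data.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Iowa features and target
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "LotArea": rng.integers(5000, 15000, size=20),
    "YearBuilt": rng.integers(1950, 2010, size=20),
})
y = pd.Series(rng.integers(100_000, 300_000, size=20), name="SalePrice")

iowa_model = DecisionTreeRegressor(random_state=1).fit(X, y)

# predict returns a NumPy array with one value per input row
predictions = iowa_model.predict(X)

# Side-by-side view of actual vs. predicted for the first rows
comparison = pd.DataFrame({"actual": y, "predicted": predictions})
print(comparison.head())
```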

Tutorial02

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# X and y are the Melbourne features and target defined in the tutorial (not shown here)

# In-sample error: the model is scored on the same rows it was fit on
melbourne_model = DecisionTreeRegressor()
melbourne_model.fit(X, y)
predicted_home_prices = melbourne_model.predict(X)
print(mean_absolute_error(y, predicted_home_prices))

# Split data into training and validation data, for both features and target.
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model on the training data only
melbourne_model.fit(train_X, train_y)

# Get predicted prices on validation data and score them
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
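The two MAE values above answer different questions, and the in-sample one is overly optimistic. A fully grown tree nearly memorizes its training rows. As a result, its in-sample MAE is close to zero, while its error on held-out rows reflects the noise it cannot predict. The sketch below makes that gap visible. It uses synthetic noisy data rather than the Melbourne file, and the columns `size` and `age` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target is a linear signal plus random noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 1000, size=(200, 2)), columns=["size", "age"])
y = 100 * X["size"] - 50 * X["age"] + rng.normal(0, 5000, size=200)

# In-sample: fit and score on the same rows
full_model = DecisionTreeRegressor(random_state=0).fit(X, y)
in_sample_mae = mean_absolute_error(y, full_model.predict(X))

# Out-of-sample: fit on the training split, score on the held-out split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
split_model = DecisionTreeRegressor(random_state=0).fit(train_X, train_y)
val_mae = mean_absolute_error(val_y, split_model.predict(val_X))

print(f"in-sample MAE:  {in_sample_mae:,.0f}")
print(f"validation MAE: {val_mae:,.0f}")
```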

Step 1: Split Your Data

Question

Use the train_test_split function to split up your data.
Give it the argument random_state=1 so the check functions know what to expect when verifying your code.
Recall, your features are loaded in the DataFrame X and your target is loaded in y.

# Import the train_test_split function and uncomment
# from _ import _

# fill in and uncomment
# train_X, val_X, train_y, val_y = ____

Solution

from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
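Two properties of `train_test_split` matter here:

- With no `test_size` or `train_size` given, 25% of the rows go to the validation set.
- The same `random_state` reproduces the same split, and features and targets are shuffled together, so each row stays paired with its target.

The sketch below checks both properties on a synthetic 100-row frame with hypothetical columns.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 100 rows
X = pd.DataFrame({"feature": np.arange(100)})
y = pd.Series(np.arange(100) * 2, name="target")

# With no test_size given, 25% of the rows go to validation
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
print(len(train_X), len(val_X))  # 75 25

# Same random_state -> identical split; rows stay paired with their targets
train_X2, val_X2, _, _ = train_test_split(X, y, random_state=1)
print(val_X.index.equals(val_X2.index))  # True
print(train_X.index.equals(train_y.index))  # True
```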

Step 2: Specify and Fit the Model

Question

Create a DecisionTreeRegressor model and fit it to the relevant data. Set random_state to 1 again when creating the model.

# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = ____

# Fit iowa_model with the training data.

Solution

iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)

Step 3: Make Predictions with Validation data

Question

# Predict with all validation observations
val_predictions = ____

Solution

val_predictions = iowa_model.predict(val_X)
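`predict` returns a plain NumPy array, so the DataFrame's row labels are lost. One way to compare validation predictions with `val_y` row by row is to wrap them in a Series that reuses `val_X.index`. The sketch below uses synthetic stand-in data with a hypothetical single feature.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Iowa data: 40 rows, one feature
rng = np.random.default_rng(1)
X = pd.DataFrame({"LotArea": rng.integers(5000, 15000, size=40)})
y = pd.Series(rng.integers(100_000, 300_000, size=40), name="SalePrice")

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
iowa_model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

# predict() drops the index; reattaching val_X.index lines each prediction
# up with the matching row of val_y
val_predictions = pd.Series(iowa_model.predict(val_X), index=val_X.index)
comparison = pd.DataFrame({"actual": val_y, "predicted": val_predictions})
print(comparison.head())
```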

Step 4: Calculate the Mean Absolute Error in Validation Data

Question

from sklearn.metrics import mean_absolute_error
val_mae = ____

# Uncomment the following line to see val_mae
#print(val_mae)

Solution

val_mae = mean_absolute_error(val_y, val_predictions)
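MAE is the average of the absolute differences between actual and predicted values. The worked example below uses made-up numbers and computes MAE both by hand and with `mean_absolute_error`. The absolute errors are 10,000, 10,000 and 20,000, so MAE = 40,000 / 3 ≈ 13,333.

```python
from sklearn.metrics import mean_absolute_error

# Made-up validation targets and predictions (in dollars)
val_y = [200_000, 150_000, 320_000]
val_predictions = [210_000, 140_000, 300_000]

# MAE = (1/n) * sum(|actual - predicted|)
manual_mae = sum(abs(a - p) for a, p in zip(val_y, val_predictions)) / len(val_y)

# scikit-learn's signature is mean_absolute_error(y_true, y_pred)
sk_mae = mean_absolute_error(val_y, val_predictions)

print(manual_mae, sk_mae)
```

Swapping the two arguments gives the same number because the absolute difference is symmetric. Even so, passing `(y_true, y_pred)` in the documented order keeps the code consistent with metrics where the order does matter.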
