Import
import pandas as pd
melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
melbourne_data.columns
Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
'Longtitude', 'Regionname', 'Propertycount'],
dtype='object')
Question
Print the list of columns in the dataset to find the name of the prediction target.
y = ____
# Check your answer
step_1.check()
Solution
y = home_data.SalePrice
Question
Step 2: Create X
Now you will create a DataFrame called X holding the predictive features.
Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.
You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes): LotArea, YearBuilt, 1stFlrSF, 2ndFlrSF, FullBath, BedroomAbvGr, TotRmsAbvGrd
After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.
# Create the list of features below
feature_names = ___
# Select data corresponding to features in feature_names
X = ____
Solution
feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                 "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]
X = home_data[feature_names]
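As a rough illustration of the two selection steps above, here is a minimal sketch that assumes a tiny made-up stand-in for home_data (the values are invented, and only three of the feature columns are included to keep it short):

```python
import pandas as pd

# Hypothetical stand-in for home_data; the values are invented for illustration.
home_data = pd.DataFrame({
    "LotArea": [1000, 2000, 3000],
    "YearBuilt": [1990, 2000, 2010],
    "FullBath": [1, 2, 2],
    "SalePrice": [100000, 150000, 200000],
})

feature_names = ["LotArea", "YearBuilt", "FullBath"]

# Indexing with a list of column names returns a DataFrame (2-D).
X = home_data[feature_names]

# Dot notation (or a single name in brackets) returns a Series (1-D).
y = home_data.SalePrice

print(type(X).__name__, X.shape)  # DataFrame (3, 3)
print(type(y).__name__, y.shape)  # Series (3,)
```

The columns of X come out in the same order as the names in feature_names, and the target column is left out of X because it is not in the list.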
Question
Create a DecisionTreeRegressor and save it as iowa_model. Ensure you've done the relevant import from sklearn to run this command.
Then fit the model you just created using the data in X and y that you saved above.
# from _ import _
#specify the model.
#For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = ____
# Fit the model
Solution
from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)
Question
predictions = ____
print(predictions)
Solution
predictions = iowa_model.predict(X)
print(predictions)
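# In-sample error: score the model on the same data it was fit on.
# (Assumes melbourne_model has already been fit on X and y.)
from sklearn.metrics import mean_absolute_error
predicted_home_prices = melbourne_model.predict(X)
mean_absolute_error(y, predicted_home_prices)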
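To make the metric concrete, here is a small sketch of what mean absolute error computes, written in plain Python. The helper name mean_absolute_error_by_hand and the numbers are assumptions for illustration; this is not scikit-learn's implementation.

```python
# Made-up actual and predicted prices, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 340.0]


def mean_absolute_error_by_hand(y_true, y_pred):
    """Average of |actual - predicted| over all rows (hypothetical helper)."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


# Absolute errors are 10, 10 and 40, so the average is 60 / 3.
mae = mean_absolute_error_by_hand(actual, predicted)
print(mae)  # 20.0
```

Because each error is taken as an absolute value, overestimates and underestimates do not cancel out, and the result is in the same units as the target (here, price).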
from sklearn.model_selection import train_test_split
# split data into training and validation data, for both features and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model
melbourne_model = DecisionTreeRegressor()
# Fit model
melbourne_model.fit(train_X, train_y)
# get predicted prices on validation data
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
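The difference between scoring on training rows and scoring on held-out rows can be seen with a short sketch. It assumes synthetic data (one feature with a noisy linear target) rather than the Melbourne dataset, so the numbers only illustrate the pattern.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): 200 distinct feature
# values and a noisy linear target.
rng = np.random.default_rng(0)
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + rng.normal(0.0, 25.0, size=200)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

# Score on the rows the model was fit on versus rows it has never seen.
in_sample_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print("in-sample MAE: ", in_sample_mae)
print("validation MAE:", val_mae)
```

An unrestricted tree can memorize distinct training rows, so the in-sample score comes out at or very near zero, while the validation score reflects the error on unseen rows.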
Question
Use the train_test_split function to split up your data.
Give it the argument random_state=1 so the check functions know what to expect when verifying your code.
Recall, your features are loaded in the DataFrame X and your target is loaded in y.
# Import the train_test_split function and uncomment
# from _ import _
# fill in and uncomment
# train_X, val_X, train_y, val_y = ____
Solution
from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
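For a sense of what this call returns, here is a minimal sketch on toy arrays (an assumption for illustration, not the Iowa data). It shows the default split proportions and the effect of fixing random_state.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 100 rows, 2 feature columns.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# With no test_size or train_size given, scikit-learn holds out 25% of rows.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
print(train_X.shape, val_X.shape)  # (75, 2) (25, 2)

# Repeating the call with the same random_state reproduces the same split.
_, _, _, val_y_again = train_test_split(X, y, random_state=1)
print((val_y == val_y_again).all())  # True
```

Rows of X and y are shuffled together, so each feature row stays paired with its target value in both the training and validation sets.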
Question
Create a DecisionTreeRegressor model and fit it to the relevant data. Set random_state to 1 again when creating the model.
# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again
# Specify the model
iowa_model = ____
# Fit iowa_model with the training data.
Solution
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)
Question
# Predict with all validation observations
val_predictions = ____
Solution
val_predictions = iowa_model.predict(val_X)
Question
from sklearn.metrics import mean_absolute_error
val_mae = ____
# uncomment following line to see the validation_mae
#print(val_mae)
Solution
val_mae = mean_absolute_error(val_y, val_predictions)