Bagging means averaging slightly different versions of the same model to improve accuracy.
(1) Why Bagging?
: Prediction errors come from bias (underfitting) and variance (overfitting); averaging many slightly different models mainly reduces the variance part.
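To make the variance point concrete, here is a minimal simulation using only the standard library. It assumes each model's prediction is the true value plus independent noise, which is an idealisation (real bagged models are correlated, so the gain is smaller). The names `TRUE_VALUE`, `NOISE_STD` and `noisy_prediction` are made up for this illustration.

```python
import random
import statistics

TRUE_VALUE = 5.0   # hypothetical quantity every model tries to predict
NOISE_STD = 1.0    # assumed spread of a single model's error

def noisy_prediction(rng):
    # one "model": unbiased, but with high variance
    return TRUE_VALUE + rng.gauss(0.0, NOISE_STD)

def bagged_prediction(rng, bags):
    # average of `bags` independent noisy models
    return sum(noisy_prediction(rng) for _ in range(bags)) / bags

rng = random.Random(0)
single = [noisy_prediction(rng) for _ in range(2000)]
bagged = [bagged_prediction(rng, bags=10) for _ in range(2000)]

var_single = statistics.pvariance(single)
var_bagged = statistics.pvariance(bagged)
print(var_single, var_bagged)  # the second value is much smaller
```

Averaging does not change the bias in this setup: both lists are centred on the same value, only the spread shrinks.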
(2) Parameters that control bagging
: Changing the seed, Row sampling or Bootstrapping, Shuffling, Column sampling, Model-specific parameters, Number of models or bags, Parallelism
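Three of these controls (changing the seed, row sampling with replacement, and column sampling) can be sketched with the standard library alone. This is only an illustration of how each bag ends up seeing different data; the helper names `bootstrap_rows` and `sample_columns` and the sizes used are assumptions, not part of any library.

```python
import random

def bootstrap_rows(n_rows, rng):
    # row sampling with replacement: same size as the data, duplicates allowed
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def sample_columns(n_cols, fraction, rng):
    # column sampling without replacement: each bag sees a subset of features
    k = max(1, int(n_cols * fraction))
    return sorted(rng.sample(range(n_cols), k))

seed = 1
bags = 3
n_rows, n_cols = 8, 5
for n in range(bags):
    rng = random.Random(seed + n)  # a new seed gives each bag different data
    rows = bootstrap_rows(n_rows, rng)
    cols = sample_columns(n_cols, fraction=0.6, rng=rng)
    print(n, rows, cols)
```

Each bag would then train its model on the selected rows and columns only, and the predictions are averaged as in the example that follows.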
(3) Example of bagging
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# train is the training data
# test is the test data
# y is target variable
model = RandomForestRegressor()
bags = 10
seed = 1
# running sum of predictions, one entry per test row
bagged_prediction = np.zeros(test.shape[0])
for n in range(bags):
    model.set_params(random_state=seed + n)  # update seed
    model.fit(train, y)
    preds = model.predict(test)
    bagged_prediction += preds
# take average of predictions
bagged_prediction /= bags
Boosting is a form of weighted averaging of models in which each model is built sequentially, taking the performance of the previous models into account.
(1) Weight based boosting
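Compute a weight for each row according to a chosen rule and add it to the data as an extra column that tells the next model how much each row matters.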
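A rough sketch of one weight-based round, using only the standard library. The rule `weight = 1 + |error|` is just one possible choice and is an assumption here, as are the toy data and the helper `fit_line` (a weighted least-squares line fit standing in for any model that accepts row weights).

```python
def fit_line(xs, ys, ws):
    # weighted least-squares fit of y = slope * x + intercept
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]  # the last row is hard to fit

# round 1: every row counts equally
weights = [1.0] * len(xs)
slope, intercept = fit_line(xs, ys, weights)
abs_error = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]

# assumed rule: rows with a larger error get a larger weight
weights = [1.0 + e for e in abs_error]

# round 2: the next model pays more attention to badly predicted rows
slope2, intercept2 = fit_line(xs, ys, weights)
print(weights)
```

With scikit-learn estimators the same idea is usually expressed by passing the weight column as `sample_weight` to `fit`, for estimators that support it.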
(2) Residual based boosting
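Compute the error according to a chosen rule, then define a new y label from Old_prediction, i.e. the next model is fitted to the error left by the old prediction.
This is the method used by the dominant algorithms such as XGBoost, LightGBM, H2O's GBM and CatBoost!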
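The residual idea can be sketched in a few lines of plain Python. This is a toy version under stated assumptions (one feature, squared error, single-split "stumps" as weak learners, a fixed learning rate `eta`); it is not how the libraries above are implemented, and `fit_stump` and the data are made up for the illustration.

```python
def fit_stump(xs, residuals):
    # best single-split regressor on one feature (lowest squared error)
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 6.0, 6.2]
eta = 0.5      # learning rate (assumed value)
rounds = 20

preds = [sum(ys) / len(ys)] * len(ys)  # Old_prediction starts as the mean
history = [mse(ys, preds)]
for _ in range(rounds):
    residuals = [y - p for y, p in zip(ys, preds)]  # new target = y - Old_prediction
    stump = fit_stump(xs, residuals)
    preds = [p + eta * stump(x) for x, p in zip(xs, preds)]
    history.append(mse(ys, preds))

print(history[0], history[-1])  # training error before and after boosting
```

The learning rate shrinks each correction so that no single weak learner dominates; smaller values usually need more rounds.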
Stacking means making predictions with a number of models on a hold-out set and then training a different meta model on these predictions.
It is the most popular form of ensembling in predictive modelling and is generally used at the final stage.
(1) Stacking Example
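import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# train is the training data, y is the target variable, test is the test data
# split the training data into two halves
training, valid, ytraining, yvalid = train_test_split(train, y, test_size=0.5)
# specify base models
model1 = RandomForestRegressor()
model2 = LinearRegression()
# fit base models on the first half
model1.fit(training, ytraining)
model2.fit(training, ytraining)
# make predictions for the hold-out half and for the test data
preds1 = model1.predict(valid)
preds2 = model2.predict(valid)
test_preds1 = model1.predict(test)
test_preds2 = model2.predict(test)
# form new datasets from the predictions (column_stack takes a single tuple)
stacked_predictions = np.column_stack((preds1, preds2))
stacked_test_predictions = np.column_stack((test_preds1, test_preds2))
# specify meta model
meta_model = LinearRegression()
# fit meta model on stacked predictions
meta_model.fit(stacked_predictions, yvalid)
# make predictions on the stacked predictions of the test data
final_predictions = meta_model.predict(stacked_test_predictions)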
(2) Things to consider
A scalable meta-modelling methodology that uses stacking to combine multiple models in a neural-network-like architecture of multiple levels.
(1) 1st level tips
(2) Subsequent level tips
1) Simpler algorithms
2) Feature engineering
3) Be mindful of target leakage!
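One common way to respect the leakage tip is to build the next level's inputs from out-of-fold predictions, so that the prediction for a row never comes from a model that saw that row's target. The notes themselves use a single hold-out split; K-fold out-of-fold prediction is named here as an alternative, and the sketch below is a stdlib-only illustration in which `kfold_indices`, `fit_mean_model` and the toy data are assumptions (the mean model stands in for any 1st level model).

```python
def kfold_indices(n_rows, k):
    # contiguous folds; shuffle the rows first if their order carries information
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean_model(train_y):
    # stand-in for any 1st level model: always predicts the training mean
    m = sum(train_y) / len(train_y)
    return lambda x: m

def out_of_fold_predictions(xs, ys, k):
    oof = [None] * len(ys)
    for fold in kfold_indices(len(ys), k):
        held_out = set(fold)
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_mean_model(train_y)
        for i in fold:
            oof[i] = model(xs[i])  # row i was not used to fit this model
    return oof

xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = out_of_fold_predictions(xs, ys, k=3)
print(oof)  # [4.5, 4.5, 3.5, 3.5, 2.5, 2.5]
```

The `oof` column can then be used as a feature for the next level, because each value was produced without access to its own row's target.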