Ensemble Model


Ensembling a trained model is as simple as writing ensemble_model. It takes only one mandatory parameter i.e. the trained model object. This functions returns a table with k-fold cross validated scores of common evaluation metrics along with trained model object. The evaluation metrics used are:

  • Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
  • Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE

The number of folds can be defined using fold parameter within ensemble_model function. By default, the fold is set to 10. All the metrics are rounded to 4 decimals by default by can be changed using round parameter. There are two methods available for ensembling that can be set using method parameter within ensemble_model function. Both the methods require re-sampling of the data and fitting multiple estimators, hence the number of estimators can be controlled using n_estimators parameter. By default, n_estimators is set to 10.

This function is only available in pycaret.classification and pycaret.regression modules.

 

Bagging:

Bagging, also known as Bootstrap aggregating, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.

 

Boosting:

Boosting is an ensemble meta-algorithm for primarily reducing bias and variance in supervised learning. Boosting is in the family of machine learning algorithms that convert weak learners to strong ones. A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.

 

Classification Example

 

Code
from pycaret.datasets import get_data 
diabetes = get_data('diabetes') 

# Importing module and initializing setup 
from pycaret.classification import * 
clf1 = setup(data = diabetes, target = 'Class variable') 

# train a decision tree model
dt = create_model('dt')

# train a bagging classifier on dt
bagged_dt = ensemble_model(dt, method = 'Bagging')

# train a adaboost classifier on dt with 100 estimators
boosted_dt = ensemble_model(dt, method = 'Boosting', n_estimators = 100)

 

Sample Output

Regression Example

 

Code
from pycaret.datasets import get_data 
boston = get_data('boston') 

# Importing module and initializing setup 
from pycaret.regression import * 
reg1 = setup(data = boston, target = 'medv') 

# train a decision tree model
dt = create_model('dt')

# train a bagging classifier on dt
bagged_dt = ensemble_model(dt, method = 'Bagging')

# train a adaboost classifier on dt with 100 estimators
boosted_dt = ensemble_model(dt, method = 'Boosting', n_estimators = 100)

 

Sample Output

Try this next