Compare Models


This is the first step we recommend in the workflow of any supervised experiment. This function trains all the models in the model library using default hyperparameters and evaluates performance metrics using cross-validation. It returns the best-performing trained model object, or a list of the top N models when the n_select parameter is used. The evaluation metrics used are:

  • Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
  • Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE

The output of the function is a table showing the averaged score of all models across the folds. The number of folds can be defined using the fold parameter of compare_models; by default, it is set to 10. The table is sorted from best to worst by the metric of choice, which can be defined using the sort parameter. By default, the table is sorted by Accuracy for classification experiments and by R2 for regression experiments. Certain models are excluded from the comparison by default because of their longer run-time; to include them, set the turbo parameter to False.
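As a minimal sketch of these parameters (assuming setup() has already been run, as in the examples below), the number of folds and the turbo behavior can be changed in a single call:

# 5-fold cross-validation instead of the default 10, and
# include the slower models that turbo excludes by default
best = compare_models(fold = 5, turbo = False)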

This function is only available in the pycaret.classification and pycaret.regression modules.


Classification Example


Code
# Importing dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# return best model
best = compare_models()

# return top 3 models based on 'Accuracy'
top3 = compare_models(n_select = 3)

# return best model based on AUC
best = compare_models(sort = 'AUC') #default is 'Accuracy'

# compare specific models
best_specific = compare_models(include = ['dt','rf','xgboost'])

# exclude certain models
best_specific = compare_models(exclude = ['catboost', 'svm'])
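Note that when n_select is greater than 1, compare_models returns a plain Python list of trained models rather than a single estimator. A minimal sketch of working with the top3 object from above:

# top3 is a list; models can be accessed by index
print(top3)
best_of_three = top3[0]   # highest-ranked model by the sort metric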


Sample Output

Regression Example


Code
# Importing dataset
from pycaret.datasets import get_data
boston = get_data('boston')

# Importing module and initializing setup
from pycaret.regression import *
reg1 = setup(data = boston, target = 'medv')

# return best model
best = compare_models()

# return top 3 models based on 'R2'
top3 = compare_models(n_select = 3)

# return best model based on MAPE
best = compare_models(sort = 'MAPE') #default is 'R2'

# compare specific models
best_specific = compare_models(include = ['dt','rf','xgboost'])

# exclude certain models
best_specific = compare_models(exclude = ['catboost', 'svm'])
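The returned object is a trained model, so it can be passed straight to other functions in the same module. As a minimal sketch, predict_model from pycaret.regression can score it on the hold-out set created by setup():

# generate predictions on the hold-out sample with the best model
holdout_predictions = predict_model(best)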


Sample Output
