Create Model


Creating a model in any module is as simple as calling create_model. It takes only one mandatory parameter: the model abbreviation as a string. For supervised modules (classification and regression), this function returns a table with k-fold cross-validated scores of common evaluation metrics along with the trained model object. For unsupervised modules (clustering, anomaly detection, natural language processing and association rule mining), this function returns only the trained model object. The evaluation metrics used are:

  • Classification: Accuracy, AUC, Recall, Precision, F1, Kappa
  • Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE

The number of folds can be defined using the fold parameter within the create_model function. By default, fold is set to 10. All metrics are rounded to 4 decimals by default; this can be changed using the round parameter within create_model. Although there is a separate function to ensemble a trained model, there is also a quick way to ensemble the model while creating it, by passing the ensemble parameter along with the method parameter to create_model.
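Conceptually, the score grid that create_model returns for a classifier resembles ordinary k-fold cross-validation. The sketch below is not PyCaret's actual internals; it only uses scikit-learn to illustrate what "10 folds, metrics averaged and rounded to 4 decimals" means in practice:

```python
# Conceptual sketch of create_model's score grid using plain scikit-learn
# (illustrative only; PyCaret wraps this kind of loop internally).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 10-fold CV mirrors create_model's default fold=10
scores = cross_validate(
    make_pipeline(StandardScaler(), LogisticRegression()),
    X, y, cv=10,
    scoring=['accuracy', 'roc_auc', 'recall', 'precision', 'f1'],
)

# Average each metric across the folds, rounded to 4 decimals
# (mirrors create_model's default round=4)
means = {k.replace('test_', ''): round(v.mean(), 4)
         for k, v in scores.items() if k.startswith('test_')}
print(means)
```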

 

See Examples


 

Classification

 

Estimator Abbrev. String
Logistic Regression ‘lr’
K Nearest Neighbour ‘knn’
Naive Bayes ‘nb’
Decision Tree ‘dt’
SVM (Linear) ‘svm’
SVM (RBF) ‘rbfsvm’
Gaussian Process ‘gpc’
Multi-Layer Perceptron ‘mlp’
Ridge Classifier ‘ridge’
Random Forest ‘rf’
Quadratic Discriminant Analysis ‘qda’
AdaBoost ‘ada’
Gradient Boosting Classifier ‘gbc’
Linear Discriminant Analysis ‘lda’
Extra Trees Classifier ‘et’
Extreme Gradient Boosting ‘xgboost’
Light Gradient Boosting ‘lightgbm’
CatBoost Classifier ‘catboost’

 

Regression

 

Estimator Abbrev. String
Linear Regression ‘lr’
Lasso Regression ‘lasso’
Ridge Regression ‘ridge’
Elastic Net ‘en’
Least Angle Regression ‘lar’
Lasso Least Angle Regression ‘llar’
Orthogonal Matching Pursuit ‘omp’
Bayesian Ridge ‘br’
Automatic Relevance Determination ‘ard’
Passive Aggressive Regressor ‘par’
Random Sample Consensus ‘ransac’
TheilSen Regressor ‘tr’
Huber Regressor ‘huber’
Kernel Ridge ‘kr’
Support Vector Machine ‘svm’
K Neighbors Regressor ‘knn’
Decision Tree ‘dt’
Random Forest ‘rf’
Extra Trees Regressor ‘et’
AdaBoost Regressor ‘ada’
Gradient Boosting Regressor ‘gbr’
Multi-Layer Perceptron ‘mlp’
Extreme Gradient Boosting ‘xgboost’
Light Gradient Boosting ‘lightgbm’
CatBoost Regressor ‘catboost’

 

Clustering

 

Model Abbrev. String
K-Means Clustering ‘kmeans’
Affinity Propagation ‘ap’
Mean Shift Clustering ‘meanshift’
Spectral Clustering ‘sc’
Agglomerative Clustering ‘hclust’
Density-Based Spatial Clustering ‘dbscan’
OPTICS Clustering ‘optics’
Birch Clustering ‘birch’
K-Modes Clustering ‘kmodes’

 

Anomaly Detection

 

Model Abbrev. String
Angle-based Outlier Detection ‘abod’
Isolation Forest ‘iforest’
Clustering-Based Local Outlier ‘cluster’
Connectivity-Based Outlier Factor ‘cof’
Histogram-based Outlier Detection ‘histogram’
k-Nearest Neighbors Detector ‘knn’
Local Outlier Factor ‘lof’
One-class SVM detector ‘svm’
Principal Component Analysis ‘pca’
Minimum Covariance Determinant ‘mcd’
Subspace Outlier Detection ‘sod’
Stochastic Outlier Selection ‘sos’

 

Natural Language Processing

 

Model Abbrev. String
Latent Dirichlet Allocation ‘lda’
Latent Semantic Indexing ‘lsi’
Hierarchical Dirichlet Process ‘hdp’
Random Projections ‘rp’
Non-Negative Matrix Factorization ‘nmf’

 

Classification Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

# creating logistic regression model
lr = create_model('lr')

 

Output

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=123, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
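The quick ensembling mentioned earlier (ensemble = True with a method such as 'Bagging') wraps the base estimator in a bagged ensemble. The sketch below is not PyCaret's implementation; it uses scikit-learn's BaggingClassifier only to illustrate the idea of bagging a decision tree and comparing cross-validated accuracy:

```python
# Illustrative sketch of what bagging an estimator means
# (scikit-learn only; PyCaret's ensemble=True does the wrapping for you).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Plain decision tree vs. its bagged ensemble, scored with 10-fold CV
tree = DecisionTreeClassifier(random_state=123)
bagged = BaggingClassifier(tree, n_estimators=10, random_state=123)

tree_acc = cross_val_score(tree, X, y, cv=10).mean()
bagged_acc = cross_val_score(bagged, X, y, cv=10).mean()
print(round(tree_acc, 4), round(bagged_acc, 4))
```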

Regression Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
boston = get_data('boston')

# Importing module and initializing setup
from pycaret.regression import *
reg1 = setup(data = boston, target = 'medv')

# creating xgboost model
xgboost = create_model('xgboost')

 

Output

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
             n_jobs=-1, nthread=None, objective='reg:linear', random_state=123,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=0)

Clustering Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# Importing module and initializing setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# creating kmeans model
kmeans = create_model('kmeans')

 

Output
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=4, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=123, tol=0.0001, verbose=0)

Anomaly Detection Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
anomalies = get_data('anomalies')

# Importing module and initializing setup
from pycaret.anomaly import *
ano1 = setup(data = anomalies)

# creating Isolation Forest model
iforest = create_model('iforest')

 

Output
IForest(behaviour='new', bootstrap=False, contamination=0.05,
    max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
    random_state=786, verbose=0)

Natural Language Processing Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
kiva = get_data('kiva')

# Importing module and initializing setup
from pycaret.nlp import *
nlp1 = setup(data = kiva, target = 'en')

# creating LDA model
lda = create_model('lda')

 

Output
LdaModel(num_terms=2742, num_topics=4, decay=0.5, chunksize=100)

Association Rule Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
france = get_data('france')

# Importing module and initializing setup
from pycaret.arules import *
arule1 = setup(data = france, transaction_id = 'InvoiceNo', item_id = 'Description')

# creating Association Rule model
mod1 = create_model(metric = 'confidence')

 

Output
