Create Model


Creating a model in any module is as simple as writing create_model. It takes only one required parameter, the Model ID, as a string. For supervised modules (classification and regression), this function returns a table of k-fold cross-validated performance metrics along with the trained model object. For the unsupervised clustering module, it returns performance metrics along with the trained model object, and for the remaining unsupervised modules (anomaly detection, natural language processing, and association rule mining) it returns only the trained model object. The evaluation metrics used are:

  • Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
  • Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE
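Conceptually, the cross-validated metric table for a classifier resembles the following sketch written directly against scikit-learn (illustrative only; this is not PyCaret code, and the dataset is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=200, random_state=42)

# roughly what create_model('lr') computes: k-fold CV metrics...
scores = cross_validate(
    LogisticRegression(max_iter=1000), X, y,
    cv=10,  # matches the default fold = 10
    scoring=['accuracy', 'roc_auc', 'recall', 'precision', 'f1'],
)

# mean of each metric across the 10 folds, rounded to 4 decimals
means = {k.replace('test_', ''): round(v.mean(), 4)
         for k, v in scores.items() if k.startswith('test_')}
print(means)

# ...plus the trained model object itself
model = LogisticRegression(max_iter=1000).fit(X, y)
```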

The number of folds can be defined using the fold parameter within the create_model function. By default, fold is set to 10. All metrics are rounded to 4 decimals by default; this can be changed using the round parameter within create_model. Although there is a separate function to ensemble a trained model, there is also a quick way to ensemble the model while creating it, using the ensemble parameter along with the method parameter within the create_model function.
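The ensemble = True, method = 'Bagging' shortcut is conceptually equivalent to wrapping the base estimator in a bagging meta-estimator before cross-validation. A sketch of that idea in plain scikit-learn (illustrative only, not PyCaret's internal code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# roughly what ensemble = True, method = 'Bagging' does: the base
# model is wrapped in a bagging meta-estimator and then trained
bagged_dt = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=10, random_state=0
).fit(X, y)

print(round(bagged_dt.score(X, y), 4))
```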

 

See Examples


 

Classification
| ID | Name |
| --- | --- |
| ‘lr’ | Logistic Regression |
| ‘knn’ | K Nearest Neighbors |
| ‘nb’ | Naive Bayes |
| ‘dt’ | Decision Tree Classifier |
| ‘svm’ | SVM – Linear Kernel |
| ‘rbfsvm’ | SVM – Radial Kernel |
| ‘gpc’ | Gaussian Process Classifier |
| ‘mlp’ | Multi-layer Perceptron |
| ‘ridge’ | Ridge Classifier |
| ‘rf’ | Random Forest Classifier |
| ‘qda’ | Quadratic Discriminant Analysis |
| ‘ada’ | Ada Boost Classifier |
| ‘gbc’ | Gradient Boosting Classifier |
| ‘lda’ | Linear Discriminant Analysis |
| ‘et’ | Extra Trees Classifier |
| ‘xgboost’ | Extreme Gradient Boosting |
| ‘lightgbm’ | Light Gradient Boosting |
| ‘catboost’ | CatBoost Classifier |

 

Regression

 

| ID | Name |
| --- | --- |
| ‘lr’ | Linear Regression |
| ‘lasso’ | Lasso Regression |
| ‘ridge’ | Ridge Regression |
| ‘en’ | Elastic Net |
| ‘lar’ | Least Angle Regression |
| ‘llar’ | Lasso Least Angle Regression |
| ‘omp’ | Orthogonal Matching Pursuit |
| ‘br’ | Bayesian Ridge |
| ‘ard’ | Automatic Relevance Determination |
| ‘par’ | Passive Aggressive Regressor |
| ‘ransac’ | Random Sample Consensus |
| ‘tr’ | TheilSen Regressor |
| ‘huber’ | Huber Regressor |
| ‘kr’ | Kernel Ridge |
| ‘svm’ | Support Vector Machine |
| ‘knn’ | K Neighbors Regressor |
| ‘dt’ | Decision Tree |
| ‘rf’ | Random Forest |
| ‘et’ | Extra Trees Regressor |
| ‘ada’ | AdaBoost Regressor |
| ‘gbr’ | Gradient Boosting Regressor |
| ‘mlp’ | Multi-layer Perceptron |
| ‘xgboost’ | Extreme Gradient Boosting |
| ‘lightgbm’ | Light Gradient Boosting |
| ‘catboost’ | CatBoost Regressor |

 

Clustering
 
 
| ID | Name |
| --- | --- |
| ‘kmeans’ | K-Means Clustering |
| ‘ap’ | Affinity Propagation |
| ‘meanshift’ | Mean Shift Clustering |
| ‘sc’ | Spectral Clustering |
| ‘hclust’ | Agglomerative Clustering |
| ‘dbscan’ | Density-Based Spatial Clustering |
| ‘optics’ | OPTICS Clustering |
| ‘birch’ | Birch Clustering |
| ‘kmodes’ | K-Modes Clustering |
Anomaly Detection

| ID | Name |
| --- | --- |
| ‘abod’ | Angle-based Outlier Detection |
| ‘iforest’ | Isolation Forest |
| ‘cluster’ | Clustering-Based Local Outlier |
| ‘cof’ | Connectivity-Based Outlier Factor |
| ‘histogram’ | Histogram-based Outlier Detection |
| ‘knn’ | k-Nearest Neighbors Detector |
| ‘lof’ | Local Outlier Factor |
| ‘svm’ | One-class SVM Detector |
| ‘pca’ | Principal Component Analysis |
| ‘mcd’ | Minimum Covariance Determinant |
| ‘sod’ | Subspace Outlier Detection |
| ‘sos’ | Stochastic Outlier Selection |

 

Natural Language Processing

 

| ID | Model |
| --- | --- |
| ‘lda’ | Latent Dirichlet Allocation |
| ‘lsi’ | Latent Semantic Indexing |
| ‘hdp’ | Hierarchical Dirichlet Process |
| ‘rp’ | Random Projections |
| ‘nmf’ | Non-Negative Matrix Factorization |

 

Classification Example

 

Code
# Importing dataset 
from pycaret.datasets import get_data 
diabetes = get_data('diabetes') 

# Importing module and initializing setup 
from pycaret.classification import * 
clf1 = setup(data = diabetes, target = 'Class variable')

# train logistic regression model
lr = create_model('lr') #lr is the id of the model

# check the model library to see all models
models()

# train rf model using 5 fold CV
rf = create_model('rf', fold = 5)

# train svm model without CV
svm = create_model('svm', cross_validation = False)

# train xgboost model with max_depth = 10
xgboost = create_model('xgboost', max_depth = 10)

# train xgboost model on gpu
xgboost_gpu = create_model('xgboost', tree_method = 'gpu_hist', gpu_id = 0) #0 is gpu-id

# train multiple lightgbm models with different learning_rate values
import numpy as np
lgbms = [create_model('lightgbm', learning_rate = i) for i in np.arange(0.1,1,0.1)]

# train custom model
from gplearn.genetic import SymbolicClassifier
symclf = SymbolicClassifier(generations = 50)
sc = create_model(symclf)

 

Sample Output
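As the last lines of the example show, create_model also accepts any untrained estimator that follows the scikit-learn API, i.e. an object with fit and predict methods. A minimal sketch of such a compatible estimator (a toy illustration, not part of PyCaret):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

# any object with this interface can be handed to create_model, e.g.
# mc = create_model(MajorityClassifier())
clf = MajorityClassifier().fit([[0], [1], [2]], [0, 1, 1])
print(clf.predict([[5]]))  # -> [1]
```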

Regression Example

 

Code
# Importing dataset 
from pycaret.datasets import get_data 
boston = get_data('boston') 

# Importing module and initializing setup 
from pycaret.regression import * 
reg1 = setup(data = boston, target = 'medv') 

# train linear regression model
lr = create_model('lr') #lr is the id of the model

# check the model library to see all models
models()

# train rf model using 5 fold CV
rf = create_model('rf', fold = 5)

# train svm model without CV
svm = create_model('svm', cross_validation = False)

# train xgboost model with max_depth = 10
xgboost = create_model('xgboost', max_depth = 10)

# train xgboost model on gpu
xgboost_gpu = create_model('xgboost', tree_method = 'gpu_hist', gpu_id = 0) #0 is gpu-id

# train multiple lightgbm models with different learning_rate values
import numpy as np
lgbms = [create_model('lightgbm', learning_rate = i) for i in np.arange(0.1,1,0.1)]

# train custom model
from gplearn.genetic import SymbolicRegressor
symreg = SymbolicRegressor(generations = 50)
sc = create_model(symreg)

 

Sample Output

Clustering Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# Importing module and initializing setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# check the model library to see all models
models()

# training kmeans model
kmeans = create_model('kmeans')

# training kmodes model
kmodes = create_model('kmodes')

 

Sample Output

Anomaly Detection Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
anomalies = get_data('anomalies')

# Importing module and initializing setup
from pycaret.anomaly import *
ano1 = setup(data = anomalies)

# check the model library to see all models
models()

# training Isolation Forest
iforest = create_model('iforest')

# training KNN model
knn = create_model('knn')

 

Sample Output
Trained Model Object.

Natural Language Processing Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
kiva = get_data('kiva')

# Importing module and initializing setup
from pycaret.nlp import *
nlp1 = setup(data = kiva, target = 'en')

# check the model library to see all models
models()

# training LDA model
lda = create_model('lda')

# training NNMF model
nmf = create_model('nmf')

 

Sample Output
Trained Model Object

Association Rule Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
france = get_data('france')

# Importing module and initializing setup
from pycaret.arules import *
arule1 = setup(data = france, transaction_id = 'InvoiceNo', item_id = 'Description')

# creating Association Rule model
mod1 = create_model(metric = 'confidence')

 

Output
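The metric parameter selects the interest measure used to evaluate and sort the mined rules. As a reminder of what confidence measures, here is the underlying arithmetic on a tiny made-up transaction set (illustrative only, not PyCaret code):

```python
# confidence(A -> B) = support(A and B) / support(A)
transactions = [
    {'bread', 'milk'},
    {'bread', 'butter'},
    {'bread', 'milk', 'butter'},
    {'milk'},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# of the 3 transactions with bread, 2 also contain milk -> 2/3
confidence = support({'bread', 'milk'}) / support({'bread'})
print(round(confidence, 4))  # -> 0.6667
```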
