Setting up Environment


 

setup(data, categorical_features = None, categorical_imputation = 'constant', ordinal_features = None, high_cardinality_features = None, numeric_features = None, numeric_imputation = 'mean', date_features = None, ignore_features = None, normalize = False, normalize_method = 'zscore', transformation = False, transformation_method = 'yeo-johnson', handle_unknown_categorical = True, unknown_categorical_method = 'least_frequent', pca = False, pca_method = 'linear', pca_components = None, ignore_low_variance = False, combine_rare_levels = False, rare_level_threshold = 0.10, bin_numeric_features = None, remove_multicollinearity = False, multicollinearity_threshold = 0.9, group_features = None, group_names = None, supervised = False, supervised_target = None, n_jobs = -1, html = True, session_id = None, log_experiment = False, experiment_name = None, log_plots = False, log_profile = False, log_data = False, silent = False, verbose = True, profile = False)

Description:

This function initializes the environment in pycaret. setup() must be called before executing any other function in pycaret. It takes one mandatory parameter: dataframe {array-like, sparse matrix}.

Code
#import the dataset from pycaret repository
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

#import clustering module
from pycaret.clustering import *

#initialize the setup
exp_clu = setup(jewellery)

 

Output


‘jewellery’ is a pandas DataFrame.

Parameters:

data : dataframe
{array-like, sparse matrix}, shape (n_samples, n_features) where n_samples is the number of samples and n_features is the number of features in dataframe.

categorical_features: string, default = None
If the inferred data types are not correct, categorical_features can be used to overwrite the inferred type. If when running setup the type of ‘column1’ is inferred as numeric instead of categorical, then this parameter can be used to overwrite the type by passing categorical_features = [‘column1’].
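
For example, a minimal sketch of overriding the inferred type (the column name ‘column1’ is hypothetical; ‘jewellery’ is the dataframe from the example above):

# force a hypothetical column 'column1' to be treated as categorical
exp_clu = setup(jewellery, categorical_features = ['column1'])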

categorical_imputation: string, default = ‘constant’
If missing values are found in categorical features, they will be imputed with a constant ‘not_available’ value. The other available option is ‘mode’ which imputes the missing value using most frequent value in the training dataset.

ordinal_features: dictionary, default = None
When the data contains ordinal features, they must be encoded differently using the ordinal_features param. If the data has a categorical variable with values of ‘low’, ‘medium’, ‘high’ and it is known that low < medium < high, then it can be passed as ordinal_features = { ‘column_name’ : [‘low’, ‘medium’, ‘high’] }. The list sequence must be in increasing order from lowest to highest.
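
A sketch of the dictionary format (the column name and its levels are hypothetical):

# encode an ordinal column with an explicit low-to-high level order
exp_clu = setup(jewellery, ordinal_features = {'column_name' : ['low', 'medium', 'high']})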

high_cardinality_features: string, default = None
When the data contains features with high cardinality, they can be compressed into fewer levels by passing them as a list of column names with high cardinality. Features are compressed using frequency distribution. As such, original features are replaced with their frequency distribution and converted into a numeric variable.

numeric_features: string, default = None
If the inferred data types are not correct, numeric_features can be used to overwrite the inferred type. If when running setup the type of ‘column1’ is inferred as categorical instead of numeric, then this parameter can be used to overwrite the type by passing numeric_features = [‘column1’].

numeric_imputation: string, default = ‘mean’
If missing values are found in numeric features, they will be imputed with the mean value of the feature. The other available option is ‘median’ which imputes the value using the median value in the training dataset.

date_features: string, default = None
If the data has a DateTime column that is not automatically detected when running setup, this parameter can be used by passing date_features =’date_column_name’. It can work with multiple date columns. Date columns are not used in modeling. Instead, feature extraction is performed and date columns are dropped from the dataset. If the date column includes a time stamp, features related to time will also be extracted.

ignore_features: string, default = None
If any feature should be ignored for modeling, it can be passed to the param ignore_features. The ID and DateTime columns when inferred, are automatically set to ignore for modeling.

normalize: bool, default = False
When set to True, the feature space is transformed using the normalize_method param. Generally, linear algorithms perform better with normalized data; however, the results may vary and it is advised to run multiple experiments to evaluate the benefit of normalization.

normalize_method: string, default = ‘zscore’
Defines the method to be used for normalization. By default, the normalize method is set to ‘zscore’. The standard zscore is calculated as z = (x - u) / s. The other available options are:

‘minmax’ : scales and translates each feature individually such that it is in the range of 0 – 1.
‘maxabs’ : scales and translates each feature individually such that the maximal absolute value of each feature will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.
‘robust’ : scales and translates each feature according to the Interquartile range. When the dataset contains outliers, robust scaler often gives better results.
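
For example, a sketch that scales all features into the 0 to 1 range (assuming the ‘jewellery’ dataframe from the example above):

# normalize the feature space with min-max scaling
exp_clu = setup(jewellery, normalize = True, normalize_method = 'minmax')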

transformation: bool, default = False
When set to True, a power transformation is applied to make the data more normal / Gaussian-like. This is useful for modeling issues related to heteroscedasticity or  other situations where normality is desired. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.

transformation_method: string, default = ‘yeo-johnson’
Defines the method for transformation. By default, the transformation method is set to ‘yeo-johnson’. The other available option is ‘quantile’ transformation. Both transformations transform the feature set to follow a Gaussian-like or normal distribution. Note that the quantile transformer is non-linear and may distort linear correlations between variables measured at the same scale.

handle_unknown_categorical: bool, default = True
When set to True, unknown categorical levels in new / unseen data are replaced by the most or least frequent level as learned in the training data. The method is  defined under the unknown_categorical_method param.

unknown_categorical_method: string, default = ‘least_frequent’
Method used to replace unknown categorical levels in unseen data. Method can be set to ‘least_frequent’ or ‘most_frequent’.

pca: bool, default = False
When set to True, dimensionality reduction is applied to project the data into a lower dimensional space using the method defined in pca_method param. In supervised learning pca is generally performed when dealing with high feature space and memory is a constraint. Note that not all datasets can be decomposed efficiently using a linear PCA technique and that applying PCA may result in loss of information. As such, it is advised to run multiple experiments with different pca_methods to evaluate the impact.

pca_method: string, default = ‘linear’
The ‘linear’ method performs Linear dimensionality reduction using Singular Value Decomposition. The other available options are:

kernel : dimensionality reduction through the use of RBF kernel.
incremental : replacement for ‘linear’ pca when the dataset to be decomposed is too large to fit in memory.

pca_components: int/float, default = 0.99
Number of components to keep. If pca_components is a float, it is treated as a target percentage for information retention. When pca_components is an integer, it is treated as the number of features to be kept. pca_components must be strictly less than the original number of features in the dataset.
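
A sketch combining the three PCA-related params (the 0.95 retention target is illustrative only):

# project the features onto components retaining ~95% of the information
exp_clu = setup(jewellery, pca = True, pca_method = 'linear', pca_components = 0.95)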

ignore_low_variance: bool, default = False
When set to True, all categorical features with statistically insignificant variances are removed from the dataset. The variance is calculated using the ratio of unique  values to the number of samples, and the ratio of the most common value to the frequency of the second most common value.

combine_rare_levels: bool, default = False
When set to True, all levels in categorical features below the threshold defined in rare_level_threshold param are combined together as a single level. There must be at least two levels under the threshold for this to take effect. rare_level_threshold represents the percentile distribution of level frequency. Generally, this technique is applied to limit a sparse matrix caused by high numbers of levels in categorical features.

rare_level_threshold: float, default = 0.1
Percentile distribution below which rare categories are combined. Only comes into effect when combine_rare_levels is set to True.
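
A sketch showing both params together (the 0.05 threshold is illustrative only):

# merge categorical levels that appear in fewer than 5% of the rows
exp_clu = setup(jewellery, combine_rare_levels = True, rare_level_threshold = 0.05)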

bin_numeric_features: list, default = None
When a list of numeric features is passed they are transformed into categorical features using K-Means, where values in each bin have the same nearest center of a 1D k-means cluster. The number of clusters is determined based on the ‘sturges’ method, which is only optimal for Gaussian data and underestimates the number of bins for large non-Gaussian datasets.

remove_multicollinearity: bool, default = False
When set to True, the variables with inter-correlations higher than the threshold defined under the multicollinearity_threshold param are dropped. When two features are highly correlated with each other, the feature with higher average correlation in the feature space is dropped.

multicollinearity_threshold: float, default = 0.9
Threshold used for dropping the correlated features. Only comes into effect when remove_multicollinearity is set to True.
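
For example (the 0.95 threshold is illustrative only):

# drop one feature from every pair correlated above 0.95
exp_clu = setup(jewellery, remove_multicollinearity = True, multicollinearity_threshold = 0.95)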

group_features: list or list of list, default = None
When a dataset contains features that have related characteristics, the group_features param can be used for statistical feature extraction. For example, if a dataset has numeric features that are related with each other (i.e ‘Col1’, ‘Col2’, ‘Col3’), a list containing the column names can be passed under group_features to extract statistical information such as the mean, median, mode and standard deviation.

group_names: list, default = None
When group_features is passed, a name of the group can be passed into the group_names param as a list containing strings. The length of a group_names list must equal the length of group_features. When the length doesn’t match or the name is not passed, new features are sequentially named such as group_1, group_2, etc.
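
A sketch of the list-of-lists format (the column names and group names are hypothetical):

# extract group statistics (mean, median, mode, std) for two related sets of columns
exp_clu = setup(jewellery,
                group_features = [['Col1', 'Col2', 'Col3'], ['Col4', 'Col5']],
                group_names = ['group_a', 'group_b'])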

supervised: bool, default = False
When set to True, supervised_target column is ignored for transformation. This param is only for internal use.

supervised_target: string, default = None
Name of supervised_target column that will be ignored for transformation. Only applicable when tune_model() function is used. This param is only for internal use.

n_jobs: int, default = -1
The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on a single processor, set n_jobs to None.

html: bool, default = True
If set to False, prevents runtime display of the monitor. This must be set to False when using an environment that does not support HTML.

session_id: int, default = None
If None, a random seed is generated and returned in the Information grid. The unique number is then distributed as a seed in all functions used during the experiment. This can be used for later reproducibility of the entire experiment.
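
For example, fixing the seed so the experiment can be reproduced later (the value 123 is arbitrary):

# fix the random seed for reproducibility
exp_clu = setup(jewellery, session_id = 123)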

log_experiment: bool, default = False
When set to True, all metrics and parameters are logged on the MLflow server.

experiment_name: str, default = None
Name of experiment for logging. When set to None, ‘clu’ is by default used as alias for the experiment name.

log_plots: bool, default = False
When set to True, specific plots are logged in MLflow as a png file. By default, it is set to False.

log_profile: bool, default = False
When set to True, data profile is also logged on MLflow as an HTML file. By default, it is set to False.

log_data: bool, default = False
When set to True, train and test dataset are logged as csv.

silent: bool, default = False
When set to True, confirmation of data types is not required. All preprocessing will be performed assuming automatically inferred data types. Not recommended for direct use except for established pipelines.

verbose: Boolean, default = True
Information grid is not printed when verbose is set to False.

profile: bool, default = False
If set to True, a data profile for Exploratory Data Analysis will be displayed in an interactive HTML report.

Returns:

Information Grid: Information grid is printed.

Environment: This function returns various outputs that are stored in a variable as a tuple. They are used by other functions in pycaret.

 

Create Model


 

create_model(model = None, num_clusters = None, ground_truth = None, verbose = True, system = True, **kwargs)

Description:

This function creates a model on the dataset passed as a data param during the setup stage. setup() function must be called before using create_model(). This function returns a trained model object. 

Code

# create k-means model
kmeans = create_model('kmeans')

 

Output

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300, n_clusters=4, n_init=10, n_jobs=None, precompute_distances='auto', random_state=1362, tol=0.0001, verbose=0)

Parameters:

model : string / object, default = None
Enter abbreviated string of the model class. List of available models supported:
 
ID Name
‘kmeans’ K-Means Clustering
‘ap’ Affinity Propagation
‘meanshift’ Mean shift Clustering
‘sc’ Spectral Clustering
‘hclust’ Agglomerative Clustering
‘dbscan’ Density-Based Spatial Clustering
‘optics’ OPTICS Clustering
‘birch’ Birch Clustering
‘kmodes’ K-Modes Clustering

num_clusters: int, default = None
Number of clusters to be generated with the dataset. If None, num_clusters is set to 4.

ground_truth: string, default = None
When ground_truth is provided, the Homogeneity Score, Rand Index, and Completeness Score are evaluated and printed along with other metrics.
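
A sketch using both params (the column name ‘Label’ is hypothetical and must exist in the dataset passed to setup()):

# build a 6-cluster k-means model and evaluate it against a known label column
kmeans6 = create_model('kmeans', num_clusters = 6, ground_truth = 'Label')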

verbose: Boolean, default = True
Status update is not printed when verbose is set to False.

system: Boolean, default = True
Must remain True at all times. Only to be changed by internal functions.

**kwargs:
Additional keyword arguments to pass to the estimator.

 

Returns:

Score grid: A table containing the Silhouette, Calinski-Harabasz, Davies-Bouldin, Homogeneity Score, Rand Index, and Completeness Score. Last 3 are only evaluated when ground_truth param is provided.

Model: Trained model object

Warnings:

  • num_clusters is not required for Affinity Propagation (‘ap’), Mean shift clustering (‘meanshift’), Density-Based Spatial Clustering (‘dbscan’) and OPTICS Clustering (‘optics’). The num_clusters param for these models is automatically determined.
  • When fit doesn’t converge in Affinity Propagation (‘ap’) model, all data points are labelled as -1.
  • Noisy samples are given the label -1, when using Density-Based Spatial (‘dbscan’) or OPTICS Clustering (‘optics’).
  • OPTICS (‘optics’) clustering may take longer training times on large datasets.

Assign Model


 

assign_model(model, transformation = False, verbose = True)

Description:

This function assigns each data point in the dataset passed during the setup stage to one of the clusters using the trained model object passed as the model param. create_model() function must be called before using assign_model(). This function returns a pandas DataFrame.

Code
# create a model
kmeans = create_model('kmeans')

# assign labels using trained model
kmeans_df = assign_model(kmeans)

 

Output

This will return a dataframe with inferred clusters using trained model.

Parameters:

model: trained model object, default = None

transformation: bool, default = False
When set to True, assigned clusters are returned on transformed dataset instead of original dataset passed during setup().
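
For example, to attach the labels to the transformed (rather than original) feature space:

# return cluster labels on the transformed dataset
kmeans_df_transformed = assign_model(kmeans, transformation = True)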

verbose: Boolean, default = True
Status update is not printed when verbose is set to False.

Returns:

Dataframe: Returns a dataframe with assigned clusters using a trained model.

 

Plot Model


 

plot_model(model, plot = 'cluster', feature = None, label = False, save = False, system = True)

Description:

This function takes a trained model object and returns a plot on the dataset passed during setup stage. This function internally calls assign_model before generating a plot.

Code
# create a model
kmeans = create_model('kmeans')

# plot a model
plot_model(kmeans) 

 

Output

This will return a cluster scatter plot (by default).

Parameters:

model : object, default = none
A trained model object can be passed. Model must be created using create_model().

plot : string, default = ‘cluster’
Enter abbreviation for type of plot. The current list of plots supported are:

Plot Name
‘cluster’ Cluster PCA Plot (2d)
‘tsne’ Cluster t-SNE (3d)
‘elbow’ Elbow Plot
‘silhouette’ Silhouette Plot
‘distance’ Distance Plot
‘distribution’ Distribution Plot

feature : string, default = None
Name of the feature column for the x-axis when plot = ‘distribution’. When plot is ‘cluster’ or ‘tsne’, the feature column is used as a hover-over tooltip and/or label when label is set to True. If no feature name is passed for ‘cluster’ or ‘tsne’, the first column of the dataset is chosen as the hover-over tooltip by default.
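
A sketch of the feature param with a distribution plot (the column name ‘Age’ is illustrative and must exist in the dataset passed to setup()):

# distribution of a single feature across the assigned clusters
plot_model(kmeans, plot = 'distribution', feature = 'Age')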

label : bool, default = False
When set to True, data labels are shown in ‘cluster’ and ‘tsne’ plot.

save: Boolean, default = False
Plot is saved as png file in local directory when save parameter set to True.

system: Boolean, default = True
Must remain True at all times. Only to be changed by internal functions.

Returns:

Visual Plot: Prints the visual plot.

 

Tune Model


 

tune_model(model = None, supervised_target = None, estimator = None, optimize = None, custom_grid = None, fold = 10, verbose = True)

Description:

This function tunes the num_clusters model parameter using a predefined grid with the objective of optimizing a supervised learning metric as defined in the optimize param. You can choose the supervised estimator from a large library available in pycaret. By default, supervised estimator is Linear.

This function returns the tuned model object.

Code
#import the dataset from pycaret repository
from pycaret.datasets import get_data
juice = get_data('juice')

#import clustering module
from pycaret.clustering import *

#initialize the setup
exp_clu = setup(juice) 

#tune k-means model
tuned_kmeans = tune_model(model = 'kmeans', supervised_target = 'Purchase')

 

Output

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=4, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=8326, tol=0.0001, verbose=0)

Parameters:

model : string, default = None
Enter abbreviated name of the model. List of available models supported:

ID Name
‘kmeans’ K-Means Clustering
‘ap’ Affinity Propagation
‘meanshift’ Mean shift Clustering
‘sc’ Spectral Clustering
‘hclust’ Agglomerative Clustering
‘dbscan’ Density-Based Spatial Clustering
‘optics’ OPTICS Clustering
‘birch’ Birch Clustering
‘kmodes’ K-Modes Clustering


supervised_target: string
Name of the target column for supervised learning.

estimator: string, default = None

ID Name Task
‘lr’ Logistic Regression Classification
‘knn’ K Nearest Neighbour Classification
‘nb’ Naive Bayes Classification
‘dt’ Decision Tree Classification
‘svm’ SVM (Linear) Classification
‘rbfsvm’ SVM (RBF) Classification
‘gpc’ Gaussian Process Classification
‘mlp’ Multi Layer Perceptron Classification
‘ridge’ Ridge Classifier Classification
‘rf’ Random Forest Classification
‘qda’ Quadratic Disc. Analysis Classification
‘ada’ AdaBoost Classification
‘gbc’ Gradient Boosting Classifier Classification
‘lda’ Linear Disc. Analysis Classification
‘et’ Extra Trees Classifier Classification
‘xgboost’ Extreme Gradient Boosting Classification
‘lightgbm’ Light Gradient Boosting Classification
‘catboost’ Cat Boost Classifier Classification
‘lr’ Linear Regression Regression
‘lasso’ Lasso Regression Regression
‘ridge’ Ridge Regression Regression
‘en’ Elastic Net Regression
‘lar’ Least Angle Regression Regression
‘llar’ Lasso Least Angle Regression Regression
‘omp’ Orthogonal Matching Pursuit Regression
‘br’ Bayesian Ridge Regression
‘ard’ Automatic Relevance Determination Regression
‘par’ Passive Aggressive Regressor Regression
‘ransac’ Random Sample Consensus Regression
‘tr’ TheilSen Regressor Regression
‘huber’ Huber Regressor Regression
‘kn’ Kernel Ridge Regression
‘svm’ Support Vector Machine Regression
‘knn’ K Neighbors Regressor Regression
‘dt’ Decision Tree Regression
‘rf’ Random Forest Regression
‘et’ Extra Trees Regressor Regression
‘ada’ AdaBoost Regressor Regression
‘gbr’ Gradient Boosting Regressor Regression
‘mlp’ Multi Layer Perceptron Regression
‘xgboost’ Extreme Gradient Boosting Regression
‘lightgbm’ Light Gradient Boosting Machine Regression
‘catboost’ CatBoost Regressor Regression

If set to None, Linear model is used by default for both classification and regression tasks.

optimize: string, default = None

For Classification tasks:
Accuracy, AUC, Recall, Precision, F1, Kappa

For Regression tasks:
MAE, MSE, RMSE, R2, RMSLE, MAPE

If set to None, default is ‘Accuracy’ for classification and ‘R2’ for regression tasks.

custom_grid: list, default = None
By default, a pre-defined number of clusters is iterated over to optimize the supervised objective. To overwrite default iteration, pass a list of num_clusters to iterate over in custom_grid param.
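
For example, restricting the search to a handful of cluster counts (the list is illustrative; assumes the ‘juice’ setup from the example above, with estimator and optimize values taken from the tables in this section):

# only evaluate 4, 6, 8 and 10 clusters while optimizing a supervised metric
tuned_kmeans = tune_model(model = 'kmeans', supervised_target = 'Purchase',
                          estimator = 'lightgbm', optimize = 'AUC',
                          custom_grid = [4, 6, 8, 10])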

fold: integer, default = 10
Number of folds to be used in K-fold CV. Must be at least 2.

verbose: Boolean, default = True
Status update is not printed when verbose is set to False.

Returns:

Visual Plot: A plot with the num_clusters param on the x-axis and the metric to optimize on the y-axis. Also prints the best model metric.

Model: Trained model object with best num_clusters param. 

Warnings:

  • Affinity Propagation, Mean shift clustering, Density-Based Spatial Clustering and OPTICS Clustering cannot be used in this function since they do not support num_clusters param.

Predict Model


 

predict_model(model, data, platform=None, authentication=None)

Description:

This function is used to predict new data using a trained model. It requires a trained model object created using one of the functions in pycaret that return a trained model object. New data must be passed to the data param as a pandas DataFrame.

Code
# create a model
kmeans = create_model('kmeans')

# generate predictions
kmeans_predictions = predict_model(kmeans, data = jewellery)

 

Output

Parameters:

model : object / string, default = None
When the model is passed as a string, load_model() is called internally to load the pickle file from the current active directory, or from the cloud platform when the platform param is passed.

data : dataframe
{array-like, sparse matrix}, shape (n_samples, n_features) where n_samples is the number of samples and n_features is the number of features. All features used during training must be present in the new dataset.

platform: string, default = None
Name of platform, if loading model from cloud. Current available options are: ‘aws’.

authentication : dict
dictionary of applicable authentication tokens.

When platform = ‘aws’:
{‘bucket’ : ‘Name of Bucket on S3’}
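
A sketch of scoring new data with a model stored on S3, passing the model as a string (the model name and bucket match the deploy_model example below):

# load 'deploy_kmeans' from S3 and generate cluster labels for new data
predictions = predict_model('deploy_kmeans', data = jewellery,
                            platform = 'aws',
                            authentication = {'bucket' : 'pycaret-test'})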

Returns:

Dataframe : Dataframe with predictions.

Warnings:

  • Models that do not support ‘predict’ function cannot be used in predict_model().

Deploy Model


 

deploy_model(model, model_name, authentication, platform = 'aws')

Description:
(In Preview)

This function deploys the transformation pipeline and trained model object for production use. The platform of deployment can be defined under the platform param along with the applicable authentication tokens which are passed as a dictionary to the authentication param.

Code
# create a model
kmeans = create_model('kmeans')

# deploy a model
deploy_model(model = kmeans, model_name = 'deploy_kmeans', platform = 'aws', authentication = {'bucket' : 'pycaret-test'})

 

Output

This will deploy the model on an AWS S3 account under bucket ‘pycaret-test’

For AWS users:

Before deploying a model to an AWS S3 (‘aws’), environment variables must be configured using the command line interface. To configure AWS environment variables, type aws configure in your command line. The following information is required, which can be generated using the Identity and Access Management (IAM) portal of your Amazon console account:

  • AWS Access Key ID
  • AWS Secret Key Access
  • Default Region Name (can be seen under Global settings on your AWS console)
  • Default output format (must be left blank)

Parameters:

model : object
A trained model object should be passed as an estimator.

model_name : string
Name of model to be passed as a string.

authentication : dict
dictionary of applicable authentication tokens.

When platform = ‘aws’:
{‘bucket’ : ‘Name of Bucket on S3’}

platform: string, default = ‘aws’
Name of platform for deployment. Current available options are: ‘aws’.

Returns:

Message : Success Message

Warnings:

  • This function uses file storage services to deploy the model on the cloud platform. As such, this is efficient for batch use. Where the production objective is to obtain predictions at an instance level, this may not be the most efficient choice, as it transmits the binary pickle file between your local python environment and the platform.

Save Model


 

save_model(model, model_name, verbose=True)

Description:

This function saves the transformation pipeline and trained model object into the current active directory as a pickle file for later use.

Code
# create a model
kmeans = create_model('kmeans')

# save a model
save_model(kmeans, 'kmeans_model_23122019')

 

Output

Parameters:

model : object, default = none
A trained model object should be passed as an estimator.

model_name : string, default = none
Name of pickle file to be passed as a string.

verbose: Boolean, default = True
Success message is not printed when verbose is set to False.

Returns:

Message : Success Message

 

Load Model


 

load_model(model_name, platform = None, authentication = None, verbose=True)

Description:

This function loads a previously saved transformation pipeline and model from the current active directory into the current python environment. Load object must be a pickle file.

Code
saved_kmeans = load_model('kmeans_model_23122019')

 

Output

Parameters:

model_name : string, default = none
Name of pickle file to be passed as a string.

platform: string, default = None
Name of platform, if loading model from cloud. Current available options are: ‘aws’.

authentication : dict
dictionary of applicable authentication tokens.

When platform = ‘aws’:
{‘bucket’ : ‘Name of Bucket on S3’}

verbose: Boolean, default = True
Success message is not printed when verbose is set to False.

Returns:

Message : Success Message

 

Models


 

models(type = None)

Description:

Returns a table of models available in the model library.
Code
# show all models in library 
all_models = models()

 

Output

Parameters:

No parameters

Returns:

Dataframe: Pandas dataframe with meta data for all models.

Get Logs


 

get_logs(experiment_name = None, save = False)

Description:

Returns a table with experiment logs consisting of run details, parameters, metrics and tags.
Code
# store experiment logs in pandas dataframe
logs = get_logs()

 

Output

Parameters:
experiment_name : string, default = None
When set to None, the current active run is used.
 
save : bool, default = False
When set to True, csv file is saved in current directory.

Returns:

Dataframe: Pandas dataframe with logs.

Get Config


 

get_config(variable)

Description:

This function is used to access global environment variables. The following variables can be accessed:
 
  • X: Transformed dataset
  • data_: Original dataset
  • seed: random state set through session_id
  • prep_pipe: Transformation pipeline configured through setup
  • prep_param: prep_param configured through setup
  • n_jobs_param: n_jobs parameter used in model training
  • html_param: html_param configured through setup
  • exp_name_log: Name of experiment set through setup
  • logging_param: log_experiment param set through setup
  • log_plots_param: log_plots param set through setup
  • USI: Unique session ID parameter set through setup
Code
# get X dataframe
X = get_config('X')

 

Output

Set Config


 

set_config(variable, value)

Description:
This function is used to reset global environment variables. The following variables can be accessed:

 

  • X: Transformed dataset
  • data_: Original dataset
  • seed: random state set through session_id
  • prep_pipe: Transformation pipeline configured through setup
  • prep_param: prep_param configured through setup
  • n_jobs_param: n_jobs parameter used in model training
  • html_param: html_param configured through setup
  • exp_name_log: Name of experiment set through setup
  • logging_param: log_experiment param set through setup
  • log_plots_param: log_plots param set through setup
  • USI: Unique session ID parameter set through setup
Code
# change seed value in environment to '999'
set_config('seed', 999)

 

Output

No output.

Get System Logs


 

get_system_logs()

Description:

Read and print the ‘logs.log’ file from the current active directory.
Code
# Reading system logs in Notebook
get_system_logs()

 

Output

MLFlow UI


 

mlflow ui

Description:

Execute this command in the current working directory to open the MLflow server on localhost:5000.
Code
# loading dataset
from pycaret.datasets import get_data
data = get_data('jewellery')

# initializing setup
from pycaret.clustering import *
clu1 = setup(data, log_experiment = True, experiment_name = 'jewellery1')

# create kmeans model
kmeans = create_model('kmeans') 

# create kmodes model
kmodes = create_model('kmodes')

# run mlflow server (notebook)
!mlflow ui

### just 'mlflow ui' when running through command line.

 

Output
