New Function

This functionality was introduced in PyCaret 2.0. See the release notes for details.

MLflow


PyCaret 2.0 embeds the MLflow Tracking component as a backend API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for visualizing the results later. To start logging your experiments, set the log_experiment parameter within setup to True and define the experiment name using the experiment_name parameter.


Parameters

log_experiment: bool, default = False
When set to True, all metrics and parameters are logged on the MLflow server.

experiment_name: str, default = None
Name of the experiment for logging. When set to None, ‘clf’ is used by default as the alias for the experiment name.

log_plots: bool, default = False
When set to True, specific plots are logged in MLflow as PNG files.

log_profile: bool, default = False
When set to True, the data profile is also logged in MLflow as an HTML file.

log_data: bool, default = False
When set to True, the train and test datasets are logged as CSV files.
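
The parameters above can be collected in one place before calling setup. The following is a minimal sketch, not part of PyCaret: logging_kwargs is a hypothetical helper that only builds a dictionary of the keyword arguments listed above and does not call PyCaret or MLflow.

```python
def logging_kwargs(experiment_name=None, plots=False, profile=False, data=False):
    """Collect the logging-related setup arguments in a single dict.

    Hypothetical helper for illustration only; it builds a dictionary and
    does not import or call PyCaret or MLflow.
    """
    return {
        "log_experiment": True,              # turn on MLflow logging
        "experiment_name": experiment_name,  # None falls back to the default alias
        "log_plots": plots,                  # also log specific plots
        "log_profile": profile,              # also log the data profile
        "log_data": data,                    # also log the train and test datasets
    }


kwargs = logging_kwargs("diabetes1", plots=True)
print(kwargs)
```

With PyCaret installed, the dictionary would be unpacked into the call, for example setup(data, target = 'Class variable', **kwargs).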

 

Example

# loading dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

# initializing setup
from pycaret.classification import *
clf1 = setup(data, target = 'Class variable', log_experiment = True, experiment_name = 'diabetes1')

# compare all baseline models and select top 5
top5 = compare_models(n_select = 5) 

# tune top 5 base models
tuned_top5 = [tune_model(i) for i in top5]

# ensemble top 5 tuned models
bagged_top5 = [ensemble_model(i) for i in tuned_top5]

# blend top 5 base models 
blender = blend_models(estimator_list = top5) 

# start the MLflow UI from a notebook cell
!mlflow ui

# from a command line, run 'mlflow ui' without the leading '!'
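
Outside a notebook, the same UI can be started from a Python script. The sketch below is one possible approach, not something PyCaret provides: mlflow_ui_command is a hypothetical helper that assembles the command line, using the --backend-store-uri and --port options of the mlflow ui command. The actual launch is left commented out because it requires MLflow to be installed.

```python
import shlex


def mlflow_ui_command(backend_store_uri=None, port=None):
    """Build the argument list for launching the MLflow UI.

    Hypothetical helper for illustration; it only assembles the command
    and does not start any process.
    """
    cmd = ["mlflow", "ui"]
    if backend_store_uri is not None:
        # point the UI at a specific store instead of the default ./mlruns
        cmd += ["--backend-store-uri", backend_store_uri]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


cmd = mlflow_ui_command(port=5000)
print(shlex.join(cmd))

# To actually start the UI (requires MLflow to be installed):
# import subprocess
# server = subprocess.Popen(cmd)
```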

 


Configure tracking server


When no backend is configured, data is stored locally in ./mlruns. To configure the backend, call mlflow.set_tracking_uri in your notebook or IDE before executing the setup function.

mlflow.set_tracking_uri(uri)

Set the tracking server URI. This does not affect the currently active run (if one exists), but takes effect for successive runs.


Parameters

uri:

  • An empty string, or a local file path, prefixed with file:/. Data is stored locally at the provided file (or ./mlruns if empty).
  • An HTTP URI like https://my-tracking-server:5000.
  • A Databricks workspace, provided as the string “databricks” or, to use a Databricks CLI profile, “databricks://<profileName>”.
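
The three URI forms listed above can be told apart with the standard library. This is a rough sketch for illustration only: describe_tracking_uri is a hypothetical helper, not an MLflow function, and it covers only the forms named in the list.

```python
from urllib.parse import urlparse


def describe_tracking_uri(uri):
    """Return a short description of where a tracking URI points.

    Hypothetical helper for illustration; MLflow does its own parsing.
    """
    if uri == "":
        return "local ./mlruns directory"
    if uri == "databricks":
        return "Databricks workspace"
    parsed = urlparse(uri)
    if parsed.scheme == "databricks":
        # databricks://<profileName> selects a Databricks CLI profile
        return f"Databricks CLI profile '{parsed.netloc}'"
    if parsed.scheme == "file":
        return "local file path"
    if parsed.scheme in ("http", "https"):
        return "remote tracking server"
    return "unrecognized"


for example in ["", "file:/c:/users/mlflow-server",
                "https://my-tracking-server:5000", "databricks://myprofile"]:
    print(repr(example), "->", describe_tracking_uri(example))
```

The profile name myprofile is a placeholder chosen for this example.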

 

Example

# set tracking uri 
import mlflow 
mlflow.set_tracking_uri('file:/c:/users/mlflow-server')

# loading dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data, target = 'Class variable', log_experiment = True, experiment_name = 'diabetes1')

# all logs are now stored at the URI set with mlflow.set_tracking_uri

 

PyCaret on Databricks


When using PyCaret on Databricks, the experiment_name parameter in setup must include the complete path to storage. The example below shows how to log experiments on Databricks:

# loading dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data, target = 'Class variable', log_experiment = True, experiment_name = '/Users/username@domain.com/experiment-name-here')
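
The experiment path used above can be assembled from its parts to avoid typing mistakes. The sketch below assumes the /Users/<user>/<experiment> layout shown in the example; databricks_experiment_name is a hypothetical helper, not part of PyCaret or Databricks.

```python
def databricks_experiment_name(user, experiment):
    """Build an experiment path of the form /Users/<user>/<experiment>.

    Hypothetical helper for illustration; the layout is assumed from the
    example above.
    """
    if not user or not experiment:
        raise ValueError("both user and experiment must be non-empty")
    # strip stray slashes so the parts join cleanly
    return f"/Users/{user.strip('/')}/{experiment.strip('/')}"


name = databricks_experiment_name("username@domain.com", "experiment-name-here")
print(name)
```

The resulting string would then be passed to setup as experiment_name.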