Module

pycaret.core.experiment

The PyCaret 4.0 Experiment base class — a real sklearn-compatible object.

Design rationale in docs/revamp/ARCHITECTURE.md.

Two concrete guarantees this class delivers:

It's a BaseEstimator. get_params, set_params, __sklearn_tags__, __sklearn_is_fitted__, HTML repr, clone — all work.
fit(X, y=None) is the real entry point. PyCaret 4.0 is OOP-only; the 3.x module-level setup() functional API was removed.

Class hierarchy#

Experiment
    ├── SupervisedExperiment
    │       ├── ClassificationExperiment
    │       ├── RegressionExperiment
    │       └── TimeSeriesExperiment
    └── UnsupervisedExperiment
            ├── ClusteringExperiment
            └── AnomalyExperiment

Experiment (this module) hosts task-agnostic verbs: fit, create_model, predict_model, plot_model, evaluate_model, persistence, introspection, and data-access properties.
SupervisedExperiment (pycaret.core.supervised) adds comparison/tuning verbs: compare_models, tune_model, ensemble_model, blend_models, stack_models, calibrate_model, finalize_model, interpret_model, automl, get_leaderboard.
UnsupervisedExperiment (pycaret.core.unsupervised) adds assign_model.

Transition note#

During the multi-session revamp, every verb initially delegates to a legacy _SupervisedExperiment / _UnsupervisedExperiment held as self._legacy. Each verb is then rewritten natively on top of sklearn.pipeline.Pipeline in a subsequent session, replacing the delegation call in place.

Classes

Experiment

extends BaseEstimator

Task-agnostic base for every PyCaret 4.0 experiment.

Parameters#

task : TaskType

Fixed by task subclasses; not intended for direct user configuration.

target : str, optional

Column name of the target / label. Required for supervised tasks only.

session_id : int, optional

train_size : float, default=0.7

fold : int, default=10

fold_strategy : str | BaseCrossValidator, default="stratifiedkfold"

preprocess : bool, default=True

normalize : bool, default=False

transformation : bool, default=False

remove_outliers : bool, default=False

feature_selection : bool, default=False

n_jobs : int, default=-1

use_gpu : bool, default=False

logger : BaseLogger, optional

Event-stream logger. Default `NullLogger()` (silent).

log_experiment : bool, default=False

If True *and* `logger is None`, a `MemoryLogger` is installed.

verbose : bool, default=False

Compatibility flag passed to the legacy engine for progress-bar control.
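The interaction between logger and log_experiment described above can be sketched as a small resolution function. This is a minimal illustration, not PyCaret's code: the two logger classes below are hypothetical stand-ins that only share names with the real ones, and resolve_logger is an invented helper.

```python
class NullLogger:
    """Hypothetical silent logger: drops every event."""

    def log(self, event):
        pass


class MemoryLogger:
    """Hypothetical in-memory logger: keeps events in a list."""

    def __init__(self):
        self.events = []

    def log(self, event):
        self.events.append(event)


def resolve_logger(logger=None, log_experiment=False):
    # An explicitly supplied logger always wins.
    if logger is not None:
        return logger
    # log_experiment only matters when no logger was supplied.
    if log_experiment:
        return MemoryLogger()
    # Default: silent.
    return NullLogger()


custom = MemoryLogger()
chosen = resolve_logger(custom, log_experiment=True)
```

Under these assumptions, passing your own logger makes log_experiment irrelevant, which matches the "True and logger is None" wording above.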

Notes#

Parameters are stored verbatim per sklearn convention; __init__ performs no other work.
Subclasses pre-configure task and may override the native setup hooks (_native_setup_supervised / _native_setup_unsupervised / _native_setup_timeseries) for task-specific shapes.
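To show why the "stored verbatim" rule matters, here is a stdlib-only sketch of the convention. It imitates the idea behind sklearn's get_params / set_params / clone and is not sklearn's or PyCaret's implementation; ParamsMixin, ToyExperiment, and this clone function are hypothetical names.

```python
import inspect


class ParamsMixin:
    """Toy imitation of sklearn-style parameter handling."""

    def get_params(self):
        # Parameter names come from the __init__ signature; values come
        # from same-named attributes on the instance.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


class ToyExperiment(ParamsMixin):
    def __init__(self, target=None, train_size=0.7, fold=10):
        # Store arguments verbatim: no validation, no derived state.
        self.target = target
        self.train_size = train_size
        self.fold = fold


def clone(obj):
    # Rebuild an unfitted copy purely from constructor parameters.
    return type(obj)(**obj.get_params())


exp = ToyExperiment(target="label", fold=5)
twin = clone(exp.set_params(train_size=0.8))
```

Because __init__ does nothing except assign attributes, the copy can be rebuilt from get_params() alone; any work done in __init__ would be silently repeated or lost on clone.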

Attributes

task
target
session_id
train_size
fold
fold_strategy
preprocess
normalize
transformation
remove_outliers
feature_selection
n_jobs
use_gpu
logger
log_experiment
verbose
X: pd.DataFrame
X_train: pd.DataFrame
X_test: pd.DataFrame
y
y_train
y_test
preprocess_pipeline: Pipeline
events: list

Methods

fit(self, X: pd.DataFrame, y: Any | None = None, setup_kwargs: Any = {}) -> Experiment

Run preprocessing, splits, and state setup. Returns self.

Supervised tasks:

- Pass `X` containing the target column *or* pass `(X, y)`.
- `self.target` must be set, or `y` must carry a name.

Unsupervised tasks:

- Pass `X` only. `self.target` is ignored.
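The calling conventions above can be sketched as a target-resolution step. This is an illustrative toy under stated assumptions: columns are a plain dict of lists instead of a DataFrame, split_features_target and its y_name argument are hypothetical, and letting an explicit target take precedence over the name carried by y is a guess, not documented behaviour.

```python
def split_features_target(columns, target=None, y=None, y_name=None,
                          supervised=True):
    """Return (features, target_values, target_name).

    columns maps column name -> list of values (a stand-in for X).
    """
    if not supervised:
        # Unsupervised tasks: X only, any target setting is ignored.
        return dict(columns), None, None
    if y is not None:
        # (X, y) form: the name comes from target, else from y itself.
        name = target if target is not None else y_name
        if name is None:
            raise ValueError("Set target or pass a y that carries a name.")
        return dict(columns), list(y), name
    # X-only form: the target column must be present in X.
    if target is None or target not in columns:
        raise ValueError("X must contain the configured target column.")
    features = {k: v for k, v in columns.items() if k != target}
    return features, list(columns[target]), target


data = {"age": [30, 41], "label": [0, 1]}
from_column = split_features_target(data, target="label")
from_pair = split_features_target({"age": [30, 41]}, y=[0, 1], y_name="label")
unsupervised = split_features_target(data, target="label", supervised=False)
```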

create_model(self, estimator: Any, fold: Any | None = None, cross_validation: bool = True, fit_kwargs: dict | None = None, round: int = 4, verbose: bool = False, **estimator_kwargs: Any) -> CreateResult

Train a single model and return a typed CreateResult.

Session-24 drain (supervised path): for classification and regression experiments, this verb no longer delegates to self._legacy.create_model. It resolves the estimator, runs cross-validation with the task's metric registry, refits on the full training set, and returns a real sklearn Pipeline (preprocessor + trained model glued together). Downstream verbs like predict_model can call .predict on the returned pipeline directly, with no transitional bare-estimator branch.

Clustering / anomaly / time-series still delegate to the legacy engine (their create_model code paths have different shapes and are handled in separate drain sessions).

Parameters#

estimator : str or sklearn-compatible object

Either a model ID from the engine's registry (e.g. `"lr"`,
`"rf"`) or a pre-constructed estimator with `.fit` /
`.predict`.

fold : int or cross-validator, optional

If `None`, uses the experiment's configured CV generator
(`self.fold` + `self.fold_strategy`).

cross_validation : bool, default=True

If False, skips CV and only fits on the training set.

fit_kwargs : dict, optional

Extra kwargs forwarded to `model.fit`.

round : int, default=4

Decimal places for the metrics DataFrame.

verbose : bool, default=False

Reserved; currently ignored (legacy progress-bar hook).

**estimator_kwargs

When `estimator` is a registry ID, forwarded to its
constructor (merged over the registry's defaults).

Returns#

CreateResult

`pipeline` is a real sklearn Pipeline (preprocessing +
trained model); `metrics` is the per-fold score DataFrame
(index: `Fold 0..N-1`, `Mean`, `Std`) when CV ran, else
`None`; `model_id` is the registry key or the estimator
class name; `params` is the estimator's `get_params(deep=False)`.
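The shape of the metrics table (one row per fold, then Mean and Std) can be illustrated with a stdlib-only sketch. Everything here is an assumption for illustration: the "model" is a mean-only baseline, the metric is MAE, folds are contiguous and unshuffled, and Std uses the population standard deviation. The rounding argument is called ndigits to avoid shadowing the built-in round.

```python
from statistics import mean, pstdev


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def cv_metrics_table(y, n_folds=3, ndigits=4):
    rows = {}
    for k, (train_idx, test_idx) in enumerate(kfold_indices(len(y), n_folds)):
        # "Fit" the baseline: predict the training mean everywhere.
        prediction = mean(y[i] for i in train_idx)
        rows[f"Fold {k}"] = mean(abs(y[i] - prediction) for i in test_idx)
    scores = list(rows.values())
    rows["Mean"] = mean(scores)
    rows["Std"] = pstdev(scores)  # population std; an assumption
    return {label: round(value, ndigits) for label, value in rows.items()}


table = cv_metrics_table([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], n_folds=3)
```

The summary rows are computed from the unrounded fold scores and rounding is applied last, so Mean and Std do not inherit rounding error from the per-fold rows.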

predict_model(self, estimator: Any, data: pd.DataFrame | None = None, raw_score: bool = False, round: int = 4, verbose: bool = False) -> PredictResult

Run prediction and return a typed PredictResult.

Session-23 drain: this verb no longer delegates to self._legacy.predict_model. It calls estimator.predict / estimator.predict_proba directly, with a transitional accommodation for bare estimators.

Parameters#

estimator : sklearn.pipeline.Pipeline or fitted estimator

Preferred: a fitted sklearn Pipeline with preprocessing baked in
(that's what `create_model` / `compare_models` / `tune_model`
will return once their drains land in sessions 24+). For now,
a bare fitted estimator is also accepted — we wrap it on-the-fly
with `self.preprocess_pipeline` to transform new data. Must
implement `.predict`; raises TypeError otherwise.

data : pandas.DataFrame, optional

New input. If `None`, the holdout set is used (`self.X_test`
/ `self.y_test` for supervised tasks, `self.X` for
unsupervised). If `data` contains the target column, it is
split off automatically and used to compute metrics.

raw_score : bool, default=False

Classification only. True → per-class probability columns
(`prediction_score_<class>`). False (default) → single
`prediction_score` column with winning-class probability.

round : int, default=4

Decimal places for probability / metric columns.

verbose : bool, default=False

Reserved for future notebook-progress hooks; currently ignored.

Returns#

PredictResult

`predictions` is a DataFrame with the original X columns, the
target column (if known), and `prediction_label` (plus an
optional `prediction_score` for classification). Clustering /
anomaly use task-specific columns (`Cluster` / `Anomaly` +
`Anomaly_Score`).
`metrics` is a one-row DataFrame for supervised tasks when
`y` is known; `None` otherwise.
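The difference between the two raw_score modes can be sketched on plain Python rows. This is a toy, not the library's code: add_prediction_columns is a hypothetical helper, rows are dicts instead of a DataFrame, and the probabilities are made up.

```python
def add_prediction_columns(rows, probabilities, classes,
                           raw_score=False, ndigits=4):
    """Append prediction columns to each row (rows are plain dicts)."""
    out = []
    for row, probs in zip(rows, probabilities):
        best = max(range(len(classes)), key=lambda i: probs[i])
        new_row = dict(row)
        new_row["prediction_label"] = classes[best]
        if raw_score:
            # One probability column per class.
            for cls, p in zip(classes, probs):
                new_row[f"prediction_score_{cls}"] = round(p, ndigits)
        else:
            # Single column holding the winning-class probability.
            new_row["prediction_score"] = round(probs[best], ndigits)
        out.append(new_row)
    return out


rows = [{"x": 1}, {"x": 2}]
probs = [[0.2, 0.8], [0.9, 0.1]]
compact = add_prediction_columns(rows, probs, ["no", "yes"])
per_class = add_prediction_columns(rows, probs, ["no", "yes"], raw_score=True)
```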

plot_model(self, estimator: Any, plot: str | None = None, save: bool | str = False, **kwargs: Any) -> Any

Render a single Plotly diagnostic for a fitted estimator.

Session 54 reimplementation. Each task class registers a dictionary of named plots in _build_plot_registry(estimator). plot= looks up the entry and returns a plotly.graph_objects.Figure.

Parameters#

estimator

A fitted `sklearn.pipeline.Pipeline` (or `ForecastingPipeline`
for time-series). Some plots — like classification's
`class_distribution` — ignore the estimator and use the
experiment's data only; pass any fitted pipeline.

plot : str, optional

Plot kind. Defaults to the task's canonical first-look diagnostic
(`'auc'` for classification, `'residuals'` for regression,
`'cluster'` for clustering, `'score'` for anomaly,
`'forecast'` for time-series). Pass an unknown kind to see the
full set in the `ValueError` message.

save : bool | str, default False

`False` → return the Figure. `True` → write to
`f"{plot}.png"` and return that path. A string → write to that
path. Static export requires `pycaret[export]` (pulls
`kaleido`).

**kwargs

Passed through to the underlying plot function.

Returns#

plotly.graph_objects.Figure | str

The figure, or the saved path when `save` is truthy.
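The lookup and save rules above amount to a dictionary dispatch. The sketch below assumes a registry of zero-argument callables and uses plain dicts in place of Plotly figures; render_plot and FIGURES are hypothetical, and the actual file write is deliberately left out.

```python
def render_plot(registry, plot=None, default="auc", save=False):
    """Look up a named plot; return the figure or the would-be save path."""
    kind = default if plot is None else plot
    if kind not in registry:
        # An unknown kind reports the full set of available plots.
        raise ValueError(
            f"Unknown plot {kind!r}. Available: {sorted(registry)}"
        )
    figure = registry[kind]()
    if save is False:
        return figure
    # True -> "<plot>.png"; a string -> that path. A real implementation
    # would export the figure here; this sketch only resolves the path.
    return f"{kind}.png" if save is True else str(save)


FIGURES = {
    "auc": lambda: {"title": "ROC curve"},
    "confusion_matrix": lambda: {"title": "Confusion matrix"},
}

default_figure = render_plot(FIGURES)
auto_path = render_plot(FIGURES, "confusion_matrix", save=True)
explicit_path = render_plot(FIGURES, save="out/roc.png")
try:
    render_plot(FIGURES, "nope")
    message = ""
except ValueError as exc:
    message = str(exc)
```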

evaluate_model(self, estimator: Any, kwargs: Any = {}) -> dict

Render the curated diagnostic bundle for a task.

Session 54 reimplementation. Returns a dict mapping plot kind → plotly.graph_objects.Figure. Iterates the curated subset of plots that the task class declares in _evaluate_plot_set(), calling each plot function defensively — any one that raises (e.g. a SHAP entry without shap installed) is simply skipped, not propagated.

The 3.x version showed an interactive ipywidget tab strip. The 4.0 version returns the underlying figures so callers can render them however they want — fig.show() in notebooks, fig.to_dict() for HTTP transport, etc.

Returns#

dict[str, plotly.graph_objects.Figure]

One Figure per plot kind that succeeded.
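The "skip on failure" behaviour can be sketched as a loop with a broad try/except. This is only an illustration of the pattern; evaluate, the plot names, and the failing SHAP entry are hypothetical, and dicts stand in for Plotly figures.

```python
def evaluate(plot_functions):
    """Call each plot function; keep successes, skip anything that raises."""
    figures = {}
    for kind, make_figure in plot_functions.items():
        try:
            figures[kind] = make_figure()
        except Exception:
            # One broken or unavailable plot must not sink the bundle.
            continue
    return figures


def shap_summary():
    # Simulates an optional dependency that is not installed.
    raise ImportError("shap is not installed")


bundle = evaluate({
    "auc": lambda: {"title": "ROC curve"},
    "shap": shap_summary,
    "confusion_matrix": lambda: {"title": "Confusion matrix"},
})
```

A caller that needs to know which plots were dropped can compare the returned keys against the requested ones, since failures leave no entry behind.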

pull(self, args: Any = (), kwargs: Any = {}) -> pd.DataFrame | None

Return the most recent metrics DataFrame.

Reads from self._fit_state["last_metrics"], which is updated by each native verb (create_model, tune_model, compare_models, ensemble / blend / stack / calibrate / finalize) before it returns. Returns None if no metrics-emitting verb has run yet; phase 6 removed the legacy self._legacy.pull() fallback.

models(self, args: Any = (), internal: bool = False, kwargs: Any = {}) -> pd.DataFrame

Return a DataFrame describing the available models in the registry.

Session-31 / 35 drain: builds the DataFrame from the snapshot's model registry. With internal=True returns the engine-internal view (Special / Class / Equality / Args) by reading each container's get_dict(internal=True) directly.

Session-45 (phase 5d) added a native TS filter for model_type ∈ TSModelTypes, so this method no longer defers to legacy for any task. Phase 6 removed the remaining legacy fallback for an empty snapshot registry; that case is now an error condition.

get_metrics(self, args: Any = (), kwargs: Any = {}) -> pd.DataFrame

Return a DataFrame describing the available metrics for the task.

Session-31 drain + session-32: reads from the per-experiment metric registry stored in self._fit_state["metric_registry"]. That registry is initialised at fit() time from the task helper and mutated by add_metric / remove_metric.

add_metric(self, id: str, name: str, score_func: Any, target: str = 'pred', greater_is_better: bool = True, args: dict | None = None, is_multiclass: bool = True, kwargs: Any = {}) -> Any

Session-32 drain. Builds the appropriate <Task>MetricContainer for the experiment's task and stashes it in self._fit_state["metric_registry"]. Subsequent create_model / tune_model / compare_models calls will compute this metric on every fold and include it in the leaderboard.

Parameters mirror the legacy contract:

Parameters#

id : str

Short identifier; used as the key in the registry.

name : str

Long name (e.g. `"Mean Absolute Percentage Error"`).

score_func : callable

`(y_true, y_pred) -> float` (or with `y_pred_proba` if
`target="pred_proba"`).

target : str, default="pred"

`"pred"` / `"pred_proba"` / `"threshold"`. Maps to the
input the score function expects.

greater_is_better : bool, default=True

False for error metrics where lower is better.

args : dict, optional

Extra kwargs always passed to `score_func`.

is_multiclass : bool, default=True

(Classification only.) Whether the metric supports multiclass.

Returns#

The registered container. Suitable for downstream introspection.

remove_metric(self, name_or_id: str) -> None

Remove a metric from the experiment's registry.

Session-32 drain. Pops from self._fit_state["metric_registry"]; accepts either the metric's id or its display name (matching legacy semantics).
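A per-experiment metric registry with add and remove by id or display name can be sketched with a plain dict. ToyMetric is a hypothetical stand-in for the task's metric container, only a subset of the documented parameters is modelled, and raising ValueError for an unknown metric is an assumption.

```python
class ToyMetric:
    """Hypothetical stand-in for a <Task>MetricContainer."""

    def __init__(self, id, name, score_func, greater_is_better=True, args=None):
        self.id = id
        self.name = name
        self.score_func = score_func
        self.greater_is_better = greater_is_better
        self.args = dict(args or {})

    def score(self, y_true, y_pred):
        # args are always forwarded to the score function.
        return self.score_func(y_true, y_pred, **self.args)


def add_metric(registry, id, name, score_func, greater_is_better=True, args=None):
    container = ToyMetric(id, name, score_func, greater_is_better, args)
    registry[id] = container  # keyed by the short identifier
    return container


def remove_metric(registry, name_or_id):
    for key, metric in list(registry.items()):
        if name_or_id in (key, metric.name):
            del registry[key]
            return
    raise ValueError(f"No metric named {name_or_id!r}")  # assumed behaviour


def mape(y_true, y_pred):
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


registry = {}
metric = add_metric(registry, "mape", "Mean Absolute Percentage Error",
                    mape, greater_is_better=False)
score = metric.score([100, 200], [110, 180])
registered_ids = list(registry)
remove_metric(registry, "Mean Absolute Percentage Error")
```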

get_config(self, variable: str | None = None) -> Any

Return a configured experiment variable.

Session-33 drain. Reads from self._fit_state (snapshot of all the data accessors + transformed splits + registries) and the constructor parameters stored on self. Raises ValueError for unknown names; with variable=None returns the full list of accessible names.

Parameters#

variable : str, optional

Name to look up. `None` returns the list of accessible names.

Returns#

The value of the named variable, or the list of accessible names if variable=None.

set_config(self, variable: str | None = None, value: Any = None, **kwargs: Any) -> None

Update a configured experiment variable.

Session-33 drain. Restricted to a small allowlist of constructor params that can be safely mutated post-fit (_SETTABLE_CONFIG_KEYS). Anything else raises ValueError.

Two call shapes:

Single: set_config("n_jobs", 4)
Bulk: set_config(n_jobs=4, verbose=True)
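Both call shapes, together with the allowlist check, can be sketched in a few lines. This is a toy under stated assumptions: the contents of the allowlist are invented (the real _SETTABLE_CONFIG_KEYS are not listed here), ToyConfig is hypothetical, and validating every key before mutating anything is a design choice of this sketch.

```python
_SETTABLE = frozenset({"n_jobs", "verbose"})  # hypothetical allowlist


class ToyConfig:
    def __init__(self, n_jobs=-1, verbose=False, target=None):
        self.n_jobs = n_jobs
        self.verbose = verbose
        self.target = target

    def get_config(self, variable=None):
        names = sorted(vars(self))
        if variable is None:
            return names  # no name given: list what is accessible
        if variable not in names:
            raise ValueError(f"Unknown variable {variable!r}")
        return getattr(self, variable)

    def set_config(self, variable=None, value=None, **kwargs):
        # Normalise the single and bulk call shapes into one dict.
        updates = dict(kwargs)
        if variable is not None:
            updates[variable] = value
        # Validate everything first so a bad key changes nothing.
        for name in updates:
            if name not in _SETTABLE:
                raise ValueError(f"{name!r} cannot be changed after fit")
        for name, new_value in updates.items():
            setattr(self, name, new_value)


cfg = ToyConfig()
cfg.set_config("n_jobs", 4)                # single
cfg.set_config(n_jobs=2, verbose=True)     # bulk
try:
    cfg.set_config(verbose=False, target="label")
    rejected = False
except ValueError:
    rejected = True
```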

save_model(self, model: Any, path: Any, verbose: bool = False) -> Any

Persist a fitted model / pipeline to path using joblib.dump.

Returns the absolute Path of the file written.

save_model is legal before fit — the verb is about the passed model, not about experiment state. When the logger has not yet been installed (fit is where that happens), the MODEL_SAVED event is silently dropped rather than raising.

load_model(self, path: Any, verbose: bool = False) -> Any

Load a model previously written by save_model.

save_experiment(self, path: Any, verbose: bool = False) -> Any

Persist the full Experiment (including fit state) to path.

The Experiment must be fitted. To re-hydrate, use Experiment.load_experiment(path).

load_experiment(path: Any, verbose: bool = False) -> Experiment

Re-hydrate an Experiment previously saved by save_experiment.

Returns the loaded Experiment. Raises TypeError if the file on disk was not a PyCaret Experiment.
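The save / load contract (must be fitted to save, TypeError when the file is something else) can be sketched with a tagged payload. This deliberately swaps in JSON from the standard library instead of the real serializer; the function bodies, the KIND marker, and the RuntimeError for an unfitted experiment are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

KIND = "toy-experiment"  # hypothetical type marker stored in the file


def save_experiment(params, fit_state, path):
    if fit_state is None:
        raise RuntimeError("Experiment must be fitted before saving.")
    path = Path(path).resolve()
    payload = {"kind": KIND, "params": params, "fit_state": fit_state}
    path.write_text(json.dumps(payload))
    return path


def load_experiment(path):
    payload = json.loads(Path(path).read_text())
    # Reject files that are valid on disk but are not a saved experiment.
    if not isinstance(payload, dict) or payload.get("kind") != KIND:
        raise TypeError(f"{path} does not contain a saved experiment")
    return payload["params"], payload["fit_state"]


with tempfile.TemporaryDirectory() as tmp:
    saved = save_experiment({"target": "label"}, {"n_rows": 100},
                            Path(tmp) / "exp.json")
    saved_is_absolute = saved.is_absolute()
    restored = load_experiment(saved)

    other = Path(tmp) / "other.json"
    other.write_text(json.dumps([1, 2, 3]))
    try:
        load_experiment(other)
        rejected = False
    except TypeError:
        rejected = True
```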

list_models(self)

Typed model list for this experiment; runtime-aware.

list_metrics_cards(self)

describe_setup_params(self)