Experiment(...).fit(data)#
The 3.x setup() function is gone. Every preprocessing knob now
lives on the Experiment constructor, and fit(data) actually
runs the pipeline.
from pycaret.classification import ClassificationExperiment
exp = ClassificationExperiment(
    target="Purchase",
    session_id=42,
    train_size=0.7,
    fold=10,
    normalize=True,
    transformation=True,
).fit(data)

After fit(), the experiment is ready for every other verb. State is
stored on exp._fit_state (see Data preparation).
What fit() does#
- Coerces input: a DataFrame for tabular tasks, a Series for univariate time series.
- Builds the preprocessing pipeline: imputer and encoder (always), plus an optional power transform, scaler, outlier filter, and feature selector.
- Splits train / test: stratified for classification, plain for regression, temporal for time series, none for clustering / anomaly.
- Builds the fold generator: StratifiedKFold (classification), KFold (regression), or sktime ExpandingWindowSplitter (time series).
- Builds the model + metric registries: task-specific dicts of estimators and scorers.
- Stores everything in exp._fit_state.
After this, calling create_model("rf") reads from the registry,
fits an estimator on the (transformed) train set, runs CV via the
fold generator, and returns a CreateResult.
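The split and fold-generation steps listed above can be sketched in plain Python. This is an illustrative approximation only, not PyCaret's implementation: the function names `stratified_split`, `temporal_split`, and `expanding_window_folds` are hypothetical, and the real fold generators (scikit-learn's StratifiedKFold / KFold, sktime's ExpandingWindowSplitter) have their own parameters and edge-case handling.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_size=0.7, seed=42):
    """Split row indices so each class keeps roughly the same train share."""
    rng = random.Random(seed)  # seeded, playing the role of session_id
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_size)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def temporal_split(n_rows, fh):
    """Hold out the last `fh` observations; a time series is never shuffled."""
    cut = n_rows - fh
    return list(range(cut)), list(range(cut, n_rows))


def expanding_window_folds(n_train, fh, n_folds):
    """Yield (train, validation) indices; the train window grows by `fh` per fold."""
    first_cut = n_train - n_folds * fh
    if first_cut <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        cut = first_cut + k * fh
        yield list(range(cut)), list(range(cut, cut + fh))


# Classification-style split: 30 "yes" rows and 70 "no" rows.
labels = ["yes"] * 30 + ["no"] * 70
train_idx, test_idx = stratified_split(labels, train_size=0.7, seed=42)

# Time-series-style split: 48 observations, forecast horizon of 12.
ts_train, ts_test = temporal_split(n_rows=48, fh=12)
folds = list(expanding_window_folds(n_train=len(ts_train), fh=12, n_folds=2))
```

The point of the sketch is the difference in strategy: the classification split shuffles within each class so both partitions keep the class ratio, while the time-series split and its folds only ever cut along the time axis, so validation data always comes after the data used for training.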
Return value#
fit() returns self, so the typical pattern is:
exp = ClassificationExperiment(target="Purchase").fit(data)

You can also instantiate first and call fit() later, for example to fit on a different dataset or to compose with an environment:
exp = ClassificationExperiment(target="Purchase", session_id=42)
# … some other setup …
exp = exp.fit(data)

sklearn-compatible#
Experiment is a sklearn BaseEstimator. After fit(), its
__sklearn_is_fitted__() hook returns True, so tools that introspect
sklearn estimators (joblib, model registries, deployment frameworks)
will treat your experiment as fitted.
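A minimal sketch of the two conventions described above: fit() returning self, and the `__sklearn_is_fitted__` hook. `ToyExperiment` is a hypothetical stand-in written for illustration, not PyCaret code, and the contents of its `_fit_state` are invented; the only sklearn fact relied on is that `sklearn.utils.validation.check_is_fitted` consults this hook when an estimator defines it.

```python
class ToyExperiment:
    """Hypothetical stand-in for an Experiment; not PyCaret code."""

    def __init__(self, target, session_id=None):
        # The constructor only records configuration; no work happens here.
        self.target = target
        self.session_id = session_id
        self._fit_state = None

    def fit(self, data):
        # Store whatever fitting produces, then return self so calls can chain.
        self._fit_state = {"n_rows": len(data), "columns": sorted(data[0])}
        return self

    def __sklearn_is_fitted__(self):
        # sklearn's check_is_fitted calls this hook when it is defined.
        return self._fit_state is not None


rows = [{"Age": 31, "Purchase": 0}, {"Age": 45, "Purchase": 1}]

exp = ToyExperiment(target="Purchase", session_id=42)
fitted_before = exp.__sklearn_is_fitted__()  # nothing stored yet
returned = exp.fit(rows)                     # same object comes back
fitted_after = exp.__sklearn_is_fitted__()
```

Because fit() hands back the same object, `ToyExperiment(target="Purchase").fit(rows)` and the two-step form are interchangeable, which is what makes both patterns in the Return value section work.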
Different signatures for different tasks#
The constructor signature varies by task:
# Tabular supervised: target is required, train_size + fold matter.
ClassificationExperiment(target=..., train_size=..., fold=...).fit(data)
RegressionExperiment(target=..., train_size=..., fold=...).fit(data)
# Tabular unsupervised: no target, no train/test split.
ClusteringExperiment().fit(data)
AnomalyExperiment().fit(data)
# Time-series: fh + seasonal_period instead of target.
TimeSeriesExperiment(fh=12, seasonal_period=12).fit(univariate_series)

What's removed#
| 3.x | 4.0 |
|---|---|
| setup(data, target=...) | Experiment(target=...).fit(data) |
| silent=True | Removed (no UI side effects). |
| html=False | Removed (no UI side effects). |
| experiment_name, log_experiment | log_experiment still exists, but its semantics changed: it installs a MemoryLogger for engine events, not an MLflow run. |
| Module-level setup() | Removed entirely. There's no module-level functional API in 4.0. |
See All setup parameters for the full constructor reference.