Preprocessing

All setup parameters

Every constructor parameter on every Experiment class.

Every preprocessing knob in PyCaret 4.0 is a constructor parameter on one of the Experiment classes. This page lists all of them in one place, grouped by task.

Common to every task

These appear on ClassificationExperiment, RegressionExperiment, ClusteringExperiment, AnomalyExperiment, and TimeSeriesExperiment:

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| session_id | int \| None | None | Random seed for splits, fold generators, and any randomized estimators. |
| n_jobs | int | -1 | Parallelism for sklearn / pyod / sktime where supported. |
| use_gpu | bool | False | When True, sets gpu_param='force' on the model registry context; XGBoost, LightGBM, and cuML pick it up. |
| logger | BaseLogger \| None | None | Inject your own logger to capture engine events. Defaults to NullLogger() (silent). |
| log_experiment | bool | False | If True and logger=None, installs a MemoryLogger that records events on experiment.events. |
| verbose | bool | False | Reserved for future progress hooks. Currently a no-op. |
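As a rough illustration of what a fixed session_id buys you, the sketch below uses plain scikit-learn rather than PyCaret: the same seed reproduces the same holdout split and the same fold assignments. Treating session_id as sklearn's random_state is an assumption made for the example, as are the toy data and the two helper functions; this page does not describe how PyCaret passes the seed to its own splitters.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Toy data: 100 rows, binary target with balanced classes.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

session_id = 42  # assumption: plays the role of sklearn's random_state


def holdout_indices(seed):
    """Return the row indices that land in the train split for a given seed."""
    idx = np.arange(len(X))
    train_idx, _ = train_test_split(
        idx, train_size=0.7, random_state=seed, stratify=y
    )
    return train_idx


def fold_assignments(seed):
    """Return the test indices of each CV fold for a given seed."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return [test for _, test in cv.split(X, y)]


# Re-running with the same seed reproduces both the split and the folds.
same_split = np.array_equal(holdout_indices(session_id), holdout_indices(session_id))
same_folds = all(
    np.array_equal(a, b)
    for a, b in zip(fold_assignments(session_id), fold_assignments(session_id))
)
print(same_split, same_folds)
```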

Supervised tasks (classification + regression)

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| target | str | required | Name of the target column. |
| train_size | float | 0.7 | Proportion in the train split. The rest is the holdout. |
| fold | int | 10 | CV folds (StratifiedKFold for classification, KFold for regression). |
| fold_strategy | str \| BaseCrossValidator | "stratifiedkfold" for classification, "kfold" for regression | Pass a sklearn cross-validator for full custom control. |
| preprocess | bool | True | If False, skips imputation + encoding entirely. Use only for already-clean numeric data. |
| normalize | bool | False | StandardScaler on numerics. |
| transformation | bool | False | PowerTransformer (Yeo-Johnson) on numerics. |
| remove_outliers | bool | False | IsolationForest filter on the train set. |
| feature_selection | bool | False | SelectFromModel (threshold="median") using ExtraTrees. |
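To make the four optional steps in the table concrete, here is a minimal standalone scikit-learn sketch of roughly what each flag corresponds to. It is not PyCaret's internal pipeline: the order of the steps, the synthetic dataset, and the use of 0.05 as the IsolationForest contamination (taken from the hardcoded outliers_threshold mentioned under "What was removed in 4.0") are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, IsolationForest
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic data and a 70/30 split, standing in for train_size=0.7.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)

# remove_outliers=True: IsolationForest filter, fitted on and applied to the train set only.
# Assumption: contamination=0.05 mirrors the hardcoded outlier threshold.
inlier = IsolationForest(contamination=0.05, random_state=0).fit_predict(X_train) == 1
X_train, y_train = X_train[inlier], y_train[inlier]

# transformation=True: Yeo-Johnson power transform, fitted on train, applied to both splits.
power = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train, X_test = power.transform(X_train), power.transform(X_test)

# normalize=True: zero mean and unit variance per numeric column.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# feature_selection=True: keep features whose ExtraTrees importance is at least the median.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0), threshold="median"
).fit(X_train, y_train)
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

print(X_train.shape, X_train_sel.shape, X_test_sel.shape)
```

Every transformer is fitted on the train split only and then applied to the holdout, which keeps holdout statistics from leaking into preprocessing.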

Clustering / anomaly

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| preprocess | bool | True | Same as supervised: gates the impute + encode chain. |
| normalize | bool | False | Often a good idea for distance-based clusterers. |
| transformation | bool | False | Yeo-Johnson on numerics. |
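The note on normalize is easier to see with numbers. The sketch below uses only the Python standard library and three made-up rows (the column meanings and the standardize helper are hypothetical) to show how a wide-ranging feature dominates Euclidean distance until each column is z-scored.

```python
import math
import statistics

# Toy rows: (annual_income, age). Income spans a far wider numeric range than age.
rows = [(30_000.0, 25.0), (32_000.0, 60.0), (90_000.0, 27.0)]


def standardize(data):
    """Z-score each column: subtract the column mean, divide by the column std."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in data]


# Rows 0 and 1 differ mostly in age; rows 0 and 2 differ mostly in income.
raw_age_gap = math.dist(rows[0], rows[1])
raw_income_gap = math.dist(rows[0], rows[2])

scaled = standardize(rows)
scaled_age_gap = math.dist(scaled[0], scaled[1])
scaled_income_gap = math.dist(scaled[0], scaled[2])

print(f"raw:    age gap {raw_age_gap:.1f}, income gap {raw_income_gap:.1f}")
print(f"scaled: age gap {scaled_age_gap:.2f}, income gap {scaled_income_gap:.2f}")
```

On the raw values a 35-year age difference barely registers next to the income difference; after standardizing, the two gaps are of similar size, so a distance-based clusterer can weigh both features.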

remove_outliers and feature_selection are not yet supported on unsupervised tasks. If you set them on the experiment after construction, it falls back to the legacy code path, which raises in 4.0.

Time-series

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| target | str \| None | None | Optional for univariate Series input. |
| fh | int \| List[int] \| ForecastingHorizon | 1 | Forecast horizon: the number of periods to predict. |
| seasonal_period | int \| str \| None | None (auto-detect) | Override the auto-detected seasonal period. |
| fold | int | 3 | Number of CV folds. |
| fold_strategy | "expanding" \| "sliding" \| splitter | "expanding" | sktime splitter or one of the strings. |
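The difference between the "expanding" and "sliding" strategies can be sketched in a few lines of plain Python. The window_folds helper below is hypothetical and only conveys the idea; it is neither the sktime splitter nor PyCaret's implementation, and how the real splitters choose window sizes and step lengths may differ.

```python
def window_folds(n_obs, n_folds=3, fh=1, strategy="expanding", window_length=None):
    """Return (train_indices, test_indices) pairs for a toy temporal CV.

    Hypothetical helper for illustration; not the sktime or PyCaret splitter.
    """
    if strategy not in ("expanding", "sliding"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    first_cutoff = n_obs - fh * n_folds  # first index of the first test window
    if strategy == "sliding" and window_length is None:
        window_length = first_cutoff  # default: size of the first training window
    if first_cutoff < 1 or (strategy == "sliding" and window_length > first_cutoff):
        raise ValueError("series too short for the requested folds")
    folds = []
    for k in range(n_folds):
        cutoff = first_cutoff + k * fh
        # Expanding: always train from the start. Sliding: fixed-length window.
        start = 0 if strategy == "expanding" else cutoff - window_length
        folds.append((list(range(start, cutoff)), list(range(cutoff, cutoff + fh))))
    return folds


expanding = window_folds(10, n_folds=3, fh=1, strategy="expanding")
sliding = window_folds(10, n_folds=3, fh=1, strategy="sliding", window_length=5)

print([len(train) for train, _ in expanding])  # training size grows: [7, 8, 9]
print([train[0] for train, _ in sliding])      # window start moves: [2, 3, 4]
```

In both cases the test window always sits after the training window, so no fold trains on observations from its own future.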

What was removed in 4.0

If you were using these in 3.x, here's where they went:

| 3.x parameter | Status in 4.0 |
| --- | --- |
| numeric_imputation_value | Removed; imputation is mean / mode only. Pre-process if you need different values. |
| categorical_imputation | Removed; mode-imputation is hardcoded. |
| encoding_method | Removed; ordinal encoding is the only option. |
| polynomial_features, polynomial_degree | Removed. |
| group_features | Removed. |
| ignore_features | Removed. Drop the column upstream. |
| bin_numeric_features | Removed. |
| combine_rare_levels | Removed. |
| outliers_threshold | Hardcoded at 0.05 (no parameter yet). |
| pca, pca_method, pca_components | Removed. Pre-process if needed. |
| silent, verbose, html, experiment_custom_tags, log_plots, log_data, log_profile | Removed. The 4.0 logger replaces these. (verbose is still accepted as a reserved no-op; see "Common to every task".) |
| setup_kwargs (and any escape-hatch kwargs) | Raises ConfigurationError on fit(). |
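Several rows above say to handle things upstream. A minimal pandas sketch of what that can look like for ignore_features and constant-value imputation follows; the DataFrame, its column names, and the fill values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw data with an identifier column and some missing values.
df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 104],
        "age": [34.0, None, 51.0, 29.0],
        "plan": ["basic", None, "pro", "basic"],
        "churned": [0, 1, 0, 1],
    }
)

# Drop the column that 3.x would have received via ignore_features.
prepared = df.drop(columns=["customer_id"])

# Fill with explicit constants instead of the built-in mean / mode imputation.
prepared = prepared.fillna({"age": -1.0, "plan": "unknown"})

print(prepared)
```

Because no missing values remain after this step, the built-in mean / mode imputation has nothing left to fill when the frame is handed to an experiment.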

If you need a removed parameter, open an issue — we'd rather bring it back as a first-class constructor parameter than offer a configuration escape hatch.
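When migrating a 3.x configuration, a small pre-flight check can flag leftover arguments before they reach a constructor. The sketch below is a hypothetical helper, not part of PyCaret; the set of names is copied from the table above, with verbose left out because the common-parameters table still lists it.

```python
# Names taken from the "What was removed in 4.0" table on this page.
# verbose is omitted: it is still listed as a (no-op) common parameter.
REMOVED_IN_4_0 = {
    "numeric_imputation_value",
    "categorical_imputation",
    "encoding_method",
    "polynomial_features",
    "polynomial_degree",
    "group_features",
    "ignore_features",
    "bin_numeric_features",
    "combine_rare_levels",
    "outliers_threshold",
    "pca",
    "pca_method",
    "pca_components",
    "silent",
    "html",
    "experiment_custom_tags",
    "log_plots",
    "log_data",
    "log_profile",
    "setup_kwargs",
}


def find_removed_params(kwargs):
    """Return the sorted names in kwargs that 4.0 no longer accepts."""
    return sorted(set(kwargs) & REMOVED_IN_4_0)


# Hypothetical 3.x-style configuration being migrated.
legacy_kwargs = {
    "target": "churned",
    "normalize": True,
    "pca": True,
    "ignore_features": ["customer_id"],
}
print(find_removed_params(legacy_kwargs))  # ['ignore_features', 'pca']
```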