The Remove Outliers function in PyCaret allows you to identify and remove outliers from the dataset before training the model. Outliers are identified through PCA linear dimensionality reduction using the Singular Value Decomposition technique. It is enabled with the remove_outliers parameter within setup. The proportion of outliers is controlled through the outliers_threshold parameter.
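To make the idea of PCA-based outlier detection concrete, here is a minimal sketch written with NumPy only. It is not PyCaret's internal code: the function name pca_outlier_mask, the variance-weighted scoring rule, and the quantile cutoff are assumptions chosen for illustration. Each row is projected onto the principal axes obtained from an SVD, scored by its variance-weighted squared distance from the center, and flagged if its score falls in the top `threshold` fraction.

```python
import numpy as np

def pca_outlier_mask(X, threshold=0.05):
    """Return a boolean mask that is True for rows flagged as outliers.

    Hypothetical helper for illustration; not part of PyCaret.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so no single feature dominates the decomposition.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Variance captured by each principal component.
    var = s ** 2 / (len(Z) - 1)
    keep = var > 1e-12  # skip degenerate components
    # Score: squared projection on each axis, divided by that axis's variance.
    scores = ((Z @ Vt[keep].T) ** 2 / var[keep]).sum(axis=1)
    # Flag the rows whose score lies above the (1 - threshold) quantile.
    cutoff = np.quantile(scores, 1.0 - threshold)
    return scores > cutoff

# Synthetic data (an assumption for the demo): 200 rows, one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [25.0, -25.0, 25.0]

mask = pca_outlier_mask(X, threshold=0.05)
X_clean = X[~mask]  # rows kept for training
```

In PyCaret itself none of this needs to be written by hand; passing remove_outliers = True to setup is enough.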
Parameters in setup
remove_outliers: bool, default = False
When set to True, outliers are removed from the training data using PCA linear dimensionality reduction based on the Singular Value Decomposition technique.
outliers_threshold: float, default = 0.05
The proportion of outliers in the dataset is set with the outliers_threshold parameter. By default, 0.05 is used, which means 0.025 of the values on each tail of the distribution are dropped from the training data.
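The arithmetic behind the default threshold can be sketched on a single column of numbers. This is a one-dimensional illustration of the proportion only, not how PyCaret scores rows; the function two_tailed_trim is a hypothetical helper written for this example.

```python
def two_tailed_trim(values, outliers_threshold=0.05):
    """Drop outliers_threshold / 2 of the values from each tail.

    Hypothetical helper for illustration; not part of PyCaret.
    """
    ordered = sorted(values)
    # Number of values removed from each end (rounded down).
    per_tail = int(len(ordered) * outliers_threshold / 2)
    return ordered[per_tail:len(ordered) - per_tail]

# 200 values with the default threshold: 0.025 * 200 = 5 dropped per tail.
values = list(range(1, 201))
kept = two_tailed_trim(values, outliers_threshold=0.05)
```

With 200 values and a threshold of 0.05, 10 values are dropped in total and 190 remain.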
How to use?
# Importing dataset
from pycaret.datasets import get_data
insurance = get_data('insurance')

# Importing module and initializing setup
from pycaret.regression import *
reg1 = setup(data = insurance, target = 'charges', remove_outliers = True)