While normalization rescales the data within new limits to reduce the impact of magnitude on the variance, transformation is a more radical technique. Transformation changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution. In general, data should be transformed when using ML algorithms that assume normality or a Gaussian distribution in the input data. Linear Discriminant Analysis (LDA) and Gaussian Naive Bayes make this assumption explicitly; Logistic Regression does not strictly require it but often benefits from less skewed inputs. Transformation can be enabled in PyCaret with the transformation parameter within setup. Two methods are available, ‘yeo-johnson’ and ‘quantile’, selected with the transformation_method parameter within setup.


Parameters in setup 

transformation: bool, default = False
When set to True, a power transformation is applied to make the data more normal / Gaussian-like. This is useful for modeling issues related to heteroscedasticity or other situations where normality is desired. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.
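To make the maximum-likelihood step more concrete, the following is a minimal standard-library sketch of a Yeo-Johnson power transformation on synthetic right-skewed data. It is not PyCaret's internal implementation; the helper names (yeo_johnson, log_likelihood, skewness), the synthetic data and the simple grid search over lambda are all assumptions made for illustration.

```python
import math
import random


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of a single value for a given lambda."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


def log_likelihood(values, lmbda):
    """Profile log-likelihood of lambda, assuming the transformed data is normal."""
    n = len(values)
    transformed = [yeo_johnson(v, lmbda) for v in values]
    mean = sum(transformed) / n
    var = sum((t - mean) ** 2 for t in transformed) / n
    # Jacobian term of the transformation
    jacobian = (lmbda - 1.0) * sum(
        math.copysign(1.0, v) * math.log1p(abs(v)) for v in values
    )
    return -0.5 * n * math.log(var) + jacobian


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed feature (hypothetical data, fixed seed for repeatability)
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(500)]

# Coarse grid search for the lambda that maximizes the log-likelihood
candidates = [i / 20 for i in range(-40, 41)]  # -2.0 to 2.0 in steps of 0.05
best_lambda = max(candidates, key=lambda l: log_likelihood(data, l))

skew_before = skewness(data)
skew_after = skewness([yeo_johnson(v, best_lambda) for v in data])
print(f"lambda={best_lambda:.2f}, skew before={skew_before:.2f}, after={skew_after:.2f}")
```

A production implementation would typically use a numerical optimizer rather than a fixed grid; the grid is used here only to keep the sketch short.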

transformation_method: string, default = ‘yeo-johnson’
Defines the method for transformation. By default, the transformation method is set to ‘yeo-johnson’. The other available option is ‘quantile’. Both methods transform the feature set to follow a Gaussian-like or normal distribution. Note that the quantile transformation is non-linear and may distort linear correlations between variables measured at the same scale.
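The non-linearity of a quantile transformation can be illustrated with a small rank-based sketch. This is an assumption-laden illustration using only the standard library, not the code PyCaret runs; the helper name quantile_to_normal is hypothetical, and ties are not treated specially.

```python
from statistics import NormalDist


def quantile_to_normal(values):
    """Map each value to the standard-normal quantile of its rank.

    Ties are not averaged; tied values receive neighbouring quantiles.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    result = [0.0] * n
    for rank, index in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1)
        result[index] = std_normal.inv_cdf((rank + 0.5) / n)
    return result


# The gaps between the inputs are very uneven (1 versus 98) ...
original = [1.0, 2.0, 100.0]
transformed = quantile_to_normal(original)

# ... but after the transformation only the ordering survives.
print(transformed)
```

Because the output depends only on ranks, the distance between 2.0 and 100.0 is treated the same as the distance between 1.0 and 2.0, which is why linear relationships between features can be distorted.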


How to use?


# Importing dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', transformation = True)


Notice that no method has been passed. By default, the transformation parameter uses the ‘yeo-johnson’ method. To change the method, pass the transformation_method parameter to setup.
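As a sketch of how the method might be switched, the helper below passes transformation_method = 'quantile' to setup. The wrapper function setup_with_quantile is a hypothetical name introduced for illustration only; the setup parameters themselves are the ones described above.

```python
def setup_with_quantile(data, target):
    """Initialize a classification experiment using the quantile transformation."""
    # Imported inside the helper so that defining it does not load PyCaret
    from pycaret.classification import setup

    return setup(
        data=data,
        target=target,
        transformation=True,
        transformation_method='quantile',
    )
```

With the dataset loaded earlier, this would be called as setup_with_quantile(pokemon, 'Legendary').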
