Normalization


Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to rescale the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information. This can be achieved in PyCaret using the normalize parameter within setup. Several methods are available for normalization; by default, ‘zscore’ is used, which can be changed using the normalize_method parameter within setup.

 

Parameters in setup


normalize: bool, default = False
When set to True, the feature space is transformed using the normalize_method parameter. Generally, linear algorithms perform better with normalized data; however, the results may vary, and it is advised to run multiple experiments to evaluate the benefit of normalization.

normalize_method: string, default = ‘zscore’
Defines the method to be used for normalization. By default, the normalization method is set to ‘zscore’. The available options are:

  • zscore : the standard z-score is calculated as z = (x - u) / s, where u is the mean and s is the standard deviation of the training samples.
  • minmax : scales and translates each feature individually such that it is in the range of 0 - 1.
  • maxabs : scales and translates each feature individually such that the maximal absolute value of each feature will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.
  • robust : scales and translates each feature according to the interquartile range. When the dataset contains outliers, the robust scaler often gives better results.
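
These four options behave like the standard scalers available in scikit-learn. The short sketch below uses plain scikit-learn (not PyCaret) to apply each method to a toy column so the differences are easy to see; the mapping to these specific scaler classes is an illustrative assumption, not a statement about PyCaret's internals.

# Illustrative sketch (plain scikit-learn, not PyCaret): the four normalization
# methods applied to a toy feature that contains one outlier.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])  # single numeric feature with an outlier

scalers = {
    'zscore': StandardScaler(),   # z = (x - u) / s
    'minmax': MinMaxScaler(),     # rescales each feature to the 0 - 1 range
    'maxabs': MaxAbsScaler(),     # divides by the maximal absolute value, preserves sparsity
    'robust': RobustScaler(),     # centers on the median, scales by the interquartile range
}

for name, scaler in scalers.items():
    print(name, scaler.fit_transform(X).ravel().round(2))

Note how the outlier dominates the minmax result but has much less effect on the robust output, which is why the robust method is suggested when outliers are present.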

 

How to use?


 

# Importing dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', normalize = True)
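
As noted above, it is advised to run multiple experiments to evaluate the benefit of normalization. A minimal sketch of that kind of comparison is shown below; session_id (used only to keep the train/test split identical across runs) and compare_models are standard PyCaret calls, but the overall workflow is an illustrative assumption rather than a prescribed recipe.

# Sketch: compare results without normalization, with the default 'zscore',
# and with 'minmax' (session_id is fixed so all runs use the same data split)
from pycaret.classification import *

clf_base = setup(data = pokemon, target = 'Legendary', session_id = 123)
best_base = compare_models()

clf_zscore = setup(data = pokemon, target = 'Legendary', normalize = True, session_id = 123)
best_zscore = compare_models()

clf_minmax = setup(data = pokemon, target = 'Legendary', normalize = True,
                   normalize_method = 'minmax', session_id = 123)
best_minmax = compare_models()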

 
