One Hot Encoding


Machine learning algorithms cannot work directly with categorical data, so categorical features must be transformed into numeric values before a model is trained. The most common type of categorical encoding is One Hot Encoding (also known as dummy encoding), where each categorical level becomes a separate feature in the dataset containing binary values (1 or 0). Since this is an essential step in any ML experiment, PyCaret transforms all categorical features in the dataset using one hot encoding. This is ideal for features with nominal categorical data, i.e. data that cannot be ordered. In other scenarios, other encoding methods should be used. For example, when the data is ordinal, i.e. it has intrinsic levels, ordinal encoding should be used. One Hot Encoding is applied to all features that are either inferred as categorical or forced as categorical using the categorical_features parameter within setup.
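To make the idea concrete, here is a minimal sketch of what one hot encoding does to a single nominal column. It uses pandas directly rather than PyCaret, so it illustrates the transformation itself and is not a description of PyCaret's internal implementation. The toy data and the column name `type` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one nominal feature with three levels.
df = pd.DataFrame({"type": ["Fire", "Water", "Grass", "Water"]})

# Each level of "type" becomes its own binary (1 or 0) column.
# dtype=int is passed so the result holds 1/0 rather than booleans.
encoded = pd.get_dummies(df, columns=["type"], dtype=int)

print(encoded)
```

Each row ends up with exactly one 1 across the new columns, which is why this encoding suits nominal data: it does not imply any ordering between the levels.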

 

Example: 


 

# Importing dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary')

 
