Ordinal Encoding

When categorical features in the dataset have an intrinsic natural order, such as Low, Medium and High, they must be encoded differently from nominal variables, which have no intrinsic order (for example, Male or Female). In PyCaret this is done with the ordinal_features parameter of setup, which accepts a dictionary mapping each feature name to a list of its levels in increasing order, from lowest to highest.


Parameters in setup: 

ordinal_features: dictionary, default = None
When the data contains ordinal features, they must be encoded differently using the ordinal_features parameter. If the data has a categorical variable with the values 'low', 'medium' and 'high', and it is known that low < medium < high, it can be passed as ordinal_features = { 'column_name' : ['low', 'medium', 'high'] }. The list must be in increasing order from lowest to highest.
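
To illustrate what the ordered list conveys, the sketch below maps each level to its position in the list, so that the integer codes preserve low < medium < high. It is plain Python and only an approximation of the idea: encode_ordinal is a hypothetical helper, not part of PyCaret, and the integers PyCaret assigns internally may differ.

```python
# Illustrative only: a hypothetical helper, not PyCaret's internal code.
def encode_ordinal(values, levels):
    """Map each value to its position in `levels` (lowest level -> 0)."""
    rank = {level: position for position, level in enumerate(levels)}
    return [rank[value] for value in values]

# Levels listed in increasing order, as ordinal_features expects.
levels = ['low', 'medium', 'high']
salary = ['medium', 'low', 'high', 'low']

encoded = encode_ordinal(salary, levels)
print(encoded)  # [1, 0, 2, 0]
```

Reversing the list would reverse the codes, which is why the order of the levels matters.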


How to use?


# Importing dataset
from pycaret.datasets import get_data
employee = get_data('employee')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = employee, target = 'left', ordinal_features = {'salary' : ['low', 'medium', 'high']})
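
Before calling setup, it can help to confirm that every value found in the column appears in the list of levels. The following is a minimal sketch in plain Python; unlisted_levels is a hypothetical helper and the sample values are made up for illustration. With a pandas DataFrame, the values could be taken from employee['salary'].unique().

```python
# Illustrative only: a hypothetical pre-check, not part of PyCaret.
def unlisted_levels(values, levels):
    """Return the distinct values that are absent from `levels`, sorted."""
    return sorted(set(values) - set(levels))

# Made-up sample values; 'very high' is deliberately not in the level list.
salary_values = ['low', 'medium', 'high', 'medium', 'very high']
ordinal_features = {'salary': ['low', 'medium', 'high']}

missing = unlisted_levels(salary_values, ordinal_features['salary'])
print(missing)  # ['very high']
```

An empty result means the list of levels covers the column.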

