Bin Numeric Features


Feature binning turns a continuous variable into a categorical one by grouping its values into a pre-defined number of bins. It is useful when a continuous feature has too many unique values or a few extreme values outside the expected range. Such extreme values can influence the trained model and reduce its prediction accuracy. In PyCaret, continuous numeric features can be binned into intervals using the bin_numeric_features parameter within setup. PyCaret uses the ‘sturges’ rule to determine the number of bins and K-Means clustering to convert continuous numeric features into categorical features.
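To give a feel for how many bins the ‘sturges’ rule produces, here is a minimal sketch of the textbook formula, k = ceil(log2(n)) + 1, where n is the number of samples. The helper name sturges_bins is hypothetical and is not part of PyCaret; how PyCaret computes the value internally may differ in detail.

```python
import math


def sturges_bins(n_samples: int) -> int:
    """Bin count suggested by Sturges' rule: ceil(log2(n)) + 1.

    Hypothetical helper for illustration; not a PyCaret function.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    return math.ceil(math.log2(n_samples)) + 1


# The bin count grows only logarithmically with the sample size.
for n in (100, 1_000, 1_000_000):
    print(n, sturges_bins(n))
# prints:
# 100 8
# 1000 11
# 1000000 21
```

Going from a thousand rows to a million rows adds only ten bins, which illustrates why the rule can be too coarse for large datasets.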

 

Parameters in setup 


bin_numeric_features: list, default = None
When a list of numeric features is passed, they are transformed into categorical features using K-Means, where the values in each bin share the same nearest center of a 1D k-means cluster. The number of clusters is determined using the ‘sturges’ method, which is optimal only for Gaussian data and underestimates the number of bins for large non-Gaussian datasets.
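The idea that values in a bin share the same nearest 1D k-means center can be sketched with the standard library alone. This is an illustration of the technique (Lloyd's algorithm in one dimension), not PyCaret's internal code. The function name kmeans_bin_1d, the choice of starting centers, and the sample ages are all assumptions made for the example.

```python
def kmeans_bin_1d(values, n_bins, n_iter=100):
    """Assign each value to the nearest of n_bins 1D k-means centers.

    Hypothetical helper for illustration; not a PyCaret function.
    """
    lo, hi = min(values), max(values)
    # Assumed deterministic start: midpoints of equal-width intervals.
    width = (hi - lo) / n_bins
    centers = [lo + width * (i + 0.5) for i in range(n_bins)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each value.
        labels = [
            min(range(n_bins), key=lambda j: abs(v - centers[j]))
            for v in values
        ]
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for j in range(n_bins):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(
                sum(members) / len(members) if members else centers[j]
            )
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


# Made-up ages with three visible groups.
ages = [18, 19, 21, 22, 40, 41, 43, 44, 70, 72, 75]
labels, centers = kmeans_bin_1d(ages, n_bins=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
```

Each label is the bin index that replaces the original numeric value, so the feature becomes categorical with one category per cluster.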

 

How to use?


  

# Importing dataset
from pycaret.datasets import get_data
income = get_data('income')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = income, target = 'income >50K', bin_numeric_features = ['age'])

 



 
