Bin Numeric Features
Feature binning turns continuous variables into categorical values using a pre-defined number of bins. It is effective when a continuous feature has too many unique values or a few extreme values outside the expected range. Such extreme values can unduly influence the trained model and thereby reduce its prediction accuracy. In PyCaret, continuous numeric features can be binned into intervals using the bin_numeric_features parameter within setup. PyCaret uses the ‘sturges’ rule to determine the number of bins and K-Means clustering to assign each value to a bin, converting continuous numeric features into categorical features.
Parameters in setup
bin_numeric_features: list, default = None
When a list of numeric features is passed, they are transformed into categorical features using K-Means, where values in each bin share the same nearest center of a 1D k-means cluster. The number of clusters is determined by the ‘sturges’ method, which is only optimal for Gaussian data and underestimates the number of bins for large non-Gaussian datasets.
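The two steps described above can be illustrated with a minimal standard-library sketch. It is not PyCaret's internal code: the helper names sturges_bins and kmeans_1d_bins, the equal-width initialisation of the centers, and the synthetic 'age' values are all assumptions made for illustration. Sturges' rule is taken in its usual form, k = ceil(log2(n)) + 1.

```python
import math
import random


def sturges_bins(n_samples):
    """Sturges' rule: k = ceil(log2(n)) + 1 (hypothetical helper)."""
    return math.ceil(math.log2(n_samples)) + 1


def nearest_center(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))


def kmeans_1d_bins(values, n_bins, n_iter=100):
    """Bin values by their nearest 1D k-means center (hypothetical helper)."""
    lo, hi = min(values), max(values)
    # Assumption: start from the midpoints of equal-width intervals.
    width = (hi - lo) / n_bins
    centers = [lo + width * (i + 0.5) for i in range(n_bins)]
    for _ in range(n_iter):
        labels = [nearest_center(v, centers) for v in values]
        # Move each center to the mean of its members; keep it if empty.
        new_centers = []
        for j, center in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else center)
        if new_centers == centers:
            break  # converged
        centers = new_centers
    # Final assignment against the converged centers.
    labels = [nearest_center(v, centers) for v in values]
    return labels, centers


random.seed(0)
# Synthetic stand-in for a numeric column such as 'age'.
ages = [random.gauss(40, 12) for _ in range(1000)]

k = sturges_bins(len(ages))  # 11 bins for 1000 samples
labels, centers = kmeans_1d_bins(ages, k)
```

Each entry of labels is the bin index for the corresponding value, and every value in a bin shares the same nearest center. Because k grows only with log2(n), the bin count rises slowly as the dataset gets larger, which is why the rule can underestimate the number of bins for large non-Gaussian data.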
How to use?
# Importing dataset
from pycaret.datasets import get_data
income = get_data('income')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = income, target = 'income >50K', bin_numeric_features = ['age'])