It is often seen in machine learning experiments when two features combined through an arithmetic operation becomes more significant in explaining variances in the data, than the same two features separately. Creating a new feature through interaction of existing features is known as feature interaction. It can achieved in PyCaret using feature_interaction and feature_ratio parameters within setup. Feature interaction creates new features by multiplying two variables (a * b), while feature ratios create new features but by calculating the ratios of existing features (a / b).
Parameters in setup
feature_interaction: bool, default = False
When set to True, it will create new features by interacting (a * b) for all numeric variables in the dataset including polynomial and trigonometric features (if created). This feature is not scalable and may not work as expected on datasets with large feature space.
feature_ratio: bool, default = False
When set to True, it will create new features by calculating the ratios (a / b) of all numeric variables in the dataset. This feature is not scalable and may not work as expected on datasets with large feature space.
interaction_threshold: bool, default = 0.01
Similar to polynomial_threshold, It is used to compress a sparse matrix of newly created features through interaction. Features whose importance based on the combination of Random Forest, AdaBoost and Linear correlation falls within the percentile of the defined threshold are kept in the dataset. Remaining features are dropped before further processing.
How to use?
# Importing dataset from pycaret.datasets import get_data insurance = get_data('insurance') # Importing module and initializing setup from pycaret.regression import * reg1 = setup(data = insurance, target = 'charges', feature_interaction = True, feature_ratio = True)
Try this next