In machine learning experiments, the relationship between the dependent and independent variables is often assumed to be linear, but this is not always the case. Sometimes the relationship is more complex. Creating new polynomial features can help capture a relationship that might otherwise go unnoticed. PyCaret can create polynomial features from existing features using the polynomial_features parameter within setup.
Parameters in setup
polynomial_features: bool, default = False
When set to True, new features are created from all polynomial combinations of the numeric features in the dataset, up to the degree defined by the polynomial_degree parameter.
polynomial_degree: int, default = 2
Degree of polynomial features. For example, if an input sample is two dimensional and of the form [a, b], the polynomial features with degree = 2 are: [1, a, b, a^2, ab, b^2].
polynomial_threshold: float, default = 0.1
This is used to compress a sparse matrix of polynomial and trigonometric features. Polynomial and trigonometric features whose feature importance, based on a combination of Random Forest, AdaBoost, and linear correlation, falls within the percentile defined by the threshold are kept in the dataset. The remaining features are dropped before further processing.
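The [a, b] example in the polynomial_degree description can be reproduced with a few lines of plain Python. This is only a sketch of the expansion itself, not PyCaret's internal implementation; the helper name polynomial_terms is hypothetical and introduced here for illustration.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(sample, degree=2):
    """Return every monomial of the inputs up to `degree`, bias term first."""
    terms = []
    for d in range(degree + 1):
        # Each combination picks d inputs (repetition allowed) to multiply.
        for combo in combinations_with_replacement(sample, d):
            terms.append(prod(combo))  # prod(()) == 1 yields the bias term
    return terms


a, b = 2, 3
expanded = polynomial_terms([a, b], degree=2)
print(expanded)  # [1, 2, 3, 4, 6, 9], i.e. [1, a, b, a^2, ab, b^2]

# Raising the degree grows the feature space quickly.
print(len(polynomial_terms([a, b], degree=3)))  # 10
```

With only two inputs, moving from degree 2 to degree 3 takes the expansion from 6 to 10 terms, which is why a filtering step such as polynomial_threshold becomes useful on wider datasets.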
How to use?
# Importing dataset
from pycaret.datasets import get_data
juice = get_data('juice')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = juice, target = 'Purchase', polynomial_features = True)
Notice that new features were created from the existing feature space. To expand or compress the polynomial feature space, use the polynomial_threshold parameter, which filters out unimportant polynomial features using feature importance based on a combination of Random Forest, AdaBoost, and linear correlation. Use polynomial_degree to set the maximum degree considered during feature creation.
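To make the idea of threshold-based filtering concrete, here is a minimal sketch in plain Python. It rests on assumptions: it reads the threshold as the fraction of top-ranked features to keep, and it takes a single combined importance score per feature as given. The text does not spell out how PyCaret combines or cuts off the scores, so the function keep_top_features and the scores below are hypothetical and purely illustrative.

```python
import math


def keep_top_features(scores, threshold=0.1):
    """Keep the top `threshold` fraction of features by importance score.

    Assumption: `scores` maps feature name -> combined importance, and the
    threshold is interpreted as the fraction of features to retain.
    """
    n_keep = max(1, math.ceil(threshold * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Made-up importance scores, for illustration only.
scores = {"a": 0.40, "b": 0.25, "a^2": 0.20, "a*b": 0.15}
kept = keep_top_features(scores, threshold=0.5)
print(kept)  # ['a', 'b']
```

Under this reading, a larger threshold keeps more of the generated features, while a smaller one compresses the feature space more aggressively.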