Creating clusters from the existing features of the data is an unsupervised ML technique for engineering new features. It uses an iterative approach to determine the number of clusters, based on a combination of the Calinski-Harabasz and Silhouette criteria. Each data point, described by its original features, is assigned to a cluster, and the assigned cluster label is then used as a new feature when predicting the target variable. In PyCaret this is achieved with the create_clusters parameter within setup.
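To make the idea concrete, here is a minimal sketch in plain scikit-learn. It is not PyCaret's internal code. It assumes KMeans as the clustering algorithm, assumes each iteration tries one candidate number of clusters starting at k = 2, and combines the two criteria by min-max scaling each score across candidates and summing them. The helper name add_cluster_feature and the column name cluster_label are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score


def _min_max(values):
    """Rescale scores to [0, 1] so the two criteria are on a comparable scale."""
    arr = np.asarray(values, dtype=float)
    span = arr.max() - arr.min()
    return np.zeros_like(arr) if span == 0 else (arr - arr.min()) / span


def add_cluster_feature(X, cluster_iter=20, random_state=0):
    """Hypothetical helper: append a KMeans cluster label as a new feature.

    Assumption: iteration i tries k = i + 1 clusters (both criteria need k >= 2),
    and the winning k maximises the sum of the min-max scaled
    Calinski-Harabasz and Silhouette scores. Not PyCaret's actual code.
    """
    X = pd.DataFrame(X)
    max_k = min(cluster_iter + 1, len(X) - 1)  # silhouette needs k <= n_samples - 1
    if max_k < 2:
        raise ValueError("need at least 3 rows and cluster_iter >= 1")
    ks, ch, sil, all_labels = [], [], [], []
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        ks.append(k)
        ch.append(calinski_harabasz_score(X, labels))
        sil.append(silhouette_score(X, labels))
        all_labels.append(labels)
    best = int(np.argmax(_min_max(ch) + _min_max(sil)))
    out = X.copy()
    out["cluster_label"] = all_labels[best]  # the engineered feature
    return out, ks[best]


# Usage: three well-separated blobs, so the search should settle on k = 3.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=42)
augmented, best_k = add_cluster_feature(X, cluster_iter=6)
print(best_k, augmented.columns.tolist())
```

In a supervised workflow, the cluster_label column would then sit alongside the original features when a model is trained on the target.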
Parameters in setup
create_clusters: bool, default = False
When set to True, an additional feature is created in which each instance is assigned to a cluster. The number of clusters is determined using a combination of the Calinski-Harabasz and Silhouette criteria.
cluster_iter: int, default = 20
Number of iterations used to create clusters. Each iteration represents a candidate cluster size (number of clusters). Only takes effect when the create_clusters parameter is set to True.
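One way to picture cluster_iter is as the number of candidate cluster sizes that get scored. The sketch below is plain scikit-learn with KMeans and a hypothetical helper name, cluster_iteration_scores. It assumes iteration i tries i + 1 clusters and simply tabulates both criteria per iteration so you can see where they peak. PyCaret's own mapping from iterations to cluster sizes may differ.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score


def cluster_iteration_scores(X, cluster_iter=20, random_state=0):
    """Score one candidate cluster count per iteration (hypothetical helper).

    Assumption: iteration i (1-based) tries k = i + 1 clusters, since both
    criteria need at least two clusters.
    """
    rows = []
    for i in range(1, cluster_iter + 1):
        k = i + 1
        if k >= len(X):  # silhouette needs k <= n_samples - 1
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        rows.append({
            "iteration": i,
            "n_clusters": k,
            "calinski_harabasz": calinski_harabasz_score(X, labels),
            "silhouette": silhouette_score(X, labels),
        })
    return pd.DataFrame(rows)


# Usage: five iterations -> candidate sizes 2..6 on three well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=42)
scores = cluster_iteration_scores(X, cluster_iter=5)
print(scores)
```

A larger cluster_iter widens the search at the cost of one extra clustering fit per iteration.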
How to use?
# Importing dataset
from pycaret.datasets import get_data
insurance = get_data('insurance')

# Importing module and initializing setup
from pycaret.regression import *
reg1 = setup(data = insurance, target = 'charges', create_clusters = True)