Principal Component Analysis

Principal Component Analysis (PCA) is an unsupervised technique used in machine learning to reduce the dimensionality of a data. It does so by compressing the feature space by identifying a subspace that captures most of the information in the complete feature matrix. It projects the original feature space into lower dimensionality. This can be achieved in PyCaret using pca parameter within setup.


Parameters in setup 

pca: bool, default = False
When set to True, dimensionality reduction is applied to project the data into a lower dimensional space using the method defined in pca_method param. In supervised learning pca is generally performed when dealing with high feature space and memory is a constraint. Note that not all datasets can be decomposed efficiently using a linear PCA technique and that applying PCA may result in loss of information. As such, it is advised to run multiple experiments with different pca_methods to evaluate the impact.

pca_method: string, default = ‘linear’
The ‘linear’ method performs Linear dimensionality reduction using Singular Value Decomposition. The other available options are:

kernel : dimensionality reduction through the use of RVF kernel.
incremental : replacement for ‘linear’ pca when the dataset to be decomposed is too large to fit in memory

pca_components: int/float, default = 0.99
Number of components to keep. if pca_components is a float, it is treated as a target percentage for information retention. When pca_components is an integer it is treated as the number of features to be kept. pca_components must be strictly less than the original number of features in the dataset.


How to use?


# Importing dataset
from pycaret.datasets import get_data
income = get_data('income')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = income, target = 'income >50K', pca = True, pca_components = 10)


Try this next


Was this page helpful?

Send feedback