Changing Data Types


Each feature in the dataset has an associated data type, such as numeric, categorical, or date-time. PyCaret's inference algorithm automatically detects the data type of each feature, but the inferred types are sometimes incorrect. Correct data types matter because several downstream processes depend on them. For example, missing value imputation is performed separately for numeric and categorical features. To overwrite the inferred data types, pass the numeric_features, categorical_features, and date_features parameters to setup.
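To see why automatic inference can go wrong, consider the toy heuristic below. It is an illustration only and is not PyCaret's actual inference logic; the function name infer_type and the max_categories threshold are assumptions made for this sketch. A numeric column with few distinct values looks like a set of category codes to such a heuristic, which is the kind of case the override parameters are meant for.

```python
# Toy heuristic for illustration only; this is NOT PyCaret's inference logic.
def infer_type(values, max_categories=20):
    """Guess 'numeric' or 'categorical' from a list of raw column values."""
    non_missing = [v for v in values if v is not None]
    all_numbers = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_missing
    )
    if not all_numbers:
        return "categorical"
    # Few distinct numbers look like category codes to a naive heuristic.
    if len(set(non_missing)) <= max_categories:
        return "categorical"
    return "numeric"


# Ratings are numeric in meaning, but only five distinct values occur,
# so the heuristic labels the column as categorical.
ratings = [1, 2, 3, 4, 5, 3, 2, 4]

# Ages spread over many distinct values are labelled as numeric.
ages = list(range(18, 60))

print(infer_type(ratings))
print(infer_type(ages))
```

In a case like the ratings column, passing the column name in numeric_features would restore the intended type.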

 

Parameters in setup:


numeric_features: list of strings, default = None
If the inferred data types are not correct, numeric_features can be used to overwrite them. For example, if setup infers 'column1' as categorical instead of numeric, pass numeric_features = ['column1'].

categorical_features: list of strings, default = None
If the inferred data types are not correct, categorical_features can be used to overwrite them. For example, if setup infers 'column1' as numeric instead of categorical, pass categorical_features = ['column1'].

date_features: string, default = None
If the data has a DateTime column that is not automatically detected when running setup, pass its name with date_features = 'date_column_name'. Multiple date columns are supported. Date columns are not used directly in modeling. Instead, features are extracted from them and the original date columns are dropped from the dataset. If a date column includes a time stamp, time-related features are also extracted.

ignore_features: list of strings, default = None
Any feature that should be ignored for modeling can be passed to the ignore_features parameter. ID and DateTime columns, when inferred, are automatically ignored for modeling.
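The date_features description says that date columns are replaced by extracted features. The sketch below shows the general idea using only the Python standard library. The helper extract_date_features and the feature names it returns are hypothetical; the columns PyCaret actually produces may differ.

```python
from datetime import datetime


# Hypothetical helper: the feature names are assumptions made for this sketch,
# not PyCaret's actual output columns.
def extract_date_features(value):
    """Turn one ISO-formatted date or timestamp string into simple features."""
    parsed = datetime.fromisoformat(value)
    features = {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "weekday": parsed.weekday(),  # Monday = 0
    }
    # Add a time-related feature only when the string carries a time stamp.
    if "T" in value or " " in value:
        features["hour"] = parsed.hour
    return features


print(extract_date_features("2021-03-15"))
print(extract_date_features("2021-03-15 14:30:00"))
```

After a step like this, the original date string is no longer needed as a model input, which matches the description of date columns being dropped once their features are extracted.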

 

How to use?


 

# Importing dataset
from pycaret.datasets import get_data
hepatitis = get_data('hepatitis')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = hepatitis, target = 'Class', categorical_features = ['AGE'])

 

Similarly, to overwrite the data type of any feature to numeric, use the numeric_features parameter within setup. PyCaret's inference algorithm detects date-time columns automatically, but if it fails to do so, use the date_features parameter to pass the date columns. The ignore_features parameter within setup can be used to ignore certain features. See the example below:

 

Ignore Features:


 

# Importing dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', ignore_features = ['#', 'Name'])
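
When several overrides are used together, it can help to check the column lists before calling setup. The following is a minimal sketch under stated assumptions: build_type_overrides and run_setup are hypothetical helpers and not part of PyCaret, the column list is an assumed subset of the pokemon dataset, and passing a list to date_features is assumed to be accepted.

```python
def build_type_overrides(columns, numeric=(), categorical=(), date=(), ignore=()):
    """Return keyword arguments for setup after basic checks (hypothetical helper)."""
    groups = {
        "numeric_features": list(numeric),
        "categorical_features": list(categorical),
        "date_features": list(date),  # assumption: a list is accepted here
        "ignore_features": list(ignore),
    }
    seen = {}
    for param, names in groups.items():
        for name in names:
            if name not in columns:
                raise ValueError(f"{name!r} is not a column in the dataset")
            if name in seen:
                raise ValueError(
                    f"{name!r} is listed in both {seen[name]} and {param}"
                )
            seen[name] = param
    # Drop empty lists so setup keeps its defaults for unused parameters.
    return {param: names for param, names in groups.items() if names}


def run_setup(data, target, **overrides):
    """Call PyCaret's classification setup with the validated overrides."""
    # Imported lazily so the helper above works without PyCaret installed.
    from pycaret.classification import setup
    return setup(data=data, target=target, **overrides)


# Assumed subset of the pokemon dataset's column names.
columns = ["#", "Name", "Type 1", "Attack", "Generation", "Legendary"]
overrides = build_type_overrides(
    columns, categorical=["Generation"], ignore=["#", "Name"]
)
print(overrides)

# With PyCaret installed and the dataset loaded as in the example above:
# clf1 = run_setup(pokemon, "Legendary", **overrides)
```

The check catches a column listed under two parameters, or a misspelled column name, before setup runs.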

 
