Optimize Threshold


In classification problems, the cost of a false positive is almost never the same as the cost of a false negative. If you are solving a business problem where Type 1 and Type 2 errors have different impacts, you can optimize the probability threshold of your classifier against a custom loss function, defined simply by assigning a separate cost to true positives, true negatives, false positives and false negatives.

Optimizing the threshold in PyCaret is as simple as calling optimize_threshold. It takes a trained model object (a classifier) and the loss function, represented by the values given to true positives, true negatives, false positives and false negatives. The function returns an interactive plot in which the loss function (y-axis) is shown as a function of the probability threshold (x-axis). A vertical line marks the best probability threshold for that specific classifier. The threshold found with optimize_threshold can then be used in the predict_model function to generate labels with the custom probability threshold. By default, classifiers predict the positive class at a probability threshold of 50%.

This function is only available in the pycaret.classification module.
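
To make the idea concrete, the sketch below shows roughly what a threshold search of this kind does, using only NumPy rather than PyCaret itself. The helper names total_value and best_threshold, the toy labels and probabilities, and the 0.01 step grid are all assumptions made for illustration; they are not part of the PyCaret API. The sketch also assumes that rewards are entered as positive numbers and costs as negative numbers, so that the best threshold is the one with the highest total, which is what the signs used in the example on this page (true_negative = 1500, false_negative = -5000) suggest.

```python
import numpy as np


def total_value(y_true, y_prob, threshold, tp=0, tn=0, fp=0, fn=0):
    """Total value of the predictions when the positive class is
    assigned to every probability >= threshold (hypothetical helper)."""
    y_pred = (y_prob >= threshold).astype(int)
    n_tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    n_tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    n_fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    n_fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp * n_tp + tn * n_tn + fp * n_fp + fn * n_fn


def best_threshold(y_true, y_prob, **values):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and return the first one
    with the highest total value, plus that value (hypothetical helper)."""
    grid = np.arange(101) / 100.0  # assumed grid step of 0.01
    scores = [total_value(y_true, y_prob, t, **values) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]


# Toy data (made up for this sketch): true labels and predicted
# probabilities of the positive class.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.20, 0.30, 0.45, 0.40, 0.60, 0.70, 0.90])

# Same values as the PyCaret example: reward a true negative,
# penalize a false negative.
values = dict(tn=1500, fn=-5000)

default_value = total_value(y_true, y_prob, 0.5, **values)
best_t, best_value = best_threshold(y_true, y_prob, **values)

# Labels generated with the custom threshold instead of the default 0.5.
custom_labels = (y_prob >= best_t).astype(int)

print("value at default threshold 0.5:", default_value)
print("best threshold:", best_t, "value:", best_value)
print("labels at best threshold:", custom_labels.tolist())
```

On this toy data the default threshold of 0.5 gives a total of 1000, while the search settles on a lower threshold (0.31, total 4500), because a missed positive is penalized far more heavily than a true negative is rewarded.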

 

Example

 

Code
# Importing dataset
from pycaret.datasets import get_data
credit = get_data('credit')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = credit, target = 'default')

# create a model
xgboost = create_model('xgboost')

# optimize threshold for trained model
optimize_threshold(xgboost, true_negative = 1500, false_negative = -5000)

 

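Output

[Interactive plot: loss function (y-axis) against probability threshold (x-axis), with a vertical line marking the best threshold.]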
