2026-04-24
Session 28: God-class drain: unsupervised verbs (clustering + anomaly)
Engineering log for session 28.
Baseline: session 27 finished the supervised drain (13 verbs). Session 28 drains the 3 unsupervised verbs — clustering + anomaly experiments now run natively.
CHANGED — engine
- CHANGED, BREAKING — `packages/engine/pycaret/core/unsupervised.py` — `UnsupervisedExperiment.create_model`. No longer delegates to `self._legacy.create_model`. Resolves the estimator from the registry (or accepts a pre-constructed object), fits on `self._legacy.X_transformed`, and returns a `CreateResult` whose `.pipeline` is a real sklearn Pipeline.
  - Signature: `(estimator, *, num_clusters=None, fraction=None, fit_kwargs=None, round=4, verbose=False, **estimator_kwargs)`. `num_clusters` translates to `n_clusters`; `fraction` translates to `contamination` for anomaly.
  - Constructor falls through to the registry's default kwargs if a forwarded kwarg is rejected (`AffinityPropagation` has no `n_clusters`, `MeanShift` has none either, etc.).
  - Dropped 3.x cruft: `ground_truth`, `experiment_custom_tags`, `system`, `add_to_model_list`, `raise_num_clusters`, `X_data`, `display`.
- CHANGED, BREAKING — `UnsupervisedExperiment.assign_model`. No longer delegates. Unwraps Pipeline → bare model, reads `model.labels_` (and `model.decision_scores_` for anomaly), and decorates a copy of `self.X`. `transformation=True` returns `X_transformed` rows; `score=False` skips the `Anomaly_Score` column for anomaly tasks.
- CHANGED — CBLOF retry preserved. The `cluster` anomaly detector (CBLOF) can raise `ValueError` on degenerate cluster separation. The native path mirrors legacy: catch the error, call `model.set_params(n_clusters=12)`, refit. If the retry also fails, the error is wrapped in a `RuntimeError` carrying the legacy message.
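
The CBLOF retry described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's code: `fit_with_cblof_retry` and `FlakyDetector` are hypothetical names, the stand-in detector only mimics an estimator that exposes `fit` and `set_params`, the retry is assumed to fail with `ValueError` as well, and the error text is a placeholder rather than the legacy message.

```python
class FlakyDetector:
    """Hypothetical stand-in for a CBLOF-like estimator (not pyod's class)."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X):
        # Pretend cluster separation is degenerate unless n_clusters == 12.
        if self.n_clusters != 12:
            raise ValueError("could not separate large and small clusters")
        self.fitted_ = True
        return self


def fit_with_cblof_retry(model, X, retry_n_clusters=12):
    """Fit once; on ValueError, reset n_clusters and refit a single time."""
    try:
        return model.fit(X)
    except ValueError:
        model.set_params(n_clusters=retry_n_clusters)
        try:
            return model.fit(X)
        except ValueError as exc:
            # Placeholder text; the log says the legacy message is reused here.
            raise RuntimeError("CBLOF fit failed after retry") from exc


model = fit_with_cblof_retry(FlakyDetector(n_clusters=8), X=[[0.0], [1.0]])
print(model.n_clusters)
```

The retry is deliberately single-shot: a second failure surfaces as `RuntimeError` with the original `ValueError` chained, so the caller can still see the root cause.
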
ADDED — tests
- ADDED — `packages/engine/tests/test_session28_unsupervised.py` — 11 tests covering clustering + anomaly `create_model` + `assign_model` + drain-locks + edge cases (unknown ID, `score=False`, `NotFittedError`).
INTERNAL
- INTERNAL — `predict_model`'s transitional bare-estimator branch is now dead for every model produced by `create_model`. Both supervised and unsupervised `create_model` return real Pipelines, so the `preprocessor.transform(X)` fallback inside `predict_model` is only reachable when a caller passes a bare estimator directly (uncommon). The comment in `core/experiment.py:predict_model` is updated to reflect this: the branch lives on as a belt-and-braces fallback rather than a transitional accommodation.
- INTERNAL — Why CBLOF gets an explicit retry, not a generic one. Other anomaly detectors fail with different errors (pyod's `ABOD` has its own gotchas around small datasets, etc.). The legacy code only retries CBLOF specifically, so we mirror that. A more general "try with smaller / larger param" retry would be over-engineering for v1.
- INTERNAL — Constructor `TypeError` fallback. When the registry args + user kwargs include something the underlying class doesn't accept (e.g. `AffinityPropagation(n_clusters=4)`, which has no such kwarg), we catch the `TypeError` and retry with just the registry defaults. Cleaner than gating on a per-algorithm allowlist, and it handles future registry changes gracefully.
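
The constructor fallback, together with the `num_clusters` / `fraction` translation, might look something like the sketch below. Everything here is illustrative: `translate_kwargs`, `build_estimator`, and the two stand-in classes are hypothetical names, and the real registry lookup is replaced by explicit arguments.

```python
class WithClusters:
    """Stand-in for an estimator that accepts n_clusters."""

    def __init__(self, n_clusters=8, random_state=None):
        self.n_clusters = n_clusters
        self.random_state = random_state


class WithoutClusters:
    """Stand-in for an estimator with no n_clusters kwarg."""

    def __init__(self, damping=0.5):
        self.damping = damping


def translate_kwargs(num_clusters=None, fraction=None):
    """Map the public argument names onto the estimator-level names."""
    translated = {}
    if num_clusters is not None:
        translated["n_clusters"] = num_clusters
    if fraction is not None:
        translated["contamination"] = fraction
    return translated


def build_estimator(cls, registry_defaults, user_kwargs):
    """Try defaults + user kwargs; on TypeError fall back to the defaults alone."""
    try:
        return cls(**{**registry_defaults, **user_kwargs})
    except TypeError:
        # The class rejected a forwarded kwarg, so retry with defaults only.
        return cls(**registry_defaults)


forwarded = translate_kwargs(num_clusters=4)
kmeans_like = build_estimator(WithClusters, {"random_state": 0}, forwarded)
ap_like = build_estimator(WithoutClusters, {"damping": 0.5}, forwarded)
print(kmeans_like.n_clusters, ap_like.damping)
```

One trade-off of this shape: the fallback discards all forwarded kwargs, not only the rejected one, and a `TypeError` raised for an unrelated reason inside a constructor would be retried the same way.
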
Session 28 delta summary
| Metric | Session 27 end | Session 28 end |
|---|---|---|
| OOP verbs still on `self._legacy` (clf/reg/clu/anomaly) | 0/0/2/2 | 0/0/0/0 ✅ |
| Engine tests (fast + slow) | 93 | 104 |
| Combined tests | 239 | 250 |