2026-04-24
Session 30: Internal-state drain: transformed splits + fold generator + model registry
Engineering log for session 30.
Baseline: session 29 drained the 7 user-facing data accessors. Session 30 drains the 6 internal `self._legacy.<x>` attributes that are still read inside the drained verbs (13 callsites in total).
CHANGED: engine
CHANGED: in `packages/engine/pycaret/core/experiment.py`, `_snapshot_fit_state()` captures 6 new slots in `_fit_state` at the end of `fit()`: `X_transformed`, `X_train_transformed`, `y_transformed`, `y_train_transformed`, `fold_generator`, and `model_registry` (a `dict(legacy._all_models_internal)` copy).

CHANGED: 13 callsites drained across `experiment.py`, `supervised.py`, and `unsupervised.py`:
- `create_model` (supervised native): reads `X_train_transformed`, `y_train_transformed`, `fold_generator`, and `model_registry` from `_fit_state`.
- `_resolve_supervised_estimator`: reads `model_registry`.
- `tune_model`: `X_train_transformed`, `y_train_transformed`, `fold_generator`, `model_registry`.
- `compare_models`: `model_registry`.
- `stack_models`: `fold_generator`.
- `calibrate_model`: `fold_generator`.
- `finalize_model`: `X_transformed`, `y_transformed`.
- Unsupervised `create_model` + `assign_model`: `X_transformed`, `model_registry`.
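A minimal sketch of the snapshot-then-read pattern described above, under simplifying assumptions. `LegacyStub`, `ExperimentSketch`, and the `finalize_model` body are hypothetical stand-ins, not the real pycaret classes. Only the six slot names and the `dict(legacy._all_models_internal)` copy come from this log. The stand-in verb reports what it would fit on instead of re-fitting.

```python
from typing import Any, Dict, Tuple


class LegacyStub:
    """Hypothetical stand-in for the legacy experiment; values are placeholders."""

    def __init__(self) -> None:
        self.X_transformed = [[0.0], [1.0], [2.0]]
        self.X_train_transformed = [[0.0], [1.0]]
        self.y_transformed = [0, 1, 0]
        self.y_train_transformed = [0, 1]
        self.fold_generator = object()  # placeholder for a CV splitter
        self._all_models_internal = {"lr": "LogisticRegressionContainer"}


class ExperimentSketch:
    """Hypothetical experiment that snapshots internal legacy state at fit time."""

    # Slots stored by reference (the same objects the legacy side holds).
    _REFERENCE_SLOTS: Tuple[str, ...] = (
        "X_transformed",
        "X_train_transformed",
        "y_transformed",
        "y_train_transformed",
        "fold_generator",
    )

    def __init__(self, legacy: Any) -> None:
        self._legacy = legacy
        self._fit_state: Dict[str, Any] = {}

    def fit(self) -> "ExperimentSketch":
        # ... real fitting work would happen here ...
        self._snapshot_fit_state()
        return self

    def _snapshot_fit_state(self) -> None:
        legacy = self._legacy
        for slot in self._REFERENCE_SLOTS:
            self._fit_state[slot] = getattr(legacy, slot)
        # Registry is shallow-copied so later legacy registrations don't leak in.
        self._fit_state["model_registry"] = dict(legacy._all_models_internal)

    def finalize_model(self, model_id: str) -> Dict[str, Any]:
        # Drained verb: every read goes through _fit_state, none through _legacy.
        estimator = self._fit_state["model_registry"][model_id]
        X = self._fit_state["X_transformed"]
        y = self._fit_state["y_transformed"]
        return {"estimator": estimator, "n_rows": len(X), "n_targets": len(y)}
```

After `exp = ExperimentSketch(LegacyStub()).fit()`, setting `exp._legacy = None` and calling `exp.finalize_model("lr")` still succeeds. This mirrors what the drain-lock tests check with poisoned attributes.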
ADDED: tests
ADDED: `packages/engine/tests/test_session30_internal_state_drain.py` with 5 tests using a generalised drain-lock helper (`_PoisonedAttrAccess`) that raises on every dunder a verb might use to read legacy state (`__getattr__`, `__getitem__`, `__contains__`, `__iter__`, `__len__`). Tests cover:
- `create_model` succeeds after every internal-state attr on `_legacy` is poisoned.
- `tune_model` resolves the search space from the snapshot's `model_registry` even when `legacy._all_models_internal` is dropped to `{}`.
- `finalize_model` re-fits using `_fit_state["X_transformed"]` rather than legacy.
- Clustering `create_model` fits on the snapshot's `X_transformed`.
- Sanity: all 13 `_fit_state` keys are populated post-fit.
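The sketch below shows one possible shape for a drain-lock sentinel like `_PoisonedAttrAccess`. It rests on several assumptions. The names `PoisonedValue` and `poison_legacy` are hypothetical. The real helper may wrap the legacy object differently. The poisoned attributes are assumed to be plain instance attributes, because a read-only property could not be overwritten with `setattr`. Python looks implicit special methods up on the type, so each dunder must be defined on the class. `__getattr__` only fires for names that normal lookup cannot find.

```python
from typing import Iterable, NoReturn


class PoisonedValue:
    """Hypothetical drain-lock sentinel: any read through it raises."""

    def __init__(self, name: str) -> None:
        self._name = name  # lives in the instance dict, so no __getattr__ call

    def _fail(self, how: str) -> NoReturn:
        raise AssertionError(f"a verb reached for self._legacy.{self._name} via {how}")

    # Special methods are looked up on the type, so each one is defined here.
    def __getattr__(self, attr: str) -> NoReturn:
        # Reached for names not found normally, e.g. fold_generator.split
        self._fail(f".{attr}")

    def __getitem__(self, key: object) -> NoReturn:
        self._fail(f"[{key!r}]")

    def __contains__(self, item: object) -> NoReturn:
        self._fail("'in'")

    def __iter__(self) -> NoReturn:
        self._fail("iter()")

    def __len__(self) -> NoReturn:
        self._fail("len()")


def poison_legacy(legacy: object, attrs: Iterable[str]) -> None:
    """Replace each named attribute on the legacy object with a sentinel."""
    for attr in attrs:
        setattr(legacy, attr, PoisonedValue(attr))
```

This sketch uses one sentinel per attribute, so a failure message names the exact attribute a verb touched and the access path it used.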
INTERNAL
INTERNAL: Why a copy of the registry, not just a reference. `_all_models_internal` is mutable on the legacy class, and extensions like `add_metric` could grow it. Taking a shallow copy with `dict(legacy._all_models_internal)` at fit time makes the snapshot stable: later additions to or removals from the legacy registry don't leak into already-fitted experiments. Because the copy is shallow, only the set of registered entries is frozen. The container objects themselves are still shared, so mutating a container in place would remain visible. Trade-off: if a user genuinely wants their `add_metric` call to affect a previously fitted experiment, they'd need to re-snapshot. That edge case is worth the reproducibility win.

INTERNAL: `_PoisonedAttrAccess` is the drain-lock pattern at scale. Sessions 22–28 monkeypatched a single legacy method to raise, and session 29 monkeypatched a property. Session 30 needs to lock down ~6 attrs across multiple dunder methods. Wrapping them all in one sentinel object that raises on every access keeps the test code short and the failure mode clear ("a verb reached for `self._legacy.<x>`").
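This plain-dict illustration shows what a `dict(...)` shallow copy of the registry does and does not freeze. The registry contents, the `display_name` field, and the `snapshot_registry` helper are hypothetical. Real entries are model container objects.

```python
from typing import Any, Dict


def snapshot_registry(registry: Dict[str, Any]) -> Dict[str, Any]:
    # Shallow copy, as in dict(legacy._all_models_internal): new dict, same values.
    return dict(registry)


# Hypothetical registry; real values are model container objects.
legacy_registry: Dict[str, Any] = {"lr": {"display_name": "Logistic Regression"}}
snapshot = snapshot_registry(legacy_registry)

# 1. An entry registered after the snapshot does not appear in it.
legacy_registry["rf"] = {"display_name": "Random Forest"}
print("rf" in snapshot)  # False

# 2. Mutating a shared container in place IS visible through the snapshot.
legacy_registry["lr"]["display_name"] = "LR (patched)"
print(snapshot["lr"]["display_name"])  # LR (patched)
```

A `copy.deepcopy` would also freeze the containers. However, it would duplicate every model container at fit time, and some containers may hold objects that do not deep-copy cleanly.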
Session 30 delta summary
| Metric | Session 29 end | Session 30 end |
|---|---|---|
| `_legacy.<internal-state>` reads in drained verbs | 13 | 0 ✅ |
| Engine tests (fast + slow) | 108 | 113 |
| Combined tests | 254 | 259 |