2026-04-25
Session 42: Phase 5c (cont.): drain TimeSeriesExperiment.compare_models
Engineering log for session 42.
The third TS verb is now fully native. `exp.compare_models()` no longer touches `legacy.compare_models`: it iterates the sktime registry, calls native `create_model` per candidate (which already does CV via the s40 native path), and assembles a leaderboard ranked by MASE by default (lower is better).
ADDED — engine
ADDED — `TimeSeriesExperiment.compare_models` in `packages/engine/pycaret/tasks/time_series.py`. Mirrors the supervised `_compare_models_supervised_native` pattern (s26):
- Filters: `include=` / `exclude=` / `turbo=`. Default-include skips `ensemble_forecaster`.
- Iterates candidates → `create_model(cand, ...)` → pulls the `Mean` row from each metrics DataFrame.
- Sorts ascending or descending based on whether the `sort` metric is in `_TS_ASCENDING_METRICS` (MASE / RMSSE / MAE / RMSE / MAPE / SMAPE) or not.
- Returns top-K pipelines + leaderboard + ranked_ids in a `CompareResult`.
- Calls `_set_last_metrics(leaderboard)` so `pull()` returns it.
- `errors="ignore"` skips failures; `errors="raise"` propagates.
- Empty-result handling: returns empty `CompareResult` rather than raising.
ADDED — Two class-level constants: `_TS_REGISTRY_EXCLUDE` (frozenset of model IDs to skip by default, currently `{"ensemble_forecaster"}`) and `_TS_ASCENDING_METRICS` (frozenset of error-metric display names that sort ascending).
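The flow described in the two entries above can be sketched roughly as follows. This is a stdlib-only illustration, not the code in `time_series.py`: the `CompareResult` field names, the `(pipeline, metrics)` return shape of `create_model`, the plain dict standing in for the metrics DataFrame, and `fake_create_model` are all assumptions made for the example. The `turbo=` filter is left out for brevity.

```python
from dataclasses import dataclass, field

# Mirror the two class-level constants named in this log.
_TS_REGISTRY_EXCLUDE = frozenset({"ensemble_forecaster"})
_TS_ASCENDING_METRICS = frozenset({"MASE", "RMSSE", "MAE", "RMSE", "MAPE", "SMAPE"})


@dataclass
class CompareResult:
    # Field names are assumptions; the log only says the result carries
    # top-K pipelines, the leaderboard, and ranked_ids.
    pipelines: list = field(default_factory=list)
    leaderboard: list = field(default_factory=list)
    ranked_ids: list = field(default_factory=list)


def compare_models(registry, create_model, include=None, exclude=None,
                   sort="MASE", n_select=1, errors="ignore"):
    """Rank candidates by the Mean row of their CV metrics."""
    if include is not None:
        candidates = list(include)
    else:
        candidates = [m for m in registry if m not in _TS_REGISTRY_EXCLUDE]
    excluded = set(exclude or ())
    candidates = [m for m in candidates if m not in excluded]

    rows = []
    for cand in candidates:
        try:
            pipeline, metrics = create_model(cand)
        except Exception:
            if errors == "raise":
                raise
            continue  # errors="ignore": skip the failed candidate
        rows.append({"id": cand, "pipeline": pipeline, **metrics["Mean"]})

    if not rows:
        return CompareResult()  # nothing survived: return empty, do not raise

    # Error metrics sort ascending; everything else is greater-is-better.
    ascending = sort in _TS_ASCENDING_METRICS
    rows.sort(key=lambda r: r[sort], reverse=not ascending)
    return CompareResult(
        pipelines=[r["pipeline"] for r in rows[:n_select]],
        leaderboard=[{k: v for k, v in r.items() if k != "pipeline"} for r in rows],
        ranked_ids=[r["id"] for r in rows],
    )


def fake_create_model(model_id):
    # Hypothetical stand-in for native create_model.
    scores = {"naive": 1.2, "ets": 0.8, "theta": 0.9}
    if model_id not in scores:
        raise ValueError(f"cannot construct {model_id}")
    mase = scores[model_id]
    return f"pipeline<{model_id}>", {"Mean": {"MASE": mase, "R2": 1 - mase}}


registry = ["naive", "ets", "theta", "ensemble_forecaster"]
result = compare_models(registry, fake_create_model, n_select=2)
print(result.ranked_ids)  # ['ets', 'theta', 'naive']
```

Passing `include=["ensemble_forecaster"]` explicitly in this sketch reaches `fake_create_model`, which fails, so the candidate is skipped under `errors="ignore"` and the call returns an empty `CompareResult`.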
ADDED — tests
ADDED — `packages/engine/tests/test_session42_ts_compare_models_drain.py` (9 new tests):
- Drain-lock for `compare_models`.
- Leaderboard sorted ascending by MASE.
- `n_select=3` returns top-3 fitted `ForecastingPipeline` instances.
- `exclude=` filter drops named models.
- `ensemble_forecaster` filtered out via runtime construction failure + `errors=ignore`.
- `errors="ignore"` skips failures.
- `pull()` returns the leaderboard.
- End-to-end `compare → predict` chain with all 3 legacy verbs poisoned.
- All-failures path returns empty `CompareResult`.
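For readers unfamiliar with the drain-lock idea, a minimal sketch: the legacy entry point is "poisoned" so that any call raises, and the native verb must still run to completion. The `legacy` namespace, both verb functions, and `is_drained` are hypothetical stand-ins; the actual test file may be structured differently.

```python
import types
from unittest import mock

# Hypothetical stand-in for the legacy module.
legacy = types.SimpleNamespace(compare_models=lambda *a, **k: "legacy result")


def native_compare_models(candidates):
    # A drained verb never reaches into `legacy`.
    return sorted(candidates)


def undrained_compare_models(candidates):
    # Counter-example: still delegates to the legacy implementation.
    return legacy.compare_models(candidates)


def poisoned(*args, **kwargs):
    raise AssertionError("legacy.compare_models was called")


def is_drained(verb):
    """Return True if `verb` completes while the legacy verb is poisoned."""
    with mock.patch.object(legacy, "compare_models", poisoned):
        try:
            verb(["theta", "ets"])
        except AssertionError:
            return False
    return True


print(is_drained(native_compare_models))     # True
print(is_drained(undrained_compare_models))  # False
```

`mock.patch.object` restores the original attribute when the `with` block exits, so the poison does not leak into other tests.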
INTERNAL
INTERNAL — Why filter `ensemble_forecaster` by default. It's in `_all_models_internal` (TS legacy registry) but constructing it requires runtime-built sub-forecasters (legacy passes `forecasters=...` from a custom logic path that depends on the existing leaderboard). For native `compare_models` to construct it, we'd need to first run a partial compare to get the constituents, then build the ensemble: a chicken-and-egg problem the legacy avoids by handling it in a special branch. Excluding it from the default `include` mirrors what `legacy.models()` already does (filtered by `model_type ∈ TSModelTypes`, which doesn't include `"ensemble"`).
INTERNAL — `_TS_ASCENDING_METRICS` is class-level. Lookup is O(1) via `frozenset` membership; no per-call construction. Six error metrics are listed (MASE / RMSSE / MAE / RMSE / MAPE / SMAPE). R2 and COVERAGE default to descending (greater-is-better).
INTERNAL — All-TS turbo flag is True. Every TS container ships with `is_turbo=True`, so the `turbo=True` branch is essentially a no-op for the default registry. Kept for parity with the supervised signature; users can pass custom containers with `is_turbo=False` to opt out.
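The turbo note can be illustrated with a small sketch. `Container` and `select_candidates` are hypothetical names for this example only; the real container classes carry far more than an `id` and an `is_turbo` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    # Hypothetical minimal container: only the fields this sketch needs.
    id: str
    is_turbo: bool = True


def select_candidates(containers, turbo=True):
    """With turbo=True keep only turbo containers; otherwise keep all."""
    return [c.id for c in containers if c.is_turbo or not turbo]


default_registry = [Container("naive"), Container("ets"), Container("theta")]
custom = default_registry + [Container("slow_custom", is_turbo=False)]

# Default registry: every container is turbo, so the flag changes nothing.
same = (select_candidates(default_registry, turbo=True)
        == select_candidates(default_registry, turbo=False))
print(same)  # True

# A custom non-turbo container is only picked up with turbo=False.
print(select_candidates(custom, turbo=True))   # ['naive', 'ets', 'theta']
print(select_candidates(custom, turbo=False))  # ['naive', 'ets', 'theta', 'slow_custom']
```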
Session 42 delta summary
| Metric | Session 41 end | Session 42 end |
|---|---|---|
| TS verbs drained (out of 6) | 2 (create_model, predict_model) | 3 (+ compare_models) |
| `legacy.compare_models` callsites for TS | 1 | 0 |
| Engine tests (fast + slow) | 209 | 218 |