2026-04-22
Session 2: Phase 4 (Engine Architecture) kickoff
Engineering log for session 2.
Baseline: end of session 1. Environment: unchanged (Python 3.13.13, sklearn 1.7.2, NumPy 2.3.5, pandas 2.x).
Theme: the user called out that the 3.x OOP-on-top-of-functional layering was "a piece of hack shit." Session 2 rebuilds the engine core as a proper sklearn-composable object graph in which pycaret.core.Experiment is a real BaseEstimator subclass. The legacy code paths stay intact under the new class (via delegation), so the notebook golden path never breaks during the migration.
ADDED
ADDED: pycaret.core package. New engine primitives:
- pycaret/core/tasks.py: TaskType str-enum (CLASSIFICATION, REGRESSION, CLUSTERING, ANOMALY, TIME_SERIES) with is_supervised / is_classification / is_regression helpers.
- pycaret/core/errors.py: PyCaretError hierarchy (ConfigurationError extends ValueError, NotFittedError extends RuntimeError, UnknownModelError / UnknownMetricError extend KeyError) so UI/agent callers can catch engine errors distinctly from upstream ones.
- pycaret/core/results.py: frozen dataclasses CompareResult, CreateResult, TuneResult, EnsembleResult, BlendResult, StackResult, CalibrateResult, FinalizeResult, PredictResult, which are the typed return shape of every verb. Each carries the fitted pipeline, a metrics DataFrame, and an event trace.
- pycaret/core/state.py: current_experiment(), set_current_experiment(), reset_current_experiment(), require_current_experiment() backed by contextvars.ContextVar. Thread- and async-safe replacement for the 3.x module-level global.
- pycaret/core/experiment.py: Experiment(BaseEstimator) base class. Implements get_params, set_params, __sklearn_tags__, __sklearn_is_fitted__, fit(X, y=None, **setup_kwargs). Verbs (compare_models, create_model, tune_model, ensemble_model, blend_models, stack_models, calibrate_model, finalize_model, predict_model, plot_model, interpret_model, evaluate_model, automl) delegate to a legacy _SupervisedExperiment held as self._legacy during the transition; each returns a typed result dataclass.
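The three smallest primitives above (task enum, error hierarchy, context-local state) can be sketched with the standard library alone. This is a minimal illustration, not the code in pycaret/core: the enum string values, the choice of properties for the helpers, the definition of "supervised", and the error raised by require_current_experiment() are all assumptions.

```python
import contextvars
from enum import Enum


class TaskType(str, Enum):
    # Member names come from the log; the lowercase string values are assumed.
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    CLUSTERING = "clustering"
    ANOMALY = "anomaly"
    TIME_SERIES = "time_series"

    @property
    def is_classification(self):
        return self is TaskType.CLASSIFICATION

    @property
    def is_regression(self):
        return self is TaskType.REGRESSION

    @property
    def is_supervised(self):
        # Assumption: only classification and regression count as supervised here.
        return self.is_classification or self.is_regression


class PyCaretError(Exception):
    """Root of the engine error hierarchy."""


class ConfigurationError(PyCaretError, ValueError):
    """Also catchable as ValueError."""


class NotFittedError(PyCaretError, RuntimeError):
    """Also catchable as RuntimeError."""


class UnknownModelError(PyCaretError, KeyError):
    """Also catchable as KeyError."""


# Context-local slot replacing a module-level global.
_current = contextvars.ContextVar("current_experiment", default=None)


def set_current_experiment(exp):
    _current.set(exp)


def current_experiment():
    return _current.get()


def reset_current_experiment():
    _current.set(None)


def require_current_experiment():
    exp = _current.get()
    if exp is None:
        # Assumption: the real function may raise a different PyCaretError.
        raise NotFittedError("no active experiment in this context")
    return exp


def _inside():
    # Runs in a copied context, so this assignment stays local to that copy.
    set_current_experiment("exp-B")
    return current_experiment()


set_current_experiment("exp-A")
seen_inside = contextvars.copy_context().run(_inside)
seen_outside = current_experiment()
```

Because each engine error also inherits from a builtin, existing `except ValueError` or `except KeyError` handlers keep working, while new callers can catch PyCaretError to separate engine failures from upstream ones.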
ADDED: pycaret.logging package. Replaces the 3.x tracker-adapter concept with a lean structured event stream designed for React UI / LLM agent consumption:
- pycaret/logging/events.py: Event frozen dataclass and EventKind str-enum with 22 canonical kinds (experiment.started, model.created, model.compared, model.tuned, etc.). Event.to_dict() produces a JSON-serializable dict.
- pycaret/logging/base.py: BaseLogger hook interface with log() / emit() plus subscribe(callback) for UI fan-out. Also provides NullLogger (the silent default) and the 3.x no-op shim methods (log_experiment, log_model, log_model_comparison, log_plot, log_params, log_metrics, log_artifact, log_hpram_grid, log_sklearn_pipeline, init_logger, init_experiment, finish_experiment, set_tags, .loggers property) so that legacy god-class calls made through a PyCaret 4.0 BaseLogger instance continue to work.
- pycaret/logging/memory.py: MemoryLogger, a thread-safe in-memory buffer with optional JSONL file teeing (flushed after every write so a UI can tail the file). Exposes an events property, an as_jsonl() method, and clear().
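The event-stream shape described above can be sketched roughly as follows. The Event field names (payload, timestamp), the constructor argument path, and the exact locking strategy are assumptions made for illustration; only three of the 22 event kinds are shown.

```python
import json
import threading
import time
from dataclasses import dataclass, field
from enum import Enum


class EventKind(str, Enum):
    # String values are taken from the log; the member names are assumed.
    EXPERIMENT_STARTED = "experiment.started"
    MODEL_CREATED = "model.created"
    MODEL_COMPARED = "model.compared"


@dataclass(frozen=True)
class Event:
    kind: EventKind
    payload: dict = field(default_factory=dict)          # assumed field
    timestamp: float = field(default_factory=time.time)  # assumed field

    def to_dict(self):
        # Only plain JSON types leave this method.
        return {
            "kind": self.kind.value,
            "payload": dict(self.payload),
            "timestamp": self.timestamp,
        }


class MemoryLogger:
    def __init__(self, path=None):
        self._lock = threading.Lock()
        self._events = []
        self._subscribers = []
        self._path = path  # optional JSONL file to tee into

    def subscribe(self, callback):
        with self._lock:
            self._subscribers.append(callback)

    def emit(self, event):
        with self._lock:
            self._events.append(event)
            if self._path is not None:
                # Append and flush per event so a tailing UI sees it promptly.
                with open(self._path, "a", encoding="utf-8") as fh:
                    fh.write(json.dumps(event.to_dict()) + "\n")
                    fh.flush()
            subscribers = list(self._subscribers)
        # Callbacks run outside the lock so a slow subscriber cannot block writers.
        for callback in subscribers:
            callback(event)

    @property
    def events(self):
        with self._lock:
            return list(self._events)

    def as_jsonl(self):
        return "\n".join(json.dumps(e.to_dict()) for e in self.events)

    def clear(self):
        with self._lock:
            self._events.clear()


logger = MemoryLogger()
received = []
logger.subscribe(received.append)
logger.emit(Event(EventKind.MODEL_CREATED, {"model_id": "lr"}))
jsonl = logger.as_jsonl()
```

A UI would typically register one subscriber per open connection and forward `event.to_dict()` as-is, while an agent could read the JSONL file after the run.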
ADDED: pycaret.api package, the agent + UI introspection surface. All functions return JSON-serializable dataclasses:
- pycaret/api/cards.py: ParameterCard, ModelCard, MetricCard dataclasses plus ParameterKind str-enum (BOOL, INT, FLOAT, STRING, ENUM, LIST, COLUMN, COLUMNS, MODEL_ID, METRIC_ID, UNKNOWN), which serve as widget hints for a React form.
- pycaret/api/schemas.py: SetupParamSchema dataclass grouping ParameterCards.
- pycaret/api/describe.py: list_models(task) (19 classification cards, 26 regression cards curated from the legacy containers), describe_model(task, id), list_metrics(task), describe_setup_params(task) (13 common params organised into groups: Data / Experiment / Cross-Validation / Preprocessing / Compute / Logging), list_available_models(experiment) (runtime-aware: flags is_available=False when a model's package isn't installed).
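As a rough illustration of the card idea, the sketch below shows a JSON-serializable parameter card and a runtime availability check. The field names, the enum string values, the `requires` attribute, and the `check_available` helper are hypothetical; the log only states that cards are dataclasses and that availability depends on whether a model's package is installed.

```python
import importlib.util
import json
from dataclasses import asdict, dataclass, replace
from enum import Enum


class ParameterKind(str, Enum):
    # Subset of the kinds listed above; the string values are assumed.
    BOOL = "bool"
    INT = "int"
    COLUMN = "column"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ParameterCard:
    name: str
    kind: ParameterKind
    default: object = None
    group: str = "Experiment"
    description: str = ""


@dataclass(frozen=True)
class ModelCard:
    id: str
    name: str
    requires: str  # hypothetical: top-level import name the model depends on
    is_available: bool = True


def check_available(card):
    """Return a copy of the card flagged by whether its package can be found."""
    found = importlib.util.find_spec(card.requires) is not None
    return replace(card, is_available=found)


# A str-enum member serializes as its string value, so asdict() output is JSON-ready.
fold_card = ParameterCard(
    name="fold", kind=ParameterKind.INT, default=10, group="Cross-Validation"
)
fold_json = json.dumps(asdict(fold_card))

present = check_available(ModelCard(id="demo_ok", name="Demo", requires="json"))
missing = check_available(
    ModelCard(id="demo_missing", name="Demo", requires="package_that_is_not_installed_xyz")
)
```

A form renderer can then switch on `kind` to pick a widget, and grey out any model whose card has `is_available` set to False.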
ADDED: pycaret.tasks package, holding task-specific experiment subclasses.
- pycaret/tasks/classification.py: ClassificationExperiment(Experiment) pre-configures task=CLASSIFICATION, sets estimator_type="classifier" in __sklearn_tags__, and explicitly declares all 15 init parameters on the concrete class (rather than via **kwargs) so that sklearn's get_params() introspection surfaces every configured knob.
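The reason for declaring parameters explicitly is that sklearn's BaseEstimator discovers parameters by reading the `__init__` signature, and anything passed through `**kwargs` is invisible to that lookup. The sketch below mimics that signature-based discovery with the standard library only; it is not sklearn's code, and the three parameters on ExplicitExperiment are hypothetical stand-ins for the 15 real ones.

```python
import inspect


def init_param_names(cls):
    """List the parameter names visible on cls.__init__, ignoring **kwargs."""
    signature = inspect.signature(cls.__init__)
    return sorted(
        p.name
        for p in signature.parameters.values()
        if p.name != "self" and p.kind != inspect.Parameter.VAR_KEYWORD
    )


class KwargsExperiment:
    # Everything arrives through **kwargs, so introspection finds nothing.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class ExplicitExperiment:
    # Hypothetical parameters, declared one by one on the concrete class.
    def __init__(self, target=None, fold=10, session_id=None):
        self.target = target
        self.fold = fold
        self.session_id = session_id

    def get_params(self):
        # Each declared name is stored under the same attribute name.
        return {name: getattr(self, name) for name in init_param_names(type(self))}


hidden = init_param_names(KwargsExperiment)
visible = init_param_names(ExplicitExperiment)
params = ExplicitExperiment(target="Purchase").get_params()
```

With the `**kwargs` variant, cloning or grid-searching over the estimator would silently drop configuration, which is why the concrete class spells out every parameter.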
ADDED: End-to-end proof that the new stack works. Ran ClassificationExperiment(target="Purchase").fit(data), then compare_models(), then predict_model(result.best) on the juice dataset: fitted in 1.3s, compared 3 models in 8.3s, emitted 5 typed events through the logger, returned CompareResult / PredictResult dataclasses. Captured in the new-architecture test suite.
ADDED: tests/test_core_architecture.py. 17 fast unit tests (0.2s) covering every new primitive: TaskType enum, error hierarchy, frozen result dataclasses, event JSON round-trip, MemoryLogger (captures / subscribers / file teeing), BaseLogger no-op compat methods, ModelCard/MetricCard/ParameterSchema introspection, describe_model raises UnknownModelError for bad ids, ClassificationExperiment is sklearn-cloneable, declares classifier tag, ContextVar state. All 17 pass.
CHANGED
CHANGED: pycaret/loggers/base_logger.py is now a thin re-export shim over pycaret.logging.base.BaseLogger. The full BaseLogger lives in pycaret/logging/base.py. User subclasses of pycaret.loggers.base_logger.BaseLogger (a 3.x import path) still work unchanged.
CHANGED: pycaret/loggers/__init__.py re-exports only BaseLogger from the new location. It previously exported 5 symbols (all removed in session 1).
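A re-export shim of this kind is usually just one import line. The sketch below simulates it with two in-memory modules so it can run without PyCaret installed; the module names demo_logging_base and demo_loggers_base_logger are hypothetical stand-ins for pycaret.logging.base and pycaret.loggers.base_logger, and the shim body shown is an assumption about what such a file would contain.

```python
import sys
import types

# Stand-in for the new home of the class.
new_mod = types.ModuleType("demo_logging_base")


class BaseLogger:
    def log(self, event):
        pass


BaseLogger.__module__ = "demo_logging_base"
new_mod.BaseLogger = BaseLogger
sys.modules["demo_logging_base"] = new_mod

# Stand-in for the legacy import path: the whole module body is a re-export.
shim_source = (
    "from demo_logging_base import BaseLogger\n"
    "__all__ = ['BaseLogger']\n"
)
shim_mod = types.ModuleType("demo_loggers_base_logger")
exec(shim_source, shim_mod.__dict__)
sys.modules["demo_loggers_base_logger"] = shim_mod

# Code written against the old path keeps importing and subclassing as before.
from demo_loggers_base_logger import BaseLogger as LegacyPathLogger


class MyLogger(LegacyPathLogger):
    pass


same_class = LegacyPathLogger is new_mod.BaseLogger
```

Because both paths resolve to the same class object, isinstance and issubclass checks behave identically regardless of which import path a user chose.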
DOCS
DOCS: docs/revamp/ARCHITECTURE.md is a new design doc explaining the 4.0 engine architecture: why the 3.x layering was broken, the 8 core design principles (sklearn-canonical BaseEstimator, typed results, build-on-not-replace Pipeline, ColumnTransformer-based preprocessor, canonical search CV, event-stream logger, no prints/interactive), the package layout, the full Experiment interface contract, task-subclass pattern, result/event contracts, multi-session migration plan, and explicit non-goals.
DOCS: Release-notes section updated to reflect Phase 4 kickoff and the new package boundaries.
INTERNAL
INTERNAL: Transition pattern documented. During the multi-session migration, Experiment._legacy holds an instance of the legacy _SupervisedExperiment (task-specific subclass picked by _build_legacy_experiment). Every verb on the new Experiment calls through to self._legacy.<verb>(...), wraps the legacy return in a typed result dataclass, and emits structured events. Future sessions replace the delegation bodies one verb at a time without breaking the public API.
INTERNAL: sklearn compatibility validated. ClassificationExperiment(...).get_params() returns all 15 init params; sklearn.base.clone(exp) preserves configuration and resets fitted state; __sklearn_tags__() surfaces estimator_type="classifier".
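The delegate, wrap, emit sequence of the transition pattern can be sketched as below. This is a toy illustration under stated assumptions: _LegacyExperiment is a hypothetical stand-in for the real legacy class, the CreateResult field names are guessed from the description of the result dataclasses, and a plain dict replaces the metrics DataFrame to keep the example dependency-free.

```python
from dataclasses import dataclass


class _LegacyExperiment:
    """Hypothetical stand-in for the legacy class; returns an untyped value."""

    def create_model(self, model_id):
        return {"id": model_id, "fitted": True}


@dataclass(frozen=True)
class CreateResult:
    pipeline: object  # whatever the legacy verb returned
    metrics: dict     # stand-in for the metrics DataFrame
    events: tuple     # event trace for this call


class Experiment:
    def __init__(self, legacy=None):
        self._legacy = legacy if legacy is not None else _LegacyExperiment()
        self._events = []

    def _emit(self, kind, **payload):
        event = {"kind": kind, **payload}
        self._events.append(event)
        return event

    def create_model(self, model_id):
        # 1. Delegate to the legacy implementation.
        raw = self._legacy.create_model(model_id)
        # 2. Emit a structured event describing what happened.
        event = self._emit("model.created", model_id=model_id)
        # 3. Wrap the legacy return value in a typed, immutable result.
        return CreateResult(pipeline=raw, metrics={}, events=(event,))


result = Experiment().create_model("lr")
```

Since callers only see CreateResult, step 1 can later be swapped for a native implementation without changing the method signature or the return type.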
TESTS
TESTS: 17 new unit tests added under tests/test_core_architecture.py. Fast (< 0.5s total); designed to run on every CI matrix entry.
TESTS: No regressions on the legacy subset. pytest tests/test_models.py tests/test_datasets.py tests/test_core_architecture.py: 23/23 pass.
TESTS: End-to-end proof via the new stack. Full fit → compare_models → predict_model golden path runs green on Python 3.13 + sklearn 1.7.2 + NumPy 2.3.5 using pycaret.tasks.ClassificationExperiment.