2026-04-23
Session 4: Repo restructure + working notebooks + agent/dev docs + issue triage
Engineering log for session 4.
Baseline: end of session 3. Environment: unchanged.
Theme: user asked to "clear the folder, restructure for dev contributions, get rid of old stuff, one notebook per use-case fully working, MD files for agents, download all open issues, start cleaning them up." Session 4 does all of that in one pass.
REMOVED: dead weight purged
- REMOVED: Docker_files/ directory (old pycaret_full / pycaret_slim Docker image scaffolding) deleted.
- REMOVED: docs/source/ directory (Sphinx docs tree) deleted. Hosted docs now live on GitBook; Sphinx was unused.
- REMOVED: docs/Makefile, docs/make.bat, docs/make.sh, docs/logs.log (Sphinx build scaffolding) deleted.
- REMOVED: tutorials/legacy_v3/ (the 6 archived 3.x notebooks from session 3) deleted.
- REMOVED: tutorials/time_series/forecasting/ (old TS example scripts) deleted.
- REMOVED: tutorials/translations/ (4 language-translated tutorial trees: chinese/greek/japanese/portuguese) deleted.
- REMOVED: tutorials/pycaret_cheat-sheet_in_excel.xlsx deleted.
- REMOVED: logs.log (root-level 28K-line runtime log) deleted.
- REMOVED: .readthedocs.yml (ReadTheDocs config, unused since the GitBook migration) deleted.
- REMOVED: .slugignore (Heroku-specific ignore file, long dead) deleted.
CHANGED: directory restructure
- CHANGED, BREAKING: tutorials/ → notebooks/. Modern naming; all 5 canonical notebooks now live under notebooks/ (01_classification.ipynb … 05_time_series.ipynb).
- CHANGED: .gitignore rewritten. Cleaner structure; artifacts/, *.log, .ruff_cache/, demo*.py, nbtest*.ipynb added; legacy 3.x scratch patterns retained as a safety net.
ADDED: working notebooks (one per task)
- ADDED: scripts/build_notebooks.py, a programmatic notebook generator using nbformat + nbclient. Run with --run to execute and capture fresh outputs.
- ADDED: notebooks/01_classification.ipynb (43 KB with outputs). Canonical ClassificationExperiment → fit → compare_models → tune_model → predict_model → save/load_model → inspect event stream.
- ADDED: notebooks/02_regression.ipynb (25 KB).
- ADDED: notebooks/03_clustering.ipynb (18 KB). ClusteringExperiment → create_model("kmeans") → assign_model.
- ADDED: notebooks/04_anomaly_detection.ipynb (18 KB). AnomalyExperiment → create_model("iforest") → assign_model.
- ADDED: notebooks/05_time_series.ipynb (13 KB). TimeSeriesExperiment(fh=12) → compare_models → predict_model.
- ADDED: All 5 notebooks executed end-to-end under Python 3.13 + sklearn 1.7.2. Outputs are persisted in the .ipynb JSON so GitHub can render them and users can skim without running.
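A minimal sketch of what an nbformat + nbclient generator with a --run switch could look like. This is not the actual scripts/build_notebooks.py: the cell contents, the --out flag, the default output path, and every function name here are assumptions made for illustration.

```python
import argparse
from pathlib import Path

# Placeholder cells (hypothetical); the real notebooks walk through the PyCaret 4.0 verbs.
CELLS = [
    ("markdown", "# Demo notebook (placeholder)"),
    ("code", "print('hello from a generated notebook')"),
]


def parse_args(argv=None):
    """Parse the command line; --run mirrors the flag named in the log."""
    parser = argparse.ArgumentParser(description="Generate a demo notebook.")
    parser.add_argument("--run", action="store_true",
                        help="execute the notebook and keep the outputs")
    parser.add_argument("--out", type=Path, default=Path("demo.ipynb"))
    return parser.parse_args(argv)


def build_notebook(cells):
    """Turn (kind, source) pairs into an nbformat v4 notebook object."""
    import nbformat.v4 as v4  # imported lazily so --help works without Jupyter installed

    makers = {"markdown": v4.new_markdown_cell, "code": v4.new_code_cell}
    return v4.new_notebook(cells=[makers[kind](source) for kind, source in cells])


def execute_in_place(nb, timeout=600):
    """Run every code cell; nbclient stores the outputs on the notebook object."""
    from nbclient import NotebookClient

    NotebookClient(nb, timeout=timeout, kernel_name="python3").execute()
    return nb


def main(argv=None):
    import nbformat

    args = parse_args(argv)
    nb = build_notebook(CELLS)
    if args.run:
        execute_in_place(nb)
    # Outputs captured by --run are serialized into the .ipynb JSON here.
    nbformat.write(nb, str(args.out))
```

Calling main(["--run"]) would need a python3 Jupyter kernel to be installed; without --run the sketch only writes the unexecuted notebook.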
ADDED: AGENTS.md at repo root
- ADDED: /AGENTS.md, a 60-second briefing for AI coding agents: TL;DR, read-first list, non-negotiables (8 rules), conventions, workflow, repo map, common-task recipes. This is the file any agent should read before touching the repo.
ADDED: docs/for_agents/ deep dives (5 files)
- ADDED: docs/for_agents/README.md, the index of agent-facing docs.
- ADDED: docs/for_agents/ENGINE_WALKTHROUGH.md, a step-by-step account of what happens inside fit → compare_models → predict_model, including the legacy-delegation pattern.
- ADDED: docs/for_agents/TYPED_RESULTS.md, covering every result dataclass (9 of them) with full field listing and usage idioms.
- ADDED: docs/for_agents/EVENT_STREAM.md, with all 22 EventKinds tabulated alongside their typical payloads, plus the subscriber contract and a custom-logger recipe.
- ADDED: docs/for_agents/INTROSPECTION_API.md, describing the pycaret.api surface as a UI/agent integration contract, with React-form and LLM-prompt examples.
- ADDED: docs/for_agents/TASK_CHEATSHEET.md, a verb × task matrix and constructor-parameter matrix on one page, plus an import-path reference.
ADDED: docs/for_developers/ onboarding (5 files)
- ADDED: docs/for_developers/README.md, the index.
- ADDED: docs/for_developers/SETUP.md, covering clone → uv sync --all-extras → first green test in < 5 min; Python-version matrix; Windows notes; IDE setup.
- ADDED: docs/for_developers/TESTING.md, covering test layout, how to run/write tests, markers, CI matrix, coverage, and what not to write.
- ADDED: docs/for_developers/DRAINING_THE_GODCLASS.md, the verb-migration playbook for Phase 5: recommended order, before/after recipe, hard constraints, completion criteria.
- ADDED: docs/for_developers/CODING_STYLE.md, covering formatting, imports, type hints, docstrings, naming, dataclass conventions, error handling, logging, dependency discipline, and git hygiene.
- ADDED: docs/for_developers/RELEASE_PROCESS.md, covering versioning, tagging, notes generation, and PyPI publish.
CHANGED: CONTRIBUTING.md rewritten for 4.0
- CHANGED: CONTRIBUTING.md replaced 3.x-era content (black/isort mention, sphinx docs, pip install -e .[test]) with a 4.0-flavored guide that links into AGENTS.md + docs/for_agents/ + docs/for_developers/. Includes the current PR checklist.
ADDED: GitHub issue snapshot + triage
- ADDED: docs/revamp/github_issues/open_issues_raw.json, a raw snapshot of all 388 open issues on pycaret/pycaret (fetched via gh issue list).
- ADDED: scripts/triage_issues.py, a classifier that reads the raw snapshot and assigns each issue to one of 5 buckets: fixed_in_4_0, out_of_scope, stale, still_relevant_bug, still_relevant_enhancement. Heuristic-based; reruns are idempotent.
- ADDED: docs/revamp/github_issues/triage.json, the machine-readable output of the classifier.
- ADDED: docs/revamp/github_issues/triage.md, a human-browsable triage report with one table per bucket: issue number, title, labels, update date, classification reason.
- ADDED: docs/revamp/github_issues/README.md, the triage summary + methodology.
- ADDED: docs/revamp/github_issues/PLAYBOOK.md, step-by-step actions per bucket with ready-to-paste reply templates.
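To make "heuristic-based; reruns are idempotent" concrete, here is a sketch of a rule-ordered classifier over the five bucket names. The actual rules in scripts/triage_issues.py are not recorded in this log, so the label names, the one-year staleness cutoff, and the rule order below are all assumptions. The field names (number, labels, updatedAt) are the ones gh issue list --json emits.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical heuristics; the real rules are not shown in this log.
OUT_OF_SCOPE_LABELS = {"docker", "sphinx"}  # assumed label names
FIXED_LABEL = "fixed-in-4.0"                # assumed label name
STALE_AFTER = timedelta(days=365)           # assumed cutoff


def load_snapshot(path):
    """Read the raw issue snapshot (a JSON list of issue objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def classify(issue, now):
    """Return one bucket name for a single issue dict.

    Rules run in a fixed order and depend only on the issue and `now`,
    so rerunning on the same snapshot gives the same answer.
    """
    labels = {label["name"].lower() for label in issue.get("labels", [])}
    updated = datetime.fromisoformat(issue["updatedAt"].replace("Z", "+00:00"))

    if FIXED_LABEL in labels:
        return "fixed_in_4_0"
    if labels & OUT_OF_SCOPE_LABELS:
        return "out_of_scope"
    if now - updated > STALE_AFTER:
        return "stale"
    if "bug" in labels:
        return "still_relevant_bug"
    return "still_relevant_enhancement"


def triage(issues, now):
    """Map issue number -> bucket name for the whole snapshot."""
    return {issue["number"]: classify(issue, now) for issue in issues}
```

Passing `now` in explicitly, rather than reading the clock inside classify, is what keeps a rerun against the same snapshot reproducible in this sketch.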
Headline triage result
From 388 open issues:
- 8 (2%) fixed_in_4_0: close with 4.0 release-notes pointer.
- 92 (24%) out_of_scope: close with KILL_LIST.md pointer.
- 123 (32%) stale: auto-ping, close after 30 days of silence.
- 58 (15%) still_relevant_bug: label 4.0-candidate, route to Phase 5.
- 107 (28%) still_relevant_enhancement: per-item decision.
223 of 388 (57%), the fixed_in_4_0, out_of_scope, and stale buckets combined, can be closed or auto-pinged without per-issue human judgment.
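The bucket arithmetic can be re-derived from the five counts listed above. The short check below assumes that the buckets needing no per-issue judgment are fixed_in_4_0, out_of_scope, and stale; the variable names are illustrative only.

```python
# Bucket sizes copied from the headline triage result.
BUCKETS = {
    "fixed_in_4_0": 8,
    "out_of_scope": 92,
    "stale": 123,
    "still_relevant_bug": 58,
    "still_relevant_enhancement": 107,
}
# Assumption: these three buckets can be closed or auto-pinged in bulk.
NO_JUDGMENT = ("fixed_in_4_0", "out_of_scope", "stale")

total = sum(BUCKETS.values())
closable = sum(BUCKETS[name] for name in NO_JUDGMENT)
needs_review = total - closable

for name, count in BUCKETS.items():
    print(f"{name:28s} {count:4d} {count / total:6.1%}")
print(f"total={total} closable_or_pingable={closable} needs_review={needs_review}")
```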
FIXED
- FIXED: np.product → np.prod in pycaret/internal/patches/sklearn.py:106. NumPy 2 removed the np.product alias; the call site surfaced when the notebook-execution pass tried to run compare_models, and it was blocking the full e2e notebook run.
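A small illustration of the rename, not the actual patched line: the shape tuple is made up, and only np.prod itself is taken from the fix.

```python
import numpy as np

shape = (2, 3, 4)  # illustrative input, not from the patched file

# np.prod is the supported spelling on both NumPy 1.x and 2.x.
n_elements = int(np.prod(shape))
print(n_elements)

# The old alias is absent on NumPy 2.x; late 1.x releases still ship it as deprecated.
print("np.product available:", hasattr(np, "product"))
```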
INTERNAL
- INTERNAL: scripts/ directory created for maintenance scripts. Two scripts landed this session; future additions (bulk issue close, LOC reports, release build helpers) go here.
- INTERNAL: tests/conftest.py, tests/test_models.py, and all other tests untouched in this session (no code-under-test changes).
Session 4 delta summary
| Metric | Session 3 end | Session 4 end | Δ |
|---|---|---|---|
| Repo top-level directory count | 9 | 10 | +1 (scripts/) |
| Source LOC in pycaret/ | ~49,400 | ~49,400 | 0 |
| Test files | 4 | 4 | 0 |
| Test pass rate | 100% (32/32) | 100% (32/32) | — |
| Dead-weight directories | 5 (Docker_files, docs/source, tutorials/{legacy_v3,time_series,translations}) | 0 | − |
| Stale root-level files | 3 (logs.log, .readthedocs.yml, .slugignore) | 0 | − |
| Working notebooks | 0 (3.x legacy archived) | 5 (executed, with outputs) | +5 |
| Docs files | 7 in docs/revamp/ | 23 (revamp + for_agents + for_developers + github_issues) | +16 |
| GitHub issues in actionable triage buckets | 0 (untouched backlog of 388) | 223 ready to close or auto-ping, 165 needing human review | +388 |