2026-04-23

Session 4: Repo restructure + working notebooks + agent/dev docs + issue triage

Engineering log for session 4.

Baseline: end of session 3. Environment: unchanged.

Theme: user asked to "clear the folder, restructure for dev contributions, get rid of old stuff, one notebook per use-case fully working, MD files for agents, download all open issues, start cleaning them up." Session 4 does all of that in one pass.

REMOVED: dead weight purged

  • REMOVED Docker_files/ directory (old pycaret_full / pycaret_slim Docker image scaffolding) deleted.
  • REMOVED docs/source/ directory (Sphinx docs tree) deleted; we use GitBook for hosted docs now, and Sphinx was unused.
  • REMOVED docs/Makefile, docs/make.bat, docs/make.sh, docs/logs.log (Sphinx build scaffolding) deleted.
  • REMOVED tutorials/legacy_v3/ (the 6 archived 3.x notebooks from session 3) deleted.
  • REMOVED tutorials/time_series/forecasting/ (old TS example scripts) deleted.
  • REMOVED tutorials/translations/ (4 language-translated tutorial trees: chinese/greek/japanese/portuguese) deleted.
  • REMOVED tutorials/pycaret_cheat-sheet_in_excel.xlsx deleted.
  • REMOVED logs.log (root-level 28K-line runtime log) deleted.
  • REMOVED .readthedocs.yml (ReadTheDocs config, unused since the GitBook migration) deleted.
  • REMOVED .slugignore (Heroku-specific ignore file, long dead) deleted.

CHANGED: directory restructure

  • CHANGED, BREAKING tutorials/ → notebooks/. Modern naming; all 5 canonical notebooks live under notebooks/ now (01_classification.ipynb through 05_time_series.ipynb).
  • CHANGED .gitignore rewritten. Cleaner structure; artifacts/, *.log, .ruff_cache/, demo*.py, nbtest*.ipynb added; legacy 3.x scratch patterns retained as a safety net.

ADDED: working notebooks (one per task)

  • ADDED scripts/build_notebooks.py: programmatic notebook generator using nbformat + nbclient. Run with --run to execute and capture fresh outputs.
  • ADDED notebooks/01_classification.ipynb (43 KB with outputs). Canonical flow: ClassificationExperiment → fit → compare_models → tune_model → predict_model → save/load_model → inspect event stream.
  • ADDED notebooks/02_regression.ipynb (25 KB).
  • ADDED notebooks/03_clustering.ipynb (18 KB). ClusteringExperiment → create_model("kmeans") → assign_model.
  • ADDED notebooks/04_anomaly_detection.ipynb (18 KB). AnomalyExperiment → create_model("iforest") → assign_model.
  • ADDED notebooks/05_time_series.ipynb (13 KB). TimeSeriesExperiment(fh=12) → compare_models → predict_model.
  • ADDED All 5 notebooks executed end-to-end under Python 3.13 + sklearn 1.7.2. Outputs are persisted in the .ipynb JSON so GitHub can render them and users can skim without running.
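
One way to confirm that a notebook really was executed before committing it is to inspect the .ipynb JSON directly. The sketch below is not part of scripts/build_notebooks.py; the function name unexecuted_code_cells is hypothetical, and it relies only on the standard nbformat v4 cell fields cell_type, source, and execution_count.

```python
import json
from pathlib import Path


def unexecuted_code_cells(notebook_path):
    """Return the indices of non-empty code cells that were never executed."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    unexecuted = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        source = cell.get("source", "")
        # nbformat v4 stores source as a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        # An executed cell has an integer execution_count; a fresh one has null.
        if text.strip() and cell.get("execution_count") is None:
            unexecuted.append(index)
    return unexecuted
```

For a notebook that was run end-to-end, unexecuted_code_cells("notebooks/01_classification.ipynb") should return an empty list. Checking execution_count rather than outputs avoids false alarms on cells, such as imports, that legitimately print nothing.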

ADDED: AGENTS.md at repo root

  • ADDED /AGENTS.md: 60-second briefing for AI coding agents, covering TL;DR, read-first list, non-negotiables (8 rules), conventions, workflow, repo map, and common-task recipes. This is the file any agent should read before touching the repo.

ADDED: docs/for_agents/ deep dives (5 files)

  • ADDED docs/for_agents/README.md: index of agent-facing docs.
  • ADDED docs/for_agents/ENGINE_WALKTHROUGH.md: step-by-step walkthrough of what happens inside fit → compare_models → predict_model, including the legacy-delegation pattern.
  • ADDED docs/for_agents/TYPED_RESULTS.md: every result dataclass (9 of them) with full field listing and usage idioms.
  • ADDED docs/for_agents/EVENT_STREAM.md: all 22 EventKinds tabulated with their typical payloads; subscriber contract + custom-logger recipe.
  • ADDED docs/for_agents/INTROSPECTION_API.md: the pycaret.api surface as a UI/agent integration contract, with React-form and LLM-prompt examples.
  • ADDED docs/for_agents/TASK_CHEATSHEET.md: verb × task matrix + constructor-parameter matrix on one page; import-path reference.

ADDED: docs/for_developers/ onboarding (5 files)

  • ADDED docs/for_developers/README.md: index.
  • ADDED docs/for_developers/SETUP.md: clone → uv sync --all-extras → first green test in < 5 min; Python-version matrix; Windows notes; IDE setup.
  • ADDED docs/for_developers/TESTING.md: test layout, how to run/write tests, markers, CI matrix, coverage, what not to write.
  • ADDED docs/for_developers/DRAINING_THE_GODCLASS.md: verb-migration playbook for Phase 5, covering recommended order, before/after recipe, hard constraints, and completion criteria.
  • ADDED docs/for_developers/CODING_STYLE.md: formatting, imports, type hints, docstrings, naming, dataclass conventions, error handling, logging, dependency discipline, git hygiene.
  • ADDED docs/for_developers/RELEASE_PROCESS.md: versioning, tagging, notes generation, PyPI publish.

CHANGED: CONTRIBUTING.md rewritten for 4.0

  • CHANGED CONTRIBUTING.md: replaced 3.x-era content (black/isort mention, sphinx docs, pip install -e .[test]) with a 4.0-flavored guide that links into AGENTS.md, docs/for_agents/, and docs/for_developers/. Includes the current PR checklist.

ADDED: GitHub issue snapshot + triage

  • ADDED docs/revamp/github_issues/open_issues_raw.json: raw snapshot of all 388 open issues on pycaret/pycaret (fetched via gh issue list).
  • ADDED scripts/triage_issues.py: classifier that reads the raw snapshot and assigns each issue to one of 5 buckets: fixed_in_4_0, out_of_scope, stale, still_relevant_bug, still_relevant_enhancement. Heuristic-based; reruns are idempotent.
  • ADDED docs/revamp/github_issues/triage.json: machine-readable output of the classifier.
  • ADDED docs/revamp/github_issues/triage.md: human-browsable triage report, one table per bucket with issue number, title, labels, update date, and classification reason.
  • ADDED docs/revamp/github_issues/README.md: triage summary + methodology.
  • ADDED docs/revamp/github_issues/PLAYBOOK.md: step-by-step actions per bucket with ready-to-paste reply templates.
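
The actual rules in scripts/triage_issues.py are not reproduced here. As a rough sketch of how a heuristic, idempotent classifier over the five buckets might look: the keyword lists, the 365-day staleness threshold, the snapshot date, and the function names below are all hypothetical, and the input fields (number, title, labels, updatedAt) assume the snapshot was exported with gh issue list --json.

```python
from datetime import datetime, timezone

BUCKETS = (
    "fixed_in_4_0",
    "out_of_scope",
    "stale",
    "still_relevant_bug",
    "still_relevant_enhancement",
)

# Hypothetical heuristics, for illustration only.
FIXED_KEYWORDS = ("np.product",)
OUT_OF_SCOPE_KEYWORDS = ("docker", "heroku")
STALE_AFTER_DAYS = 365

# Assumed snapshot date; pinning "now" keeps reruns over the same file identical.
SNAPSHOT_TIME = datetime(2026, 4, 23, tzinfo=timezone.utc)


def classify(issue, now):
    """Return (bucket, reason) for one issue dict; the first matching rule wins."""
    title = issue["title"].lower()
    labels = {label["name"].lower() for label in issue.get("labels", [])}
    # gh emits ISO 8601 timestamps with a trailing "Z".
    updated = datetime.fromisoformat(issue["updatedAt"].replace("Z", "+00:00"))

    for keyword in FIXED_KEYWORDS:
        if keyword in title:
            return "fixed_in_4_0", f"title mentions {keyword!r}"
    for keyword in OUT_OF_SCOPE_KEYWORDS:
        if keyword in title:
            return "out_of_scope", f"title mentions {keyword!r}"
    age_days = (now - updated).days
    if age_days > STALE_AFTER_DAYS:
        return "stale", f"no update for {age_days} days"
    if "bug" in labels:
        return "still_relevant_bug", "labelled bug"
    return "still_relevant_enhancement", "default bucket"


def triage(issues, now=SNAPSHOT_TIME):
    """Group issues by bucket, ordered by issue number for stable output."""
    result = {bucket: [] for bucket in BUCKETS}
    for issue in sorted(issues, key=lambda item: item["number"]):
        bucket, reason = classify(issue, now)
        result[bucket].append({"number": issue["number"], "reason": reason})
    return result
```

Two choices make reruns reproducible in this sketch: the reference time is passed in rather than read from the clock, and issues are sorted by number before bucketing, so the same snapshot always yields the same output.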

Headline triage result

From 388 open issues:

  • 8 (2%) fixed_in_4_0 — close with 4.0 release-notes pointer.
  • 92 (24%) out_of_scope — close with KILL_LIST.md pointer.
  • 123 (32%) stale — auto-ping, close after 30 days of silence.
  • 58 (15%) still_relevant_bug — label 4.0-candidate, route to Phase 5.
  • 107 (28%) still_relevant_enhancement — per-item decision.

223 of 388 (57%), the sum of the fixed_in_4_0, out_of_scope, and stale buckets (8 + 92 + 123), can be closed or auto-pinged without per-issue human judgment.
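
The totals and percentages can be re-derived from the bucket counts with a few lines of arithmetic. Treating the first three buckets as the ones that need no per-issue judgment is an assumption read off the per-bucket actions listed above.

```python
# Bucket counts as reported in the triage result.
counts = {
    "fixed_in_4_0": 8,
    "out_of_scope": 92,
    "stale": 123,
    "still_relevant_bug": 58,
    "still_relevant_enhancement": 107,
}

total = sum(counts.values())

# Share of each bucket, rounded to a whole percent.
shares = {bucket: round(100 * n / total) for bucket, n in counts.items()}

# Buckets whose action is "close" or "auto-ping" (assumed grouping).
no_judgment = counts["fixed_in_4_0"] + counts["out_of_scope"] + counts["stale"]
needs_review = total - no_judgment

print(f"total open issues: {total}")
for bucket, pct in shares.items():
    print(f"{bucket}: {counts[bucket]} ({pct}%)")
print(f"no per-issue judgment: {no_judgment} ({round(100 * no_judgment / total)}%)")
print(f"needs human review: {needs_review}")
```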

FIXED

  • FIXED np.product → np.prod in pycaret/internal/patches/sklearn.py:106. NumPy 2 removed the deprecated np.product alias; the call site surfaced when the notebook-execution pass tried to run compare_models, and it was blocking the full e2e notebook run.
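
For reference, np.prod is the spelling that works on both NumPy 1.x and 2.x. The snippet below is a generic illustration, not the code at the patched call site; the shape tuple is made up.

```python
import numpy as np

shape = (3, 4, 5)  # made-up example input

# np.prod is available in NumPy 1.x and 2.x alike.
n_elements = int(np.prod(shape))

# The np.product alias is absent in NumPy 2; hasattr shows which case applies.
has_legacy_alias = hasattr(np, "product")

print(n_elements, has_legacy_alias)
```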

INTERNAL

  • INTERNAL scripts/ directory created for maintenance scripts. Two scripts landed this session; future additions (bulk issue close, LOC reports, release build helpers) go here.
  • INTERNAL tests/conftest.py, tests/test_models.py, and all other tests untouched in this session (no code-under-test changes).

Session 4 delta summary

| Metric | Session 3 end | Session 4 end | Δ |
| --- | --- | --- | --- |
| Repo top-level directory count | 9 | 10 | +1 (scripts/) |
| Source LOC in pycaret/ | ~49,400 | ~49,400 | 0 |
| Test files | 4 | 4 | 0 |
| Test pass rate | 100% (32/32) | 100% (32/32) | |
| Dead-weight directories | 4 (Docker_files, docs/source, tutorials/{legacy_v3,time_series,translations}) | 0 | |
| Stale root-level files | 3 (logs.log, .readthedocs.yml, .slugignore) | 0 | |
| Working notebooks | 0 (3.x legacy archived) | 5 (executed, with outputs) | +5 |
| Docs files | 7 in docs/revamp/ | 23 (revamp + for_agents + for_developers + github_issues) | +16 |
| GitHub issues in actionable triage buckets | 0 (untouched backlog of 388) | 223 ready-to-close, 165 needing human review | |