2026-05-16

Session 57: Phase 0: publishable foundation (secrets lockdown + docker compose polish + README + vision doc)

Engineering log for session 57.

Maintainer goal: get the repo into a state where git clone && docker compose up works for a brand-new contributor, with categorical guarantees that no secrets can leak via git. Phase 0 of the long roadmap toward a Databricks/Snowflake-style self-hosted ML platform.

ADDED — Secrets-leak prevention (categorical, layered)

  • ADDED .gitignore broadened to cover ALL local DB / credential / env patterns, not just the specific filenames previously listed. New entries: *.db, *.db-journal, *.db-wal, *.db-shm, *.sqlite* (anywhere in tree), .env + .env.* (with .env.example allow-listed), *.pem / *.key / credentials.json / service-account*.json / aws-credentials*, plus the /data/ docker-compose volume mount point. Previously only /pycaret.db and /services/api/pycaret.db were covered, which left pycaret-dev.db, smoke.db, and test_phase_smoke.db exposed to a careless git add -A.
  • ADDED scripts/check-secrets.sh: pre-push gate (also runnable manually). Scans for Anthropic / OpenAI / Stripe / Slack / GitHub / AWS / Google API key shapes, Fernet ciphertext blobs (≥40-char base64 after ENC:v1:), and PEM private key blocks. Whole-file allow-list at scripts/.secrets-allowlist; single-line opt-out via # pragma: allow-secret. Hook installation: cp scripts/check-secrets.sh .git/hooks/pre-push && chmod +x .git/hooks/pre-push. Validated clean on the current 527-file tree.
  • ADDED .env.example at repo root: documented template for every overridable env var. Every entry is commented out by default, so contributors can see what's available without polluting their actual .env.
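
As a rough illustration of how a line-based scan with a single-line opt-out can work, here is a minimal Python sketch. It is not the contents of scripts/check-secrets.sh: the two patterns, the PATTERNS table, and the scan_text helper are assumptions made up for this example, and the key in the sample is the placeholder value AWS uses in its own documentation.

```python
import re

# Illustrative patterns only; the real script's patterns are not shown in this log.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key header, with or without an algorithm word such as RSA.
    "pem-private-key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
}
PRAGMA = "# pragma: allow-secret"


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PRAGMA in line:
            continue  # single-line opt-out, as described above
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = "\n".join([
    "aws_key = 'AKIAIOSFODNN7EXAMPLE'",
    "doc_key = 'AKIAIOSFODNN7EXAMPLE'  # pragma: allow-secret",
    "-----BEGIN RSA PRIVATE KEY-----",
    "nothing to see here",
])
print(scan_text(sample))  # [(1, 'aws-access-key-id'), (3, 'pem-private-key')]
```

A pre-push gate built this way would exit non-zero whenever the findings list is non-empty, which is what makes git abort the push.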

ADDED — docker compose up from repo root

  • ADDED compose.yml at repo root. Single-file 2-service stack (api + web), with a named volume pycaret-data for SQLite + artifacts + Fernet key persistence. Every env var defaults to a sane local-dev value via ${VAR:-default} substitution, all overridable via root .env. Healthchecks wired; web waits for api: service_healthy before starting.
  • ADDED infra/docker/docker-entrypoint.sh for the api container. On first run: generates a Fernet key and persists it to /data/.secrets/fernet.key (chmod 600). On subsequent runs: reads the key from the same file. The Fernet key thus lives inside the data volume and survives container restarts, fixing the silent "secrets become unreadable after restart" trap we hit live in session 55. Hand-off to uvicorn via exec so docker stop signals reach the server cleanly.
  • CHANGED infra/docker/Dockerfile.api now copies the entrypoint script and sets it as ENTRYPOINT. The actual server command moves to CMD (uvicorn invocation unchanged).
  • REMOVED infra/docker/docker-compose.yml deleted. Having two sources of truth was confusing; the root compose.yml is canonical now. The prod variant at infra/docker/docker-compose.prod.yml stays (it's a different document: Postgres + S3 + replicas).
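
The generate-once, reuse-afterwards logic of the entrypoint can be sketched in Python as follows. The real entrypoint is a shell script whose contents are not reproduced in this log; the load_or_create_key helper is hypothetical, and the key is built with the standard library in the same shape as a Fernet key (32 random bytes, URL-safe base64) rather than through the cryptography package, so the sketch runs without extra dependencies.

```python
import base64
import os
import tempfile
from pathlib import Path


def load_or_create_key(key_path: Path) -> bytes:
    """Return the persisted key, generating and saving it on first run."""
    if key_path.exists():
        return key_path.read_bytes().strip()
    key_path.parent.mkdir(parents=True, exist_ok=True)
    # Same shape as a Fernet key: 32 random bytes, URL-safe base64 encoded.
    key = base64.urlsafe_b64encode(os.urandom(32))
    # Create the file with mode 0600 from the start, so it is never
    # briefly readable by other users (O_EXCL also refuses to overwrite).
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)
    return key


# A temporary directory stands in for the /data volume here.
with tempfile.TemporaryDirectory() as data_dir:
    path = Path(data_dir) / ".secrets" / "fernet.key"
    first = load_or_create_key(path)   # "first run": generates the key
    second = load_or_create_key(path)  # "restart": reads the same key back
    print(first == second, len(first))  # True 44
```

The point of keeping the file on the named volume is that the second call is what every container restart sees, so values encrypted before the restart stay decryptable.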

ADDED — Public-facing docs

  • ADDED docs/revamp/PLATFORM_ARCHITECTURE.md, the vision document. Captures the pluggable backends foundation pattern: every external dep (storage / DB / secrets / auth / queue / compute / notifier) sits behind a Protocol with local + cloud impls; the choice is made in config, never via if AWS: branches. Includes: the 7-slot matrix with current state, the proposed Backends container dataclass, AWS Terraform target architecture diagram, 5-phase rollout, explicit no-list. Anchor for every future session: "before you hardcode an external dep, read this."
  • CHANGED README.md rewritten as the publishable face of the repo. Five-minute docker compose up quickstart, golden-path tour (Train → Register → Deploy → Predict), local-dev-without-docker instructions, security section explaining check-secrets.sh, troubleshooting table, contributor flow. The architecture sections were softened to honestly describe what docker compose up ACTUALLY runs today (single uvicorn process holding API + scheduler + compute + SQLite file inside the api container, same shape as Plausible / Vaultwarden / n8n self-hosters) versus the target enterprise-scale split (separate api / worker / runtime / RDS / S3 / SQS) that lives in PLATFORM_ARCHITECTURE.md as Phase 1-3 work still ahead. Don't over-promise; ship what we have, grow in the open.
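
To make the pluggable-backends idea concrete, here is a minimal sketch of one slot. The Storage protocol, InMemoryStorage, STORAGE_IMPLS, build_backends, and save_model names are all hypothetical; only the Protocol-per-dependency pattern and the Backends container dataclass come from the vision document, and its actual field names and method signatures may well differ.

```python
from dataclasses import dataclass
from typing import Protocol


class Storage(Protocol):
    """One of the seven slots; method names here are assumptions."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Local stand-in; a cloud implementation would expose the same methods."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


@dataclass(frozen=True)
class Backends:
    storage: Storage
    # The remaining slots (DB, secrets, auth, ...) would follow the same shape.


# Config picks the implementation by name; callers never branch on the provider.
STORAGE_IMPLS = {"memory": InMemoryStorage}


def build_backends(config: dict[str, str]) -> Backends:
    return Backends(storage=STORAGE_IMPLS[config["storage"]]())


def save_model(backends: Backends, name: str, blob: bytes) -> None:
    # Application code depends only on the Storage protocol.
    backends.storage.put(f"models/{name}", blob)


backends = build_backends({"storage": "memory"})
save_model(backends, "demo", b"weights")
print(backends.storage.get("models/demo"))  # b'weights'
```

Adding a cloud backend under this scheme means registering one more entry in the lookup table, while save_model and every other caller stay untouched.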

ADDED — .gitattributes for cross-platform line endings

  • ADDED .gitattributes at repo root. Forces LF line endings on *.sh, Dockerfile*, *.yml, *.py, etc. in the index, regardless of the contributor's OS. Prevents the classic Windows-cloned-repo failure where git auto-converts docker-entrypoint.sh to CRLF and the Linux container then errors with /bin/sh^M: bad interpreter. Validated locally: file scripts/check-secrets.sh reports Unicode text, UTF-8 text executable (no CRLF marker).
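
The ^M in that error is a carriage return: the interpreter path on the shebang line runs up to the newline, so a CRLF ending leaves a stray \r attached to /bin/sh. A small sketch of the effect, with hypothetical helper names (has_crlf, shebang_line) that could also serve as a quick local check:

```python
def has_crlf(data: bytes) -> bool:
    """True if the file content contains any Windows-style line ending."""
    return b"\r\n" in data


def shebang_line(data: bytes) -> bytes:
    """Return the first line, split on LF only, as the kernel would see it."""
    return data.split(b"\n", 1)[0]


lf_script = b"#!/bin/sh\necho ok\n"
crlf_script = lf_script.replace(b"\n", b"\r\n")

print(has_crlf(lf_script), shebang_line(lf_script))      # False b'#!/bin/sh'
print(has_crlf(crlf_script), shebang_line(crlf_script))  # True b'#!/bin/sh\r'
```

In the CRLF case the interpreter being looked up is literally "/bin/sh\r", which does not exist inside the container.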

TESTS — Validation done before claiming done

  • TESTS bash scripts/check-secrets.sh: clean across all 527 tracked files (the initial run flagged 4 false positives on doc mentions of the literal string ENC:v1:; fixed by requiring ≥40 chars of base64 after the prefix in the pattern).
  • TESTS docker compose config: validates without error. CORS env var renders as a properly-quoted JSON string after fixing the ${...:-default} quoting trap.
  • TESTS Not validated yet: actual docker compose up --build from a clean state. The maintainer will run that as the smoke test in their own environment per the README quickstart.
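
The false-positive fix from the first bullet can be illustrated by comparing a prefix-only pattern with one that also requires a long base64 run. Both regexes are assumptions written for this example (the character class accepts the standard and URL-safe base64 alphabets); the script's real pattern is not shown in this log.

```python
import re

# Prefix-only match: fires on any mention of the marker, including docs.
LOOSE = re.compile(r"ENC:v1:")
# Prefix plus at least 40 base64 characters: fires only on real-looking blobs.
TIGHT = re.compile(r"ENC:v1:[A-Za-z0-9+/_=-]{40,}")

doc_line = "Encrypted values are stored with the ENC:v1: prefix."
blob_line = "api_key = ENC:v1:" + "A" * 60  # synthetic stand-in for ciphertext

for label, line in (("doc", doc_line), ("blob", blob_line)):
    print(label, bool(LOOSE.search(line)), bool(TIGHT.search(line)))
# doc True False
# blob True True
```

The length floor is what separates prose that merely names the prefix from an actual encrypted value sitting in a tracked file.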