Folding Anthropic's large-codebase playbook into Trellis

A point-by-point gap analysis against the latest Claude Code best-practices guide, and the ten changes I shipped to close the gaps.

Anthropic just published "How Claude Code works in large codebases". It is the most direct guide I have seen them write about how to actually wire Claude Code up at the project level: CLAUDE.md shape, hooks, skills, MCP, LSP, governance, the whole stack.

Trellis is the regime I run my six personal projects under. One canonical control plane, project-level inheritance through symlinks, three tiers of hooks, ten weekly audits. I wrote about it last week. The Anthropic post landed seven days later and I read it the way you read a competitor's deck — looking for what they thought was load-bearing that I had missed.

This post is the gap analysis and the resulting PR. If you run Claude Code at any scale, the practical question is the same one I asked: what did the people who built Claude Code think mattered, and how much of it does your setup already do?

For Trellis the score is "most of it." The places it does not are interesting, and they are what shipped today as v0.3.0.

The map: what Anthropic said, what Trellis already did, what was missing

I read the post once for argument, twice for receipts. Every recommendation that named a file path, an env var, a config key, or a workflow shape got pulled into a table next to its Trellis counterpart.

What Trellis already had:

Layered CLAUDE.md — root core-rules/CLAUDE.md plus per-project files, both inherited through symlinks. Target size for each is under 5 KB, which matches the guide's "pointers and gotchas only" framing.
Hooks for self-improving setup — eight canonical hooks across two harnesses, six events. Stop hooks gate Definition of Done; pre-push catches direct-to-main; PreToolUse blocks destructive commands.
Skills loaded on demand — seven canonical (process-gate, security-gate, the spec-kit pipeline of clarify → spec → plan → tasks → analyze).
Subagent dispatch — core-rules/CLAUDE.md mandates parallel sub-agents for independent units, and the spec-kit pipeline is a sequential split with checkpoints.
Code-review parity — every edit-heavy turn dispatches a code-review subagent against the diff. Findings are advisory but visible.
Governance — a registry, a blacklist, a deferred list for n=1 rule candidates, version pins, weekly audits that scan for drift. One DRI (me), explicit by construction.
Self-improving rules — gotchas.md per project, plus a monthly gotchas-rollup audit that clusters across the fleet and promotes patterns hitting n=3 into parent rules.

What Trellis did not have, ranked by impact:

Token-noise filter. The Anthropic post is unambiguous: version-control a permissions.deny block that excludes generated files, build artifacts, third-party code. Trellis's canonical claude-settings.json shipped no permissions block at all. Every project's agent could read node_modules, .next, dist, vendor, lockfiles, every cache directory. Wasted context on every session.
Per-subdirectory scoping for test and lint. The guide specifically calls this out for service-oriented codebases. Trellis's stop-verify.sh ran every check at repo root regardless of where the agent edited. On clusterbid-console — five Go services, a Next frontend, a Python tree — a single-file edit would type-check and lint the entire monorepo and routinely blow past the 90-second budget.
Codebase map. The post recommends "one-line descriptions of top-level folders as a table of contents" for big repos. Trellis project CLAUDE.md templates seeded no such section. With nine top-level dirs in some projects, the agent re-discovered the tree on every fresh session.
Configuration aging. "Review CLAUDE.md every 3-6 months as models evolve." Trellis had version-drift, dep-currency, bypass-tripwire, parent-hook-drift, cross-project-process-audit. None of them asked the question "is this rule obsolete now?"
LSP / symbol-level navigation. Zero integration anywhere in Trellis. Polyglot projects pay this cost most.
Skill path-scoping. Every Trellis skill loads globally. Fine for workflow skills (process-gate, security-gate). Wrong shape for stack-specific helpers a polyglot monorepo might want.
The read-only explorer pattern. "Read-only subagent maps subsystem, writes findings to file. Main agent edits with full picture." Trellis had /primer for stable subsystems, but no quick ephemeral counterpart for one-shot exploration.
Stop hook proposing rules. The guide explicitly suggests Stop hooks that propose CLAUDE.md updates while the session is fresh. Trellis's gotchas.md mechanism relied on the agent volunteering entries — no hook actually scanned the just-finished session for patterns worth keeping.
MCP allowlist. Solo-DRI projects do not need a governance committee, but they do benefit from an explicit list of which MCP servers are sanctioned. The Anthropic post calls this out; Trellis had nothing in the schema.
Plugin marketplace. Already on Trellis's roadmap; not in scope for this PR.

The work that ships today closes the first nine. Plugin marketplace stays parked.

Three phases, one PR

The Trellis PR-size hard cap is 800 lines. The bundle was 852. Splitting it into three PRs would be a false economy, satisfying the letter of the cap while defeating its purpose. Phase 1B (the codebase-map convention) is the rule that phase 2F (the audit) enforces; phases 3G, 3I, and 3J all share the same gitignore-fragment sentinel; and the core-rules/VERSION bump is per release, not per change. Each split PR would land in a state that fails a different audit until the next one landed on top of it.

So the PR carries a single-page ADR explaining the carve-out, and ships as one.

Phase 1 — quick wins

permissions.deny baseline. The canonical settings template now ships an explicit deny list covering node_modules, .next, dist, build, out, target, vendor, .venv, __pycache__, every cache directory, and every lockfile I could name. New projects pick it up through onboard-project.sh. Existing projects converge through a new scripts/rollout-settings.sh that jq-unions canonical + project-local entries and preserves anything the project added by hand. Idempotent. The single cheapest change in the bundle, and the one with the most token-saving leverage.
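
The merge behaviour described above (union canonical and project-local entries, keep hand-added ones, stay idempotent) can be sketched in a few lines. This is a Python illustration, not the jq logic in scripts/rollout-settings.sh; the function name is hypothetical, and the deny entries are a made-up subset written in the Read(...) rule form that I am assuming the settings file uses.

```python
import json

# Hypothetical subset of the canonical baseline; the real list lives in the
# canonical settings template and is longer.
CANONICAL_DENY = [
    "Read(./node_modules/**)",
    "Read(./.next/**)",
    "Read(./dist/**)",
    "Read(./vendor/**)",
]


def merge_deny(canonical: list, project: dict) -> dict:
    """Return a copy of the project settings whose permissions.deny is the
    order-preserving union of canonical and project-local entries."""
    merged = json.loads(json.dumps(project))  # deep copy via JSON round-trip
    permissions = merged.setdefault("permissions", {})
    seen = set()
    union = []
    # Canonical entries first, then anything the project added by hand.
    for rule in [*canonical, *permissions.get("deny", [])]:
        if rule not in seen:
            seen.add(rule)
            union.append(rule)
    permissions["deny"] = union
    return merged
```

Running merge_deny a second time on its own output returns the same settings, which is the property that makes a fleet-wide rollout safe to repeat.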

Mandatory codebase map. Project CLAUDE.md now requires a ## Codebase map section when the project has five or more top-level directories. Below that threshold the section is noise: the agent can hold the layout in its head from one ls. Above it, the cost of re-discovery on every fresh session adds up enough to justify five lines of upfront context. The format is bare: one line per directory, role only. If a directory needs more, write a sibling docs/<dir>.md and link to it from the map line.

obsolete-rules audit. A new quarterly audit that walks the entire rules corpus (parent CLAUDE.md, presets, engineering-process.md, every project CLAUDE.md and gotchas.md) and classifies each rule as load-bearing, model-compensating, harness-compensating, stylistic, or process. Rules in the first, fourth, and fifth buckets stay. The middle two, model-compensating and harness-compensating, are removal candidates. A strong removal proposal needs all of: a model- or harness-compensating classification, no gotchas.md evidence in the last six months, a written date before the current major model generation, and no enforcement hook still depending on it. Weak proposals are candidates that miss at least one of those conditions; they carry the inconclusive signals with them.

This is the only Trellis audit that proposes removals rather than additions. The bar is set high on purpose. A wrong removal silently degrades discipline; a missed removal just wastes a few tokens. The first run after a major model launch will be the highest-yield, because that is when pre-launch workarounds become candidates.
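
To make the bar concrete, here is a rough sketch of the strong-versus-weak decision as a predicate. The Rule fields, the function name, and the 183-day stand-in for "six months" are all assumptions for illustration; the real audit classifies rules by reading them, not by consulting a tidy record like this.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

COMPENSATING = {"model-compensating", "harness-compensating"}


@dataclass
class Rule:
    text: str
    bucket: str  # one of the five audit classifications
    written_on: date
    last_gotcha_evidence: Optional[date]  # None if no gotchas.md entry backs it
    has_enforcement_hook: bool


def proposal_strength(rule: Rule, today: date, model_generation_start: date) -> str:
    """Return "keep", "weak", or "strong" for a single rule."""
    if rule.bucket not in COMPENSATING:
        return "keep"  # load-bearing, stylistic, and process rules stay
    six_months_ago = today - timedelta(days=183)  # approximation of six months
    conditions = [
        rule.last_gotcha_evidence is None or rule.last_gotcha_evidence < six_months_ago,
        rule.written_on < model_generation_start,
        not rule.has_enforcement_hook,
    ]
    # Every condition must hold for a strong proposal; anything less is weak.
    return "strong" if all(conditions) else "weak"
```

Requiring all conditions at once is what keeps the asymmetry described above intact: a rule with even one live signal stays a weak proposal and needs a human to look at it.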

Phase 2 — monorepo scoping

Subtree-aware stop-verify. When every changed file in a Stop turn sits under one subdirectory that carries its own manifest — package.json, go.mod, pyproject.toml, Cargo.toml — the hook now cds into that subtree before typecheck, lint, and test. Mixed-subtree changes fall back to repo root. Escape hatch: PROCESS_GATE_FORCE_ROOT=1 runs at root unconditionally.

The detection is conservative on purpose. Requires every changed file to share one subtree, and that subtree must already carry a manifest. Single-language repos behave identically to before. The win shows up on polyglot monorepos: clusterbid-console's typecheck-on-Stop dropped from "run every Go module + Next + Python" to "run only the affected subtree" without me having to write a single new validator. Patched in both the Claude Code and Codex parity copies of the script.
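
The detection rule is small enough to sketch. The hook itself is a shell script; this Python version only illustrates the logic, and it makes one assumption the post does not spell out: "the subtree" for a file means its nearest ancestor directory below repo root that contains a manifest. Function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

MANIFESTS = ("package.json", "go.mod", "pyproject.toml", "Cargo.toml")


def nearest_manifest_dir(repo_root: Path, rel_file: str) -> Optional[Path]:
    """Walk up from the file's directory and return the first directory
    strictly below repo_root that carries a manifest, or None."""
    current = (repo_root / rel_file).parent
    while current != repo_root and repo_root in current.parents:
        if any((current / name).is_file() for name in MANIFESTS):
            return current
        current = current.parent
    return None


def resolve_check_dir(repo_root: Path, changed: list) -> Path:
    """Pick the directory to run typecheck, lint, and test from.
    `changed` holds repo-relative paths, e.g. from git diff --name-only."""
    if os.environ.get("PROCESS_GATE_FORCE_ROOT") == "1" or not changed:
        return repo_root  # escape hatch, or nothing to scope by
    subtrees = {nearest_manifest_dir(repo_root, rel) for rel in changed}
    if len(subtrees) == 1:
        only = subtrees.pop()
        if only is not None:
            return only  # every changed file shares one manifest-bearing subtree
    return repo_root  # mixed subtrees or no manifest: fall back to root
```

Any ambiguity resolves to repo root, so the worst case is the old behaviour, never a skipped check.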

Optional skill scope. A new scope.json convention next to a project-local SKILL.md lets the project declare which paths the skill auto-fires under. Schema is tiny:

{
  "paths": ["services/**", "pkg/**"],
  "reason": "Go-only validators; would noise the Next.js subtree"
}

Canonical skills stay global — every canonical Trellis skill is workflow-shaped, not stack-shaped. The scoping is for project-local skills that a polyglot monorepo might want to keep silent on the wrong subtree. Explicit /skill <name> invocations always work regardless of scope; only auto-mention is gated.

This one is agent-followed, not engine-enforced. Claude Code does not natively read scope.json. Every project loads core-rules/CLAUDE.md, which now directs the agent to respect the file when present. If n=2 projects find the convention load-bearing, the next move is a parent-hook-drift audit row that validates scope.json schema across the fleet.
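
Because nothing mechanical reads scope.json today, the matching semantics exist only as an instruction to the agent. If they were ever written down as code, I would expect something like the sketch below. The function name is hypothetical, and the glob handling is a simplification: Python's fnmatch lets * cross directory separators, so services/** matches at any depth but this is not a full gitignore-style matcher.

```python
import fnmatch
from typing import Optional


def skill_in_scope(scope: Optional[dict], edited_paths: list) -> bool:
    """Decide whether a project-local skill should be auto-mentioned.

    `scope` is the parsed scope.json, or None when the file is absent.
    Explicit invocations bypass this check; only auto-mention is gated.
    """
    if scope is None:
        return True  # no scope.json: the skill behaves as global
    patterns = scope.get("paths", [])
    return any(
        fnmatch.fnmatchcase(path, pattern)
        for path in edited_paths
        for pattern in patterns
    )
```

With the example schema above, an edit under services/ or pkg/ returns True, and a turn that only touches the Next.js subtree returns False.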

Codebase-map enforcement. The weekly cross-project-process-audit got a new check. Projects with five or more top-level directories must carry a ## Codebase map heading. Three failure modes classified separately: missing heading (warning), heading present but stub-only (info), heading references directories that no longer exist (warning). Sub-threshold projects skip the check entirely. The rule existed on paper after phase 1B; this commit makes it mechanically enforced.
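
A minimal sketch of the three-way classification, assuming map lines look like "- dir/: role". The post only fixes "one line per directory, role only", so the ENTRY pattern, the function name, and the message strings are my guesses at a plausible shape, not the audit's actual code.

```python
import re

THRESHOLD = 5  # top-level directory count at which the map becomes mandatory
HEADING = "## Codebase map"
# Assumed entry shape: a bullet, then a directory name, optionally in backticks.
ENTRY = re.compile(r"^\s*[-*]\s+`?([\w.\-]+)/?`?")


def audit_codebase_map(claude_md: str, top_level_dirs: list) -> list:
    """Return (severity, message) findings for one project's CLAUDE.md."""
    if len(top_level_dirs) < THRESHOLD:
        return []  # sub-threshold projects skip the check entirely
    lines = claude_md.splitlines()
    start = next((i for i, line in enumerate(lines) if line.strip() == HEADING), None)
    if start is None:
        return [("warning", "missing '## Codebase map' heading")]
    section = []
    for line in lines[start + 1:]:
        if line.startswith("## "):
            break  # the next same-level heading ends the map section
        section.append(line)
    listed = [m.group(1) for line in section if (m := ENTRY.match(line))]
    if not listed:
        return [("info", "heading present but stub-only")]
    stale = sorted(set(listed) - set(top_level_dirs))
    return [("warning", f"map references missing directory: {name}") for name in stale]
```

The sketch deliberately does not flag directories that exist but are absent from the map; the post describes only the three failure modes above.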

Phase 3 — subagent, LSP, governance

/explore slash command. Trellis already had a /primer system for capturing durable context. The Anthropic post calls out a different shape: a just-in-time, transient exploration that the editing session loads once and discards. So /explore is the ephemeral counterpart. Dispatches a read-only subagent to map the subsystem, writes the result to <canonical-root>/.claude/primers/_explore/<slug>-<sha>.md, and the editing session reads it before touching code. Post-edit, the user decides whether to promote to /primer or delete.

The directory shape reuses the primer infrastructure deliberately. One canonical-root resolution path. One set of conventions. One audit cycle to keep both honest. The _explore/ subdir is not loaded at session start — only durable primers in INDEX.md are — so leftover notes are harmless but accumulate. Periodic cleanup is the user's call.

LSP recommendation. A new section in engineering-process.md §8.3 recommends installing per-language LSP binaries (gopls, typescript-language-server, pyright, rust-analyzer) plus a Claude Code LSP plugin for repos spanning two or more languages. Recommendation only, no audit. Single-language repos derive less value. The win is on polyglot monorepos where the same identifier exists in three languages and ripgrep returns forty false positives but symbol lookup returns the right one.

propose-rules Stop hook. Experimental, opt-in. Activated by setting PROCESS_GATE_PROPOSE_RULES=1 in the project's hook config. Three layers of cost control before any subagent fires:

Opt-in gate. Unset means silent exit. Every project gets the file through sync-hooks, but nobody pays tokens unless they flipped the switch.
Pure-chat guard. No file edits this turn means nothing to learn from.
Transcript-tail heuristic. Subagent only fires when the last two hundred transcript lines contain at least one explicit-correction signal — "no", "don't", "actually", "stop doing", "that's wrong", "never do". If the user did not correct anything, the hook spends nothing.

When it does fire, a one-shot claude -p --max-turns 1 subagent reads the transcript tail and the project's gotchas.md and proposes a single candidate entry, or emits the literal string NONE. The proposal returns as additionalContext. The hook never blocks. The decision to append is the user's.

This pairs with the existing monthly gotchas-rollup. propose-rules surfaces n=1 candidates per turn; gotchas-rollup clusters n=3 candidates into parent-rule promotions. Different time horizons, same self-improvement loop.
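
The three cost-control layers reduce to one cheap predicate that runs before any subagent is spawned. The sketch below is illustrative Python, not the hook's shell source; the function signature is hypothetical, and matching the signals on word boundaries, case-insensitively, is my assumption about how to keep "no" from firing on "know" or "note".

```python
import re

SIGNALS = ("no", "don't", "actually", "stop doing", "that's wrong", "never do")
SIGNAL_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in SIGNALS) + r")\b",
    re.IGNORECASE,
)
TAIL_LINES = 200


def should_propose(env: dict, files_edited: int, transcript_lines: list) -> bool:
    """Return True only when all three gates pass."""
    # Layer 1: opt-in gate. Unset means silent exit.
    if env.get("PROCESS_GATE_PROPOSE_RULES") != "1":
        return False
    # Layer 2: pure-chat guard. No edits this turn means nothing to learn from.
    if files_edited == 0:
        return False
    # Layer 3: transcript-tail heuristic over the last two hundred lines.
    tail = transcript_lines[-TAIL_LINES:]
    return any(SIGNAL_RE.search(line) for line in tail)
```

Ordering the gates from cheapest to most expensive means a project that never opted in pays for one dictionary lookup and nothing else.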

approved_mcps schema field. Optional, documentation-only today. The trellis config schema accepts an array of { name, purpose, scope: "fleet" | "per-project" } entries. Solo-DRI projects do not need a governance committee, but the explicit allowlist makes the fleet's sanctioned MCP set readable in one place. The example config seeds the three I actually wire: scheduled-tasks for the audits, computer-use for native-app verification, claude-in-chrome for browser flows. A parked mcp-drift audit will pick this up when n=2 projects diverge in what they connect to.
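
Since the field is documentation-only, nothing validates it yet. A parked audit would probably start from a shape check along these lines. The function name, the error strings, and the duplicate-name rule are assumptions of mine; only the three keys and the two scope values come from the schema described above.

```python
ALLOWED_SCOPES = {"fleet", "per-project"}
REQUIRED_KEYS = {"name", "purpose", "scope"}


def validate_approved_mcps(entries: list) -> list:
    """Return a list of human-readable problems; empty means the list is valid."""
    errors = []
    seen = set()
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            errors.append(f"entry {index}: missing {sorted(missing)}")
            continue
        if entry["scope"] not in ALLOWED_SCOPES:
            errors.append(f"entry {index}: unknown scope {entry['scope']!r}")
        if entry["name"] in seen:
            # Assumed rule: one allowlist row per MCP server name.
            errors.append(f"entry {index}: duplicate name {entry['name']!r}")
        seen.add(entry["name"])
    return errors
```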

What I noticed along the way

A few things that did not become commits but are worth saying out loud.

The Anthropic post is more prescriptive than I expected. Not in a bad way. "Review CLAUDE.md every 3-6 months" is a cadence I would not have set on my own, and the framing of "rules that compensated for limitations the model no longer has" is the most useful new lens I have picked up in months. Trellis grew a quarterly audit specifically because of that one sentence.

Most of the recommendations Trellis already covered were ones I had reverse-engineered from incidents. Force-push to main happened once; pre-push guard happened the next day. Agent summarised work it did not do; receipts-required entered the parent rules. The Anthropic post lists those as best practices because Anthropic has watched many teams hit the same incidents — they are not novel insights, they are the consensus shape that emerges from enough operational pain across enough teams. Trellis hit them faster because it is a six-project regime, not a one-project regime. Every project is one Rule-of-Three candidate.

What was novel: permissions.deny. I had not thought of generated-directory exclusion as a Trellis concern — it felt like project hygiene. The post correctly frames it as a token-noise filter that belongs in the canonical layer because every project has the same node_modules problem.

And the subtree-scoping change in stop-verify was the largest single quality-of-life upgrade in months. Going from "every Stop turn runs every check across every language in the monorepo" to "runs the right scope automatically" is the difference between Trellis being useful on polyglot repos and Trellis being a tax on them.

The PR

github.com/Zireael26/trellis-instance/pull/63 — twelve commits, one ADR, core-rules/VERSION 0.2.0 → 0.3.0. Each phase is its own commit so the bundle is reviewable in passes. The ADR at docs/adr/2026-05-18-anthropic-best-practices-bundled-rollout.md carries the size carve-out and explains why splitting harms clarity.

The public template repo will follow on the next sync-to-template.sh run. Downstream consumers see the new version via scripts/upgrade.sh once the canonical clone tags v0.3.0.

Re-reading the Anthropic post and asking what Trellis still does not do is now the same exercise as the quarterly obsolete-rules audit, just in the other direction — additions instead of removals. The next major Claude or Codex release is when the audit earns its keep. I am looking forward to running it.