Trellis: An Engineering Process for AI Coding Agents

How I run six projects with one rulebook, nine hooks, and a fleet of weekly audits — and why I open-sourced it.

The problem nobody warns you about

The first time a coding agent writes production code for you it feels like a superpower. The tenth time, you start seeing the pattern in the corners it cuts: pushed straight to main, marked a task done while the tests never ran, summarised what it meant to do instead of what it did, force-pushed over an hour of your work to "clean things up."

These are not model bugs. They are gaps in the process around the model — the gaps human teams closed over decades with branch protection, code review, commit conventions, CI, and a hundred smaller rituals. An agent inherits none of that by default. You have to build the scaffolding yourself.

I have spent the last three months doing exactly that, in the open, across six of my projects. The result is Trellis: a parent/child engineering-process regime for Claude Code and Codex. One source of truth for cross-cutting rules, nine canonical hooks that make those rules mechanically enforceable, and a fleet of scheduled audits that keeps every project honest week after week.

The name is the metaphor. A trellis is the frame a climbing plant grows on. Without one the plant sprawls and breaks; with one it grows tall and clean. Same idea, applied to code an agent writes.

The public, forkable template: github.com/Zireael26/trellis.

The shape of the system

Figure: Trellis architecture, showing the parent control plane, project inheritance, three-tier hook enforcement, and the scheduled audit fleet.

At the top is a single control plane repo. It owns the parent rules (core-rules/CLAUDE.md), the hook specifications (hooks.md), the inheritance contract (inheritance.md), the registry of active projects (registry.md), and a deferred.md for rules that haven't earned their place yet.

Every project in the registry inherits from the control plane through symlinks: .claude/rules/trellis.md for Claude Code and .agents/rules/trellis.md for Codex, both pointing at the same canonical file. Projects only carry what's actually project-specific: their own CLAUDE.md, a gotchas.md, and a context-log.md the hooks maintain automatically across sessions.
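As a rough illustration of that wiring, here is a minimal Python sketch. The helper name link_rules is hypothetical, and treating core-rules/CLAUDE.md as the link target is an assumption made for the example, not a statement about how the Trellis repo does it.

```python
from pathlib import Path


def link_rules(project: Path, canonical: Path) -> list:
    """Point both harness rule paths in `project` at one canonical file.

    Hypothetical helper: only the two link locations come from the text above.
    """
    links = [
        project / ".claude" / "rules" / "trellis.md",  # read by Claude Code
        project / ".agents" / "rules" / "trellis.md",  # read by Codex
    ]
    target = canonical.resolve()  # absolute target, so the link works from any cwd
    for link in links:
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link or a copied file
        link.symlink_to(target)
    return links
```

Because both links resolve to the same file, a single edit to the parent rules changes what every registered project sees on its next session.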

Inside each project sits a three-tier hook stack. Tier one runs on every agent turn with a budget of three seconds. Tier two runs on Stop with ninety seconds, and that is where "done" gets verified. Tier three is the git boundary — pre-commit, commit-msg, pre-push — the backstop if the agent slipped past everything else.

Ten scheduled audits, each on its own daily, weekly, or monthly cadence, sweep the whole registry and write timestamped reports back into the control plane. Drift, bypasses, vulnerabilities, test rot, dependency posture: everything surfaces in dated markdown I can grep, diff, and quote in a commit message.

The rest of this post is what's load-bearing about that arrangement.

Principles I keep coming back to

A few tests that kept me honest whenever I was tempted to add another rule or a clever new automation.

Parent/child layers, Rule of Three for promotion. Cross-cutting rules belong in the parent layer. Project-specific rules belong in the project. A pattern earns parent-status only after a third independent project adopts it. n=2 is the danger zone: enough repetition to feel like a pattern, not enough to rule out coincidence. Candidates wait in core-rules/deferred.md until they get their third witness or get demoted entirely.
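The promotion test is simple enough to sketch in a few lines of Python. The function name promotable and the (project, pattern) input shape are assumptions for illustration; the point it shows is that repeats inside one project do not count as extra witnesses.

```python
from collections import defaultdict


def promotable(sightings, threshold=3):
    """Return patterns seen in at least `threshold` distinct projects.

    `sightings` is an iterable of (project, pattern) pairs. This input
    shape is hypothetical, chosen only for the example.
    """
    projects_by_pattern = defaultdict(set)
    for project, pattern in sightings:
        projects_by_pattern[pattern].add(project)  # a set ignores repeats
    return sorted(
        pattern
        for pattern, projects in projects_by_pattern.items()
        if len(projects) >= threshold
    )
```

Counting distinct projects rather than raw occurrences is what keeps a single noisy project from promoting its own habit into the parent layer.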

Process is code. Every rule that can be mechanically enforced becomes a hook, and hooks fail closed. Rules that depend on good intentions erode in under a month. Rules backed by a hook persist until the hook is deleted. If I can't write a hook for a rule, I question whether it should be one.

Receipts over self-reporting. "Done" means the agent attaches the verification command, the exit code, and the diff lines that prove the change. "It works" without receipts isn't done. stop-verify enforces the checks; the presentation discipline is on the agent.

Small surface, deep discipline. The parent layer is intentionally small — under 5 KB for the rules file, nine hooks, ten audits. Anything broader belongs in a project. Keeps the contract legible to humans and LLMs both, and keeps the drift surface narrow enough to actually watch.

Harness-safe by default. Everything works in Claude Code and Codex. The two harnesses use different hook envelopes, so canonical implementations live separately under core-rules/hooks/ and core-rules/codex/. The policy intent is identical: same rule, same enforcement, whichever agent is driving.

What the hooks actually do

Nine canonical hooks split across three tiers. The split is about budget, not capability.

The fast-local tier (every turn, three-second budget) stops the catastrophic stuff before it ships. block-destructive denies rm -rf aimed at the home directory, git push --force on any branch, git reset --hard against HEAD~N, unscoped DELETE FROM SQL, and any read of a .env* file. post-edit-verify lints whatever the agent just wrote, scoped to the touched file so the run stays inside that budget. session-context injects the previous session's state at startup: current branch, last five commits, dirty files, outstanding gotchas, the prior context-log.md. The agent doesn't have to spelunk to find where it left off. save-context-log and post-compact-context close the loop on the way out.
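To make the deny list concrete, here is a minimal sketch of a block-destructive style check in Python. The regular expressions, the rule labels, and the check_command name are all assumptions written for this example; a real hook would also need to parse the harness's hook envelope, which is not shown.

```python
import re
from typing import Optional

# Illustrative patterns only; they approximate the five rules named above.
DENY_RULES = [
    ("rm -rf on home directory",
     re.compile(r"\brm\s+-(?:rf|fr)\s+(?:~|\$HOME)/?(?:\s|$)")),
    # Conservative: this also matches --force-with-lease.
    ("git push --force",
     re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)")),
    ("git reset --hard HEAD~N",
     re.compile(r"\bgit\s+reset\s+--hard\s+HEAD~\d*")),
    # DELETE FROM with no WHERE anywhere after it in the command.
    ("unscoped DELETE FROM",
     re.compile(r"\bDELETE\s+FROM\b(?!.*\bWHERE\b)", re.I | re.S)),
    # Any mention of a .env* path, not only reads.
    ("touches a .env* file",
     re.compile(r"(?:^|[\s/'\"])\.env")),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason a command is denied, or None if it may run."""
    for reason, pattern in DENY_RULES:
        if pattern.search(command):
            return reason
    return None
```

A check like this is pure string matching, which is why it fits a tight per-turn budget; the cost is the occasional false positive, which fits a hook that is meant to fail closed.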

The heavy-gated tier (on Stop, ≤90s budget) is where "done" gets adjudicated. stop-verify enforces the receipts rule. It scans the TodoWrite list and blocks if anything is in_progress or pending, then runs typecheck → lint → fast tests across the whole repo. On failure the agent gets a sliced error (first 30 lines for compile errors, last 30 for test output) and is told to fix it before claiming completion. code-review-subagent dispatches a read-only review agent against the current diff on edit-heavy turns (≥3 files or ≥200 lines). ui-verify boots the dev server, screenshots the affected route, and blocks if a UI-visible change shipped without that proof.
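The todo check and the sliced error output can be sketched as follows. This is a simplified Python illustration, not the stop-verify hook itself: the function names, the (name, argv, keep) step shape, and the todo dictionaries are assumptions, and the 90-second budget is not enforced here.

```python
import subprocess


def slice_output(output: str, keep: str, limit: int = 30) -> str:
    """Keep the first `limit` lines ("head") or the last `limit` ("tail")."""
    lines = output.splitlines()
    kept = lines[:limit] if keep == "head" else lines[-limit:]
    return "\n".join(kept)


def stop_verify(todos, steps):
    """Return (ok, message) for a Stop-time gate.

    todos: list of dicts with a "status" key (hypothetical shape).
    steps: list of (name, argv, keep) tuples, run in order; `keep` is
           "head" for compile-style errors and "tail" for test output.
    """
    unfinished = [t for t in todos if t["status"] in ("in_progress", "pending")]
    if unfinished:
        return False, f"{len(unfinished)} todo(s) still open"
    for name, argv, keep in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            sliced = slice_output(result.stdout + result.stderr, keep)
            return False, f"{name} failed (exit {result.returncode}):\n{sliced}"
    return True, "all checks passed"
```

Stopping at the first failing step keeps the message short, and slicing means the agent sees the part of the output most likely to name the actual error rather than a wall of logs.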

The git-boundary tier is the durable backstop. Husky's pre-commit runs lint-staged on staged files. commit-msg enforces Conventional Commits with a project-configured scope allowlist. pre-push blocks direct pushes to main across every registered project and reruns the typecheck/lint/test triad. There is an emergency override (TRELLIS_ALLOW_MAIN_PUSH=1); every invocation has to be documented in a commit trailer or gotchas.md. For projects without a package.json — Unity, Rust, Go, Python-only — the same enforcement lands in native git hooks under .githooks/.
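Two of those checks are easy to show in miniature. The Python sketch below is an assumption-laden illustration: the type list follows the common Angular-style convention (the Conventional Commits spec itself only names feat and fix), and check_header and may_push are hypothetical names. Only the TRELLIS_ALLOW_MAIN_PUSH variable comes from the text above.

```python
import re

# Commonly used types; projects may configure a different list.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

HEADER = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def check_header(header: str, allowed_scopes: set):
    """Return an error string for a bad commit header, or None if it passes."""
    match = HEADER.match(header)
    if match is None:
        return "header is not in 'type(scope): subject' form"
    if match["type"] not in TYPES:
        return f"unknown type {match['type']!r}"
    scope = match["scope"]
    if scope is not None and scope not in allowed_scopes:
        return f"scope {scope!r} is not in the allowlist"
    return None


def may_push(branch: str, env: dict) -> bool:
    """Block pushes to main unless the emergency override is set to "1"."""
    if branch != "main":
        return True
    return env.get("TRELLIS_ALLOW_MAIN_PUSH") == "1"
```

Passing the environment in as a dictionary rather than reading os.environ directly keeps the guard testable, and makes the override an explicit input that can be logged.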

Three tiers because each one covers a different failure mode. Fast-local catches mistakes mid-turn while context is still warm. Heavy-gated catches "claims done but isn't" at session boundaries. Git-boundary catches anything that happened outside an agent turn entirely: manual amends, rebases, force-pushes from another terminal.

The process-gate skill

Hooks fire automatically. Sometimes I want to ask the agent explicitly — is this branch mergeable yet? — and have it answer in a format I trust.

That's the process-gate skill. Canonical implementation under core-rules/skills/process-gate/, symlinked into every project's .claude/skills/ and .agents/skills/. On invocation it runs six gates — PR hygiene, secrets, bypass markers, tests and coverage, docs discipline, and any stack-specific validators the project declares — and emits a fixed-shape verdict block:

PR hygiene:        ✅ pass
Secrets:           ✅ pass
Bypass markers:    ❌ fail
Tests & coverage:  ⚠️ warn
Docs discipline:   ✅ pass
Stack profile:     ✅ pass

Overall: BLOCKED

One ❌ blocks the merge. Warnings can be accepted but the acceptance has to be recorded in the PR description. The skill reads its references at invocation time, so policy changes don't require resyncing every project. Same enforcement in Claude Code, Codex, claude -p headless, and CI.
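The aggregation rule behind that verdict can be sketched in Python. The gate names and the BLOCKED label come from the block above; the other two overall labels, the function names, and the column width are assumptions made for the example.

```python
ICONS = {"pass": "✅", "warn": "⚠️", "fail": "❌"}


def overall(results: dict) -> str:
    """Any fail blocks; warnings alone do not."""
    statuses = set(results.values())
    if "fail" in statuses:
        return "BLOCKED"
    if "warn" in statuses:
        return "PASS WITH WARNINGS"  # hypothetical label
    return "PASS"  # hypothetical label


def render(results: dict) -> str:
    """Render one row per gate, a blank line, then the overall verdict."""
    rows = []
    for gate, status in results.items():
        label = gate + ":"
        rows.append(f"{label:<19}{ICONS[status]} {status}")
    return "\n".join(rows + ["", f"Overall: {overall(results)}"])
```

Keeping the shape fixed means the same output can be read by a person in a PR comment and parsed by a CI step looking only at the final line.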

Audits: the feedback loop

Hooks and the skill enforce policy in the moment. Audits answer the harder question: is the policy still being enforced six weeks from now?

Ten scheduled audits run on cron against the registered project list. Some hit weekly, some daily, some monthly. Different checks, but each one writes a dated markdown report back into audits/ so I can trend over time.

cross-project-process-audit is the headline one. Runs every Monday morning and produces a snapshot of which projects are wired correctly, which have stale hooks, which carry recent --no-verify bypasses, which CLAUDE.md files have grown past the 5 KB soft target. parent-hook-drift runs Sunday night, SHA256-comparing the canonical hook implementations against the deployed copies; deltas surface as drift signatures I can pull upstream or push downstream. bypass-tripwire is silent on clean days and noisy when someone used --no-verify or shipped a direct push. test-health actually runs each project's fast suite. dep-vulnerabilities and dep-currency cover security and lockfile age. gotchas-rollup is the Rule-of-Three engine: it aggregates project-local gotchas.md entries and promotes anything that has hit three independent projects into the parent rules.
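The drift check in particular reduces to hashing two directories and comparing. Here is a minimal Python sketch under stated assumptions: hook_drift and its "missing" and "modified" labels are hypothetical, and it compares flat directories only, which may not match the real layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hook_drift(canonical_dir: Path, deployed_dir: Path) -> dict:
    """Map hook file name -> "missing" or "modified" for every drifted hook.

    Files that match the canonical copy byte-for-byte are omitted.
    """
    drift = {}
    for canonical in sorted(canonical_dir.iterdir()):
        if not canonical.is_file():
            continue
        deployed = deployed_dir / canonical.name
        if not deployed.exists():
            drift[canonical.name] = "missing"
        elif sha256_of(deployed) != sha256_of(canonical):
            drift[canonical.name] = "modified"
    return drift
```

An empty result means the deployed copies match the parent; anything else is a candidate either to pull upstream or to overwrite from the canonical copy.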

The audits aren't only observability. They're the reason the system stays disciplined. Every Monday I get a report telling me, in plain text, where my engineering process has drifted from its own claims. Then I fix it. The week after, the audit confirms the fix or surfaces the next drift. The loop is the thing.

What I'd tell anyone trying to build this

Four things I didn't appreciate when I started.

Process work is leverage, not overhead. Every hour spent making the system catch its own drift bought back ten over the following month. The pre-push guard alone has caught at least three force-push-to-main attempts that would have cost half a day each.

The Rule of Three matters more than any individual rule. I deleted several "obviously good" parent rules in the first month because the second project disagreed about the right shape. Three witnesses isn't bureaucracy — it's the cheapest defence against locking in the wrong abstraction.

Receipts beat eloquence. I trust the diff and the exit code. I no longer trust a trailing summary that says "I've fixed the bug and all tests pass." Half the work of building Trellis was unlearning the instinct to read summaries and believe them.

Open-source it early. Putting Trellis in public forced me to write engineering-process.md, the narrative manual that explains why each rule exists. Future-me reading it six months from now is the most important audience the repo has.

Try it

MIT-licensed and forkable: github.com/Zireael26/trellis

AGENT_SETUP.md walks any Claude Code or Codex session through customising it for your machine in about ten minutes. AGENT_ONBOARD_PROJECT.md is what you paste inside the bootstrapped repo to onboard each project after that. The audit fleet ships pre-registered as scheduled-task prompts; wire them into whatever scheduler you already use.

If you fork it, tell me what you changed. The Rule of Three needs three independent witnesses, and right now I am only one of them.