# AI Agent Codebase Constitution (v1)

Derived from first principles. Paste into `.specify/memory/constitution.md` via
`/speckit.constitution`. Preserve the exact phrasing of each principle; do not
paraphrase into generalities.

Source: https://daita.io/blog/spec_kit_constitution_first_principles

## Principles

### 1. Evals before features

No prompt, tool, or agent flow ships without an eval that would have failed
before the change and passes after it. "Looks better in one manual test" is
not an eval. Every change carries its regression case into the eval set.
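
As a minimal sketch of the "fails before, passes after" rule, the check below accepts a change only when the eval case fails on the old behavior and passes on the new one. The names `eval_gates_change`, `old_flow`, `new_flow`, and `case_normalizes_email` are hypothetical stand-ins, not part of any real eval framework.

```python
from typing import Callable

Flow = Callable[[str], str]


def eval_gates_change(eval_case: Callable[[Flow], bool], before: Flow, after: Flow) -> bool:
    """Return True only if the eval fails on `before` and passes on `after`."""
    return (not eval_case(before)) and eval_case(after)


# Hypothetical agent flow before and after a change.
def old_flow(text: str) -> str:
    return text


def new_flow(text: str) -> str:
    return text.strip().lower()


# Hypothetical regression case that the change is meant to fix.
def case_normalizes_email(flow: Flow) -> bool:
    return flow("  Ada@Example.COM ") == "ada@example.com"
```

An eval that already passes on the old behavior proves nothing about the change, which is why the sketch rejects it.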

### 2. Context is a budget

Tokens in context are a scarce, billed resource. Every prompt, system
message, tool schema, and memory injection is justified against its cost.
Unused context is deleted, not kept "just in case". Cache-friendly ordering
(stable prefix, volatile suffix) is a correctness property, not an
optimization.
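
One way to picture "stable prefix, volatile suffix" is a prompt builder that places the system message and tool schemas first and the per-request text last, then hashes the prefix so a test can assert it does not change between requests. This is a sketch under assumptions: `build_prompt` and its parameters are hypothetical, and how a given provider caches prefixes is not covered here.

```python
import hashlib
import json


def build_prompt(system: str, tool_schemas: list, user_turn: str) -> tuple:
    """Return (full_prompt, prefix_hash) with the stable parts placed first."""
    # sort_keys makes the serialized schemas byte-identical across runs,
    # regardless of the key order in the input dictionaries.
    prefix = system + "\n" + json.dumps(tool_schemas, sort_keys=True)
    prefix_hash = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    # The volatile, per-request content goes last.
    return prefix + "\n" + user_turn, prefix_hash
```

Asserting that `prefix_hash` is identical across two different user turns turns the ordering rule into a testable property.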

### 3. Tools are typed contracts

Every tool has a JSON schema, a documented side-effect profile, and a
deterministic error taxonomy. Never let the model call a tool whose failure
modes are not enumerated. Tool descriptions are prompts; treat them with
the same care.
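
A tool contract along these lines could be sketched as a small record that refuses registration when the schema or the error taxonomy is missing. Everything here (`ToolSpec`, `SideEffect`, `ToolError`, `register`) is a hypothetical illustration using only the standard library; a real project would likely validate the schema with a JSON Schema library.

```python
from dataclasses import dataclass
from enum import Enum


class SideEffect(Enum):
    READ_ONLY = "read_only"
    WRITE = "write"
    DESTRUCTIVE = "destructive"


class ToolError(Enum):
    INVALID_ARGS = "invalid_args"
    NOT_FOUND = "not_found"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str        # treated as a prompt, so it is reviewed like one
    json_schema: dict       # argument schema shown to the model
    side_effect: SideEffect
    errors: frozenset       # enumerated failure modes (ToolError members)


def register(registry: dict, spec: ToolSpec) -> None:
    """Add a tool to the registry only if its contract is complete."""
    if spec.json_schema.get("type") != "object":
        raise ValueError(f"{spec.name}: schema must describe an object of arguments")
    if not spec.errors:
        raise ValueError(f"{spec.name}: failure modes are not enumerated")
    registry[spec.name] = spec
```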

### 4. Deterministic where possible, probabilistic where necessary

Parsing, formatting, retrieval, and routing should be deterministic code.
Only use the model for the irreducibly probabilistic step (generation,
classification without ground truth, judgment). A regex is cheaper, faster,
and more reliable than a prompt for structured input.
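
For instance, extracting an identifier with a known shape needs no model call. The sketch below assumes a hypothetical order-id format (`ORD-` followed by exactly five digits); the pattern and the function name are illustrative only.

```python
import re
from typing import Optional

# Hypothetical format: "ORD-" followed by exactly five digits.
ORDER_ID = re.compile(r"\bORD-\d{5}\b")


def extract_order_id(text: str) -> Optional[str]:
    """Return the first order id found in the text, or None if there is none."""
    match = ORDER_ID.search(text)
    return match.group(0) if match else None
```

Only the messages where this returns `None` would need to go on to a probabilistic step.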

### 5. Observability is per-turn, not per-session

Every model call emits a structured event: prompt hash, tool calls, tokens
in/out, cost, latency, eval id if applicable. High-cardinality fields
(tenant, user, feature flag, model version) are required. No unstructured
log lines in new code. Replay must be possible from logs alone.
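
A per-turn event with the fields listed above might look like the following sketch. The class and field names are assumptions, and the event is serialized as one JSON object per model call. Note that a prompt hash alone does not make replay possible; this sketch assumes the prompt text is stored elsewhere, keyed by that hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ModelCallEvent:
    prompt_hash: str
    tool_calls: list
    tokens_in: int
    tokens_out: int
    cost_usd: float
    latency_ms: int
    # High-cardinality fields are required, so they have no defaults.
    tenant: str
    user: str
    feature_flag: str
    model_version: str
    eval_id: Optional[str] = None  # set only when the call is part of an eval


def hash_prompt(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def emit(event: ModelCallEvent) -> str:
    """Serialize the event as a single structured JSON line."""
    return json.dumps(asdict(event), sort_keys=True)
```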

### 6. Separate agent authority from side-effect blast radius

An agent's tool permissions scale with its eval coverage, not with what
the demo requires. Destructive tools (delete, send, pay, deploy) require
either a human confirmation step or a signed pre-approval token. No agent
gets write access to a system it cannot be tested against.
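
The gate on destructive tools can be sketched with the standard library's `hmac` module: a destructive call runs only with a human confirmation or a token signed over the tool name and a digest of its arguments. The function names, the token format, and the way the secret is handled are all assumptions for illustration; real key management is out of scope.

```python
import hashlib
import hmac
from typing import Optional

DESTRUCTIVE_TOOLS = {"delete", "send", "pay", "deploy"}


def sign_approval(secret: bytes, tool: str, args_digest: str) -> str:
    """Produce a pre-approval token bound to one tool and one argument digest."""
    message = f"{tool}:{args_digest}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def may_execute(tool: str, args_digest: str, secret: bytes,
                human_confirmed: bool = False, token: Optional[str] = None) -> bool:
    """Allow non-destructive tools; gate destructive ones."""
    if tool not in DESTRUCTIVE_TOOLS:
        return True
    if human_confirmed:
        return True
    if token is None:
        return False
    expected = sign_approval(secret, tool, args_digest)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, token)
```

Binding the token to the argument digest means an approval for one payment cannot be reused for a different one.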

### 7. Optimize for deletion, not extension

Prompts, tools, and chains must be small enough that one engineer can
delete and rewrite them in a day. Reject framework layers that hide the
prompt from the developer. Inline the prompt; extract only when two
concrete consumers demand the same wording.

### 8. Recovery over prevention

Every agent action is revertible, or it is gated by a human. Feature flags
gate rollouts of new prompts and models. Model versions are pinned; silent
upstream model updates are not acceptable. A bad prompt must be rolled back
in under five minutes without a code deploy.
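
Rolling back without a code deploy implies that the active prompt is selected by data, not by code. A minimal in-memory sketch, assuming a hypothetical `PromptRegistry` whose state would in practice live in a config store or flag service:

```python
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    versions: dict = field(default_factory=dict)  # version id -> prompt text
    history: list = field(default_factory=list)   # activation order, newest last

    def publish(self, version: str, text: str) -> None:
        """Store a new prompt version and make it the active one."""
        self.versions[version] = text
        self.history.append(version)

    def active(self) -> str:
        """Return the text of the currently active prompt version."""
        return self.versions[self.history[-1]]

    def rollback(self) -> str:
        """Reactivate the previous version and return its id."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]
```

Old versions stay in `versions` after a rollback, so the bad prompt remains available for an eval that reproduces the failure.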

### 9. Attention is finite (for the agent too)

Every additional instruction, few-shot example, or tool description
competes for the agent's attention. If a principle does not change at
least one observed behavior in evals, delete it. Prompt rot is a real
failure mode; run eval drift checks on every prompt change.
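
A drift check can be as simple as comparing per-case results between the baseline prompt and the candidate. The sketch below assumes results are available as a mapping from a case id to pass/fail; `drifted_cases` is a hypothetical helper, and a case missing from the candidate run is counted as a regression.

```python
def drifted_cases(baseline: dict, candidate: dict) -> list:
    """Return ids of cases that passed on the baseline but not on the candidate."""
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )
```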

### 10. Commands are discoverable; local dev matches CI

Every repeatable action (run eval set, replay trace, regenerate fixtures,
lint prompts, deploy agent) is a single named command runnable locally with
the same arguments CI uses. If a new contributor cannot run the full eval
suite against a fixture in 30 seconds, the interface is broken.
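
One possible shape for such an interface is a single entry point with named subcommands, so CI and a contributor's shell invoke exactly the same thing. The sketch uses the standard `argparse` module; the program name `agent`, the subcommands, and the arguments are hypothetical, and the command bodies only print what they would do.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent")
    sub = parser.add_subparsers(dest="command", required=True)

    run_eval = sub.add_parser("eval", help="run the eval set against a fixture")
    run_eval.add_argument("--fixture", required=True)

    replay = sub.add_parser("replay", help="replay a recorded trace")
    replay.add_argument("trace_id")
    return parser


def main(argv: Optional[list] = None) -> int:
    """Dispatch one named command; CI and local runs pass the same argv."""
    args = build_parser().parse_args(argv)
    if args.command == "eval":
        print(f"running eval set against fixture {args.fixture}")
    elif args.command == "replay":
        print(f"replaying trace {args.trace_id}")
    return 0
```

A script entry point would call `main()` with no arguments so that it reads the command line, while tests can pass an explicit list such as `["eval", "--fixture", "smoke"]`.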