The Problem
Scrum, Kanban, XP, Shape Up: all designed for teams where coding was the slowest thing in the room. AI broke that assumption in 2023. The bottleneck moved. The process did not.
Faros AI · 10,000+ developers · 1,255 teams · 2025
Before AI, most engineer-hours went to writing code. Spec took a meeting. Review took a morning. The constraint was typing speed and domain knowledge, so methodologies optimized for build throughput: sprints, velocity, story points.
AI accelerated the build. 98% more PRs landed in the queue. Human review capacity didn’t double. It can’t. So the pipeline jammed at verification, and review became rubber-stamping.
Faster coding relocated pressure to the two activities that still require a human brain: specification (what should the system do?) and verification (did it do it correctly?). Teams produce nearly twice the code. Reviewers still have the same 8-hour day. Something has to give, and what gives is rigor.
The CNC Analogy
This shift is not new. Manufacturing went through it in the 1970s. Before CNC machines, a machinist’s value was in their hands: steady cuts, feel for the metal, years of muscle memory. CNC automated the cutting. It didn’t replace machinists. It changed what “machinist” meant.
The CNC shops that bolted machines onto manual workflows kept machinists hand-programming G-code from memory instead of from CAD drawings. No standardized toolpaths. No updated inspection protocols. The machines cut faster. They produced more out-of-spec parts faster. Scrap rates went up. QC couldn’t keep pace with 3× the output volume. The bottleneck moved from the spindle to the inspection bench, and nobody moved with it.
The same thing is happening now. AI produces 98% more PRs. Review time increases 91%. Delivery speed: unchanged. The bottleneck moved from coding to verification, and the process did not move with it. Datum moves the process.
| CNC shop | Datum |
|---|---|
| CAD design: what to make | Brief + tech spec: what to build |
| G-code programming: how to make it | Standards doc + prompt: how the AI builds it |
| Machine setup: load, calibrate | Session setup: load standards + spec |
| Machine runs autonomously | AI generates code |
| Operator monitors for deviation | Builder self-verifies against criteria |
| QC inspection against tolerances | Review against spec: does the output match the contract? |
| Scrap / rework if out of spec | Spec-break rule: stop, re-spec, resume |

| CNC shop role | Datum counterpart |
|---|---|
| Loads programs, runs parts, checks output against prints. | Builds from specs with AI. Self-verifies. Flags gaps instead of guessing. |
| Writes programs for standard parts. Selects tooling and fixtures. | Drafts specs for standard stories. Learns from the gap between draft and revision. |
| Designs complex toolpaths. Selects cutting strategies for new geometries. | Owns specs end-to-end. Leads spec sessions. Reviews juniors’ code. |
| Governs quality. Trains operators. Owns the standards library. | Multi-lens review. Standards contributions. Mentors builders. Multiplies the team. |
The Axioms
These are the non-negotiable principles the methodology is built on. A team that accepts the axioms can adapt the practices. A team that rejects the axioms should use a different methodology.
The Economics
Datum front-loads effort into specification because defects caught upstream cost orders of magnitude less than defects caught downstream. This is not opinion. It is the most replicated finding in software engineering economics.
Boehm’s Defect Cost Curve
Barry Boehm’s research,[1] validated by IBM Systems Sciences Institute[2] and NASA JPL[3] across decades, quantifies the cost escalation. A missing transaction wrapper costs nothing to add to a spec. It costs 15 minutes during coding, 2–4 hours during testing, and days-to-weeks plus thousands of dollars in production.
Monday spec sessions. The cheapest place to find a defect. A 30-minute conversation prevents a 30-hour production incident.
Tuesday–Wednesday build. The defect is in the code now. Finding it requires reading, testing, reverting. Still cheap relative to production.
Thursday hardening. The defect interacts with other systems. Finding it requires integration tests, environment debugging, cross-team coordination.
Production. Incident response, data repair, customer communication, regulatory reporting, reputation damage. The defect now has a dollar cost attached to it.
The 40/60 Reality
Where delivery effort actually goes, and why AI only accelerates part of it.
When Specification Was Skipped
These are not hypotheticals. These are the dollar costs of missing specs.
Healthcare.gov (2013). $93.7M budgeted. $1.7B actual. An 18× cost overrun. Multiple contractors built to different assumptions. The defects were not in the code. They were in the missing contracts between systems.
What a spec would have provided: shared API contracts, agreed data schemas, and integration test criteria, all defined before any contractor wrote a line of code. Estimated spec cost: $2M. Estimated savings: $1.6B.
Knight Capital (2012). $440M lost in 45 minutes. Old code was reactivated because the deployment specification did not account for it. The code worked exactly as written. The specification of what should be deployed did not exist.
What a spec would have provided: a deployment checklist that states explicitly which services are active, which are deprecated, and what happens on rollback. The spec-break rule would have triggered before the first trade executed.
The AI Amplification Effect
Without specification discipline, AI makes the economics worse, not better.
AI-generated code that matches a spec is verifiable. The reviewer checks behavior against written criteria. AI-generated code without a spec requires full white-box review, which does not scale. The 98% increase in PRs flows into a review pipeline that is 91% slower, producing no delivery speed gain and introducing quality risk.
Capers Jones’s research across thousands of projects:[4] poor requirements are the single largest source of project failure, responsible for approximately 40% of all defects. AI does not fix requirements. It amplifies them, for better or worse.
The Capacity Trap
A 2026 joint study by Accenture and the Wharton School[5] analyzed task-level data across 18 industries covering more than 120 million workers. For a modeled $60 billion company, they estimated $6 billion in potential annual revenue growth from agentic AI at full maturity, alongside $1.7 billion in annual productivity gains.
The catch: roughly two-thirds of those productivity gains materialized as direct cost savings, but the remaining third appeared as cost avoidance: capacity freed for different, higher-value work. Without intentional redeployment, that freed capacity does not become growth. It evaporates.
“Productivity becomes growth only through redeployment,” the report warns. “Unless leaders deliberately redeploy that capacity toward higher-value work, productivity gains stall at efficiency and fail to translate into growth.”
This is the 40/60 reality at enterprise scale. AI accelerates the 40% (code, content, analysis). The 60% (specification, verification, architecture, governance) does not compress. Teams that bank the freed hours as headcount reduction instead of reinvesting them into the 60% get the worst of both outcomes: more AI output, less human oversight, faster accumulation of defects that cost 60–100× to fix in production.
The Monday investment in specification is not overhead. It is the cheapest defect prevention the industry has ever measured. Every $1 spent on specification saves $10–$100 in failure costs. No AI model, no testing framework, no deployment pipeline achieves that ROI. And as the Accenture/Wharton data shows, the freed capacity from AI acceleration is only valuable if it flows back into the work AI cannot do: specifying, verifying, and governing.
When to Use It
Before investing in the mechanics, check the fit. Datum is designed for a specific context. If your team matches these four signals, the methodology will produce results within one cycle. If it doesn’t match, save yourself the friction.
Context Variants
The full model assumes a 7-person pod with weekly cadence. These three contexts require adaptation, but the core holds.
How it works
Before each AI-assisted build, write a lightweight brief: problem statement, acceptance criteria, contracts touched, boundary constraints. Self-review against the brief before merging. Load the standards document into every AI session.
What compresses
- Spec sessions → written documents
- Peer review → self-review with stricter criteria
- Defensive checklist scored on every merge
How it works
Problem statement, hypothesis, experiment design, success/failure criteria, and a time-box. “We will know whether approach X is viable by measuring Y within Z hours” is a spec. “Just start coding and see what happens” is vibe coding.
What compresses
- Brief → hypothesis brief with time-box
- Spec is shorter and more uncertain
- But it exists. That’s the difference.
How it works
Symptoms, root cause hypothesis, fix scope, blast radius, rollback plan. The fix is verified against the brief. Full specification and hardening happen in the next regular cycle. Firefighting without specification is how incidents recur.
What compresses
- Weekly cadence → 5–10 minute incident brief
- Full hardening deferred to next cycle
- Post-incident: spec the fix properly
Genuine Poor Fit
Three conditions where Datum is the wrong choice. Not “hard to adopt” — genuinely wrong.
The team believes coding is the bottleneck, or that specification is overhead rather than design work. The practices will be treated as bureaucracy and abandoned within weeks.
Ask: “Would you spend 30 minutes specifying a story to save 4 hours of rework?” If the answer is no, fix the belief first.
The customer cannot participate in specification. Very rare, and usually a relationship problem. But if it cannot be resolved, the upstream specification chain is broken.
Can someone write acceptance criteria that the customer would agree with? If yes, that person is the proxy PO. If no, the chain is broken.
The methodology is designed for the agentic context. Without AI leverage, the bottleneck assumptions do not hold and traditional methodologies are better calibrated.
Is AI generating >20% of your code? If not, the 2.5× multiplier and the review-volume economics don’t apply. Use Shape Up or Kanban instead.
What Datum Is Not
For the skeptic in the room.
How It Works
Datum runs on a weekly cadence. The cadence is a rhythm, not a commitment. Stories carry over when they must. The carryover is explicit and communicated.
| Day | Primary Activity | Deliverable |
|---|---|---|
| Monday | Alignment + collaborative spec sessions | Every story has a brief, tech spec, and estimate before anyone builds |
| Tue–Wed | AI-assisted build + continuous review | Code merged only after verification against spec |
| Thursday | Integration verification + hardening | Everything passes the defensive checklist at level 2+ |
| Friday | Demo + retro + next week prep | Stakeholders have seen the work; next Monday's briefs are ready |
No story enters the build queue without a brief (business intent, acceptance criteria) and a technical spec (contracts, state machines, boundary schemas). This is not a best practice. It is the load-bearing rule. Removing it collapses the methodology back to vibe coding.
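As a rough illustration, the rule can be expressed as a check that runs before a story is queued. The `Story` type and its field names are hypothetical, and treating every spec part as mandatory is a simplifying assumption; Datum does not prescribe a data model.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical story record; field names are illustrative only."""
    key: str
    business_intent: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    contracts: list[str] = field(default_factory=list)
    state_machines: list[str] = field(default_factory=list)
    boundary_schemas: list[str] = field(default_factory=list)


def missing_for_build(story: Story) -> list[str]:
    """Return the names of brief/spec parts that are still empty."""
    required = {
        "business intent": story.business_intent.strip(),
        "acceptance criteria": story.acceptance_criteria,
        "contracts": story.contracts,
        "state machines": story.state_machines,
        "boundary schemas": story.boundary_schemas,
    }
    return [name for name, value in required.items() if not value]


def can_enter_build_queue(story: Story) -> bool:
    """A story is buildable only when nothing is missing."""
    return not missing_for_build(story)
```

A story with only a business intent would be rejected, and `missing_for_build` names what the spec session still has to produce.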
Spec sessions are collaborative design conversations — 30 to 60 minutes per story — involving everyone who will build, verify, and accept the work. The conversation is the design work. The document is the record. Specs are not written by one person and handed to another.
Builders may have at most 2 open PRs awaiting review at any time. AI-assisted builders produce code faster than humans can review it. Without a WIP limit, “continuous review” becomes “batched review on Thursday”: Scrum with extra steps.
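A minimal sketch of how the WIP limit might be checked, assuming the team can list the author of each PR currently awaiting review. The function name and input shape are hypothetical.

```python
from collections import Counter

MAX_OPEN_PRS = 2  # the WIP limit stated in the rule above


def builders_at_limit(open_pr_authors: list[str], limit: int = MAX_OPEN_PRS) -> list[str]:
    """Return builders who must wait for a review before opening another PR.

    `open_pr_authors` holds one entry per PR awaiting review.
    """
    counts = Counter(open_pr_authors)
    return sorted(author for author, n in counts.items() if n >= limit)
```

For example, `builders_at_limit(["ana", "ben", "ana"])` reports only `ana`, who already has two PRs in the queue.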
If a builder discovers 3 or more spec gaps in a single story, building stops. The builder, reviewer, and spec author hold a 30-minute re-spec session before more code is generated. Building from a broken spec compounds waste. The re-spec cost is always lower than the rework cost.
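The spec-break rule is simple enough to automate as a counter over a gap log. The sketch below assumes gaps are recorded per story; `SpecGap` and `should_stop_building` are illustrative names, not part of the methodology.

```python
from dataclasses import dataclass

SPEC_BREAK_THRESHOLD = 3  # "3 or more spec gaps in a single story"


@dataclass
class SpecGap:
    """Hypothetical log entry written when a builder flags a gap."""
    story: str
    description: str


def should_stop_building(gaps: list[SpecGap], story: str,
                         threshold: int = SPEC_BREAK_THRESHOLD) -> bool:
    """True once the named story has reached the spec-break threshold."""
    return sum(1 for gap in gaps if gap.story == story) >= threshold
```

Gaps logged against other stories do not count toward the threshold, which matches the "in a single story" wording.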
Estimates are produced during spec sessions, not in separate planning ceremonies. Four dimensions drive the estimate:
Every AI session loads the standards document as context. This solves the “fresh-context problem”: without shared context, each AI session invents its own patterns, producing inconsistency across the codebase. The standards document is the primary control mechanism for AI output quality — more effective than prompt tuning or post-hoc review alone.
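One way to make the standards document a hard precondition rather than a habit is to assemble every session's context through a single function. The sketch below is hypothetical: `build_session_context` and its section titles are not part of Datum, and no particular AI vendor API is assumed.

```python
def build_session_context(standards: str, spec: str, task: str) -> str:
    """Assemble the text handed to an AI session, standards first.

    Refuses to build a context without a standards document, so a
    session cannot start from a blank slate by accident.
    """
    if not standards.strip():
        raise ValueError("standards document is required for every AI session")
    sections = [
        ("STANDARDS", standards),
        ("SPEC", spec),
        ("TASK", task),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)
```

Placing the standards ahead of the spec and the task is a design choice in this sketch, not a requirement from the text.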
Delivery Styles
The previous chapter describes the weekly cadence — Monday spec, Tuesday build, Thursday harden, Friday retro. That is one delivery style. Datum’s core principles (spec before build, review against spec, standards as AI governance) are delivery-agnostic. They work in at least three distinct rhythms, each suited to different team shapes and business contexts.
The rhythm
Weekly cadence with named days: Monday alignment + spec sessions, Tuesday–Wednesday build, Thursday hardening, Friday demo + retro. Stories are batched into a weekly scope. Carryover is explicit.
Best for
- Product teams with stakeholders who expect weekly visibility
- Teams new to Datum — the structure makes habits visible
- Regulated environments that need ceremony documentation
Feels like
Shape Up meets Scrum. Fixed time, variable scope. The spec session replaces sprint planning. The Friday retro replaces the sprint retrospective. There is no separate standup; Monday alignment covers the week.
The rhythm
No fixed cadence. Specs are written continuously by the EM/PO and architects and placed in a “Ready” column. Engineers pull the next spec when they finish their current work. WIP limit of 1 per engineer. Review is continuous; every PR is reviewed before the next spec is pulled.
The board
| Backlog | Spec Ready | Building | In Review | Done |
|---|---|---|---|---|
| Idea, Idea | PAY-41, USR-18, INV-09 | PAY-40 | USR-17 | PAY-39 |
Engineers never enter the Backlog column. They pick from “Spec Ready”: specs with a brief, tech spec, and acceptance criteria. The board makes WIP visible. If “In Review” is full, review before pulling new work.
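The pull rule can be sketched as a small decision function over the board. The board layout mirrors the example above; `review_capacity` is an assumed parameter, since the text does not say how many items make "In Review" full.

```python
BOARD = {
    "Backlog": ["Idea", "Idea"],
    "Spec Ready": ["PAY-41", "USR-18", "INV-09"],
    "Building": ["PAY-40"],
    "In Review": ["USR-17"],
    "Done": ["PAY-39"],
}


def next_action(board: dict[str, list[str]], is_building: bool,
                review_capacity: int) -> str:
    """Decide what an engineer does next under the pull rules."""
    if is_building:
        return "finish current work"  # WIP limit of 1 per engineer
    if len(board["In Review"]) >= review_capacity:
        return "review " + board["In Review"][0]  # clear review first
    if board["Spec Ready"]:
        return "pull " + board["Spec Ready"][0]
    return "wait for specs"  # the Backlog column is never read
```

With a review capacity of 1 the free engineer reviews `USR-17`; with a capacity of 2 they pull `PAY-41`.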
Best for
- Mature teams that have internalized spec discipline
- Support/ops teams with unpredictable inflow
- Teams where the PO writes specs ahead of the build capacity
Feels like
Classic Kanban with one critical addition: nothing enters “Spec Ready” without a complete brief + tech spec. The board is the backlog, the spec, and the status tracker. No ceremonies except a weekly retro (30 min) to review the findings log and tune the process.
The rhythm
Each story is specced, built, reviewed, and shipped as a single atomic flow. No batching. No waiting for Thursday to harden. The defensive checklist is a CI gate, not a human ceremony. Each spec covers one concern: one endpoint, one state machine, one contract change. The 2.5× multiplier is baked into the estimate, not into a separate hardening day.
Best for
- High-trust teams with strong CI/CD pipelines
- Infrastructure and platform teams
- Teams where the Lead Architect is also building
Feels like
Trunk-based development with spec gates. Every merge is a mini-release. The spec is a PR description that follows a template, not a separate document. Review is synchronous; the reviewer is tagged at PR time, not at end-of-day. Friday retro still happens, but it’s the only ceremony.
The Team
Datum is designed for a 7-person pod. Five personas, each with a distinct function. The composition is deliberate: enough senior capacity to write specs and review, enough junior capacity to leverage AI for high-volume building, and a single point of architectural accountability.
EM/PO
- Writes briefs with testable acceptance criteria
- Owns stakeholder alignment
- Monday spec sessions: brings business intent
- Friday demos: presents to stakeholders
Lead Architect
- Owns the standards document
- Writes ADRs
- Multi-lens review on complex changes
- Scores defensive checklist weekly
- Brooks’s “single architect”
Architect
- Translates briefs into technical specs
- Defines contracts and state machines
- Builds cross-cutting concerns
- Unblocks when others hit ambiguity
Senior Engineer
- Co-writes specs with architects
- Reviews junior output against the spec
- AI-assisted building on complex stories
- Mentors juniors toward spec-writing
Junior Engineer
- Builds from specs with AI assistance
- Self-verifies against acceptance criteria
- Flags spec gaps; never guesses
- More code volume, under tighter governance
RACI Matrix
Responsibility assignment across pod roles for every major activity.
| Activity | EM/PO | Lead Arch | Arch | Senior | Junior |
|---|---|---|---|---|---|
| Write brief | R/A | C | — | C | I |
| Write technical spec | C | C | R/A | R | C |
| Lead spec session (high-risk) | R | A | R | C | C |
| Lead spec session (standard) | C | I | C | R/A | C |
| Write ADR | I | A | R | C | — |
| Estimate story | C | C | R | R/A | C |
| Activity | EM/PO | Lead Arch | Arch | Senior | Junior |
|---|---|---|---|---|---|
| AI-assisted build from spec | — | — | R | R | R |
| Self-verify vs acceptance criteria | — | — | R | R | R/A |
| Flag spec gaps during build | — | I | I | I | R |
| Trigger spec-break (3+ gaps) | — | I | C | A | R |
| Rapid re-spec session | — | I | R/A | R | C |
| Activity | EM/PO | Lead Arch | Arch | Senior | Junior |
|---|---|---|---|---|---|
| Review junior PRs against spec | — | — | — | R/A | — |
| Review complex PRs | — | I | R/A | — | — |
| Multi-lens review | — | R/A | C | — | — |
| Spot-check vs brief | R/A | — | — | — | — |
| Peer review (junior→junior) | — | — | — | C | R |
| Activity | EM/PO | Lead Arch | Arch | Senior | Junior |
|---|---|---|---|---|---|
| Own standards document | — | R/A | C | C | I |
| Update standards (new patterns) | — | A | R | C | — |
| Score defensive checklist | — | R/A | R | — | — |
| Maintain prompt library | C | C | C | R | R |
| Run Friday retro | C | A | C | R | R |
| Agent governance | — | R/A | C | C | I |
| Activity | EM/PO | Lead Arch | Arch | Senior | Junior |
|---|---|---|---|---|---|
| On-call (primary) | — | — | R | R | — |
| On-call (shadow) | — | — | — | — | R |
| On-call (escalation) | — | R/A | — | — | — |
| Incident communication | R/A | C | — | — | — |
| Post-mortem facilitation | I | R/A | C | C | I |
| Thursday integration verify | — | R | R/A | R | — |
| Thursday hardening | — | — | — | R/A | R |
| Activity | EM/PO | Lead Arch | Arch | Senior | Junior |
|---|---|---|---|---|---|
| Assess junior gate criteria | — | A | — | R | — |
| Mentor junior (Builder stage) | — | — | — | R/A | — |
| Write promotion case | — | C | — | R | — |
| Diagnose stalled engineer | — | A | C | R | — |
| 7-Person Role | Team of 3 | Team of 2 |
|---|---|---|
| EM/PO | Person A (part-time) | Shared |
| Lead Architect | Person A | Engineer A |
| Architect | Person B | Engineer A |
| Senior Engineer | Person B | Engineer A |
| Junior Engineer | Person C | Engineer B |
All RACI assignments for collapsed roles merge onto the absorbing person. Where one person holds both R and A, compensate with stricter acceptance criteria and written self-review.
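The merge can be illustrated with one row of the matrix. The sketch below collapses the "Write ADR" row onto the team-of-2 mapping and flags anyone who ends up both Responsible and Accountable. The function names are hypothetical, and "-" stands in for the table's "no involvement" dash.

```python
VALID_CODES = {"R", "A", "C", "I"}

# "Write ADR" row from the RACI matrix; "-" means no involvement.
WRITE_ADR = {"EM/PO": "I", "Lead Arch": "A", "Arch": "R", "Senior": "C", "Junior": "-"}

# Team-of-2 column from the role-collapse table.
TEAM_OF_2 = {
    "EM/PO": "Shared",
    "Lead Arch": "Engineer A",
    "Arch": "Engineer A",
    "Senior": "Engineer A",
    "Junior": "Engineer B",
}


def collapse_raci(row: dict[str, str], role_to_person: dict[str, str]) -> dict[str, set[str]]:
    """Merge each role's RACI codes onto the person absorbing that role."""
    merged: dict[str, set[str]] = {}
    for role, cell in row.items():
        codes = {code for code in cell.split("/") if code in VALID_CODES}
        merged.setdefault(role_to_person[role], set()).update(codes)
    return merged


def needs_written_self_review(merged: dict[str, set[str]]) -> list[str]:
    """People holding both R and A, who need the compensating controls."""
    return sorted(person for person, codes in merged.items() if {"R", "A"} <= codes)
```

Here Engineer A absorbs A, R, and C for ADRs, so the stricter acceptance criteria and written self-review apply to them.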
Team Sizing
The 7-person pod is the full model. Not every team starts there. The axioms hold at any size. The practices compress.
Team of 2
The people
Engineer A (more senior): Lead Arch + Spec Author + Reviewer. Writes briefs and tech specs. Reviews all of B’s output against the spec. Owns the standards document.
Engineer B (more junior or equal): Builder + Co-Specifier. Builds from specs with AI. Self-verifies before submitting. Drafts specs as they grow.
What compresses
- Spec sessions: 15–30 min conversation, not a room
- WIP limit drops to 1 PR (one reviewer)
- Standards doc: single page, not comprehensive
- Thursday hardening: 2-hour block, not full day
- PO function: shared between both
Team of 3
The people
Person A: PO/EM or spec-heavy senior. Writes briefs, leads spec sessions, reviews complex work.
Person B: Senior engineer. Writes tech specs, builds, reviews C’s output.
Person C: Engineer (builder). Builds from specs, self-verifies, flags gaps.
The key constraint
Someone must own brief quality. If no one does, the team drifts back to vibe coding. Person A can be a part-time PO who also codes, or a full-time engineer who owns specification.
Team of 4–5
The people
Person A: PO/EM or Lead Architect. Briefs, standards, architectural decisions.
Person B: Senior. Tech specs, builds, reviews C and D’s output.
Person C–D: Engineers. Build from specs, peer-review each other, flag gaps.
Person E (if 5): Junior. Builds under supervision.
What compresses
- No separate Architect role. Lead Arch/Senior absorbs it
- Review concentrated on one person — mitigate with stricter WIP limits
- At 5 people, the team is two hires short of the full 7-person pod
Growing Engineers
Datum requires a structured growth path because the methodology's leverage depends on juniors who can eventually write specs, not just build from them. The risk without an explicit path: permanent junior executors who produce high-volume, low-accountability output indefinitely.
The four stages are performance-gated, not time-gated. Transitions happen when the engineer demonstrates the gate criteria, not when a calendar threshold passes.
- Spec: 10% · Build: 80%
- Verify: 20% · Govern: 0%
- Spec: 40% · Build: 40%
- Verify: 25% · Govern: 5%
- Spec: 40% · Build: 20%
- Verify: 30% · Govern: 10%
- Spec: 30% · Build: 5%
- Verify: 35% · Govern: 30%
Skills Are the Currency, Not Titles
A 2026 Accenture/Wharton study developed the WAsX (Wharton–Accenture Skills Index) to measure how skills translate into economic value in an AI-enabled economy. The finding: as AI automates routine cognitive work, the market increasingly rewards judgment, coordination, and domain-specific execution. Exactly what Datum’s growth path develops.
- Code generation from requirements
- Data formatting and transformation
- Template-based documentation
- Standard test case creation
These are Builder-stage skills. They are necessary but no longer scarce. AI performs them at scale, and the market prices them accordingly.
- Specification: translating ambiguity into contracts
- Verification: evaluating output against intent
- Architecture: making trade-offs with incomplete information
- Governance: maintaining system coherence at scale
These are Specifier-and-above skills. The WAsX data shows the market assigns increasing monetary value to capabilities that complement AI rather than compete with it.
The study also found a persistent signaling gap: workers overwhelmingly signal broad, generalist traits, while employers pay for specialized, execution-oriented capabilities. In Datum terms: calling yourself a “senior engineer” signals nothing. Demonstrating that your specs produce low-rework implementations and that your reviews catch drift is what the market pays for.
This is why the growth path is performance-gated, not time-gated. Gate criteria like “senior revisions are cosmetic, not structural” (Co-Specifiers) and “low rework rate on own specs” (Specifiers) measure skills that carry economic value, not years spent.
The Spec Revision Is the Coaching Session
Accenture/Wharton studied the organizations they call “Talent Reinventors.” Leaders were 1.3× more likely to delegate and coach, even when it slowed execution. These organizations grew revenue 1.8 percentage points faster, strengthened culture (7× more likely), and increased adaptability (4× more likely).
In Datum, this coaching is the spec revision loop.
A Co-Specifier drafts a spec. The Lead Architect revises it. That revision is not rework. It is the most direct way to improve the junior’s judgment.
The delta between draft and final spec shows exactly where the junior’s judgment fell short. Not abstract feedback. Not “think more carefully.” Just: you wrote X, the spec needs Y, here’s why.
The architect spends time revising a spec the junior wrote poorly instead of writing it correctly themselves in half the time. This feels like a productivity loss. It is a resilience investment.
A pod that never promotes Builders to Co-Specifiers has a single point of failure at the architect. A pod that coaches has a pipeline. When the architect is overloaded or gone, a Co-Specifier steps in at reduced quality rather than no quality.
This is the structural fix for the Review Death Spiral (Ch. 14). One of its root causes is an architect with no bench depth. Coaching builds that bench.
The Lead Architect has two jobs. Governance: setting the standards document, owning consistency, reviewing against specs. Coaching: letting juniors attempt work above their level, then using the gap as feedback. Organizations that invest in both build stronger benches, lower failure risk, and grow engineer judgment faster because the feedback loop is embedded in real work.
Talent Reinventors data: Accenture & Wharton School, The Age of Co-Intelligence, March 2026.
The Artifacts
Documentation is infrastructure. Not a nice-to-have, not a chore for after the sprint. Infrastructure, the same way a CI pipeline is infrastructure. Every AI session loads these documents. Every review verifies against them. Every retro improves them. When the documentation is wrong, every AI session produces wrong output at scale. When it is precise, every session inherits the team’s accumulated decisions.
The methodology produces seven named artifacts, each with a clear owner and update cadence. Each artifact is a control surface: one person’s decisions become another’s constraints.
Standards document. Single source of architectural truth. Loaded into every AI session.
Purpose
The standards document is the primary governance mechanism for AI output quality. Every AI coding session loads it as context, solving the “fresh-context problem”: without shared context, each session invents its own patterns. The standards document makes architectural decisions portable and enforceable without the Lead Architect being present.
Structure
# Standards Document — [Project Name]
## Architecture
- Three-tier: presentation → logic → data
- Logic tier has zero I/O knowledge
- All external calls go through client wrappers
## Naming
- Services: PascalCase (UserService)
- Endpoints: kebab-case (/user-profile)
- Database tables: snake_case (user_profile)
## Error Contract
All errors return: { code, message, correlationId }
## Non-Negotiable Rules
- No raw SQL — use query builder
- No silent catch — log and re-raise
- No business logic in controllers
Update cadence
Updated by the Lead Architect when Friday retros surface gaps, or when new patterns emerge during Thursday hardening. Reviewed at Monday alignment.
What breaks without it
Without a standards document, a 5-person pod using AI generates code in 5 different styles. Error handling is inconsistent. Naming conventions drift. Each PR review becomes a style debate. The Lead Architect becomes a bottleneck because their knowledge is in their head, not in a document the AI can read.
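To show how a rule in the standards document becomes something code can enforce, here is a minimal sketch of the error contract `{ code, message, correlationId }` from the sample above. The class and function names are hypothetical, and generating a UUID when no correlation ID is supplied is an assumption.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    """Error body with exactly the three fields the contract names."""
    code: str
    message: str
    correlationId: str  # camelCase kept to match the contract on the wire


def to_error_body(code: str, message: str,
                  correlation_id: Optional[str] = None) -> str:
    """Serialize an error; mint a correlation ID if the caller has none."""
    error = ApiError(code, message, correlation_id or str(uuid.uuid4()))
    return json.dumps(asdict(error))
```

Routing every error response through one helper like this keeps the shape consistent no matter which AI session generated the calling code.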
Architecture Decision Records (ADRs). Context, decision, and consequences for every architectural choice.
Purpose
ADRs capture the why: the context at the time, the options considered, and the trade-offs accepted. Six months later, when someone asks “why did we use message queues instead of direct API calls?”, the ADR answers without needing the original architect in the room.
Structure
# ADR-007: Use event-driven architecture for billing
## Status: Accepted (2026-03-15)
## Context
Billing calculations depend on data from 3 services.
Synchronous calls create a cascade failure risk:
if Inventory is down, Billing cannot process orders.
## Decision
We will use an event bus (RabbitMQ) for inter-service
billing communication.
## Consequences
+ Services are decoupled — Billing processes events
when Inventory recovers
+ Easier to add new billing triggers
- Eventual consistency — billing may lag by seconds
- Need dead-letter queue for failed events
Update cadence
Created during spec sessions when architectural decisions are made. Drafted by the Architect, reviewed by the Lead Architect. Updated when decisions are revisited or superseded.
What breaks without it
Without ADRs, teams relitigate the same decisions every quarter. New team members reverse architectural choices because they don’t know why they were made. The standards document says what to do. ADRs provide the why, preventing refactors from undoing deliberate trade-offs.
Brief. Business intent, scope, constraints, and testable acceptance criteria per story.
Purpose
The brief is the upstream input that determines everything downstream. It translates stakeholder needs into a form engineers can spec against. A brief that says “improve the checkout flow” produces vague specs and vague AI output. A brief that says “reduce checkout abandonment at the payment step by adding Apple Pay, constrained to the existing Stripe integration” produces a spec that an AI can build from.
Structure
## Brief: Add Apple Pay to Checkout
**Business intent**: 23% of mobile users abandon at
payment entry. Apple Pay eliminates manual card input.
**Scope**: Payment step only. No changes to cart,
shipping, or order confirmation.
**Constraints**:
- Must use existing Stripe integration
- iOS Safari and Chrome on iOS only
- Fallback to card entry if Apple Pay unavailable
**Acceptance criteria**:
- [ ] Apple Pay button appears on iOS Safari
- [ ] Successful payment creates order in Stripe
- [ ] Non-iOS browsers see no change
- [ ] Failed Apple Pay falls back to card form
Precision upgrade: EARS notation
For teams that want machine-parsable requirements, use EARS (Easy Approach to Requirements Syntax). Each requirement follows a pattern that an AI can extract, test against, and verify automatically:
WHEN a user submits a payment with Apple Pay
THE SYSTEM SHALL create a Stripe payment intent
with idempotency key derived from order ID
SO THAT duplicate submissions do not produce
double charges
WHILE the payment intent status is "processing"
THE SYSTEM SHALL return the existing payment ID
on subsequent requests for the same order
SO THAT concurrent requests are handled safely
IF the Stripe API does not respond within 30 seconds
THEN THE SYSTEM SHALL return STRIPE_UNAVAILABLE
and log the timeout with correlation ID
SO THAT the failure is visible and the user can retry
EARS is a precision upgrade, not a replacement for the standard brief format. Use it when requirements must be unambiguous enough for an AI to generate test cases directly from the spec.
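To illustrate what "machine-parsable" can mean here, the sketch below splits one requirement in the format shown above into its parts. It handles only the WHEN / WHILE / IF forms with the `SO THAT` rationale used in this document's examples, not the full EARS pattern set; `parse_ears` and `Requirement` are hypothetical names.

```python
import re
from dataclasses import dataclass

EARS_PATTERN = re.compile(
    r"^(?P<keyword>WHEN|WHILE|IF)\s+(?P<condition>.+?)\s+"
    r"(?:THEN\s+)?THE SYSTEM SHALL\s+(?P<response>.+?)"
    r"(?:\s+SO THAT\s+(?P<rationale>.+))?$"
)

SAMPLE = """IF the Stripe API does not respond within 30 seconds
THEN THE SYSTEM SHALL return STRIPE_UNAVAILABLE
and log the timeout with correlation ID
SO THAT the failure is visible and the user can retry"""


@dataclass
class Requirement:
    keyword: str
    condition: str
    response: str
    rationale: str  # empty when the requirement has no SO THAT clause


def parse_ears(text: str) -> Requirement:
    """Parse a single requirement; line wraps are collapsed first."""
    flat = " ".join(text.split())
    match = EARS_PATTERN.match(flat)
    if match is None:
        raise ValueError(f"not an EARS-style requirement: {flat[:40]!r}")
    return Requirement(
        keyword=match["keyword"],
        condition=match["condition"],
        response=match["response"],
        rationale=match["rationale"] or "",
    )
```

Once a requirement is split this way, the condition and response can seed a test case name and its expected outcome.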
Update cadence
Written by EM/PO before Monday spec sessions. Pre-drafted on Fridays using stakeholder feedback from demos. Refined during the spec session based on technical constraints surfaced by the architect.
What breaks without it
Without briefs, engineers spec against their assumptions about what the business wants. The AI builds what the spec says, which is what the engineer assumed, which may not be what the customer needs. The gap is invisible until demo day. Briefs force the business intent to be explicit before any code is generated.
Technical spec. Contracts, state machines, boundary schemas, test expectations per story.
Purpose
The technical spec is the datum — the fixed reference point that the AI builds against and the reviewer verifies against. It defines the contracts (what goes in, what comes out), the state transitions (what’s legal, what’s not), and the boundary schemas (what external data looks like). The spec is what makes AI output verifiable instead of vibes-based.
Structure
## Spec: Apple Pay Payment Handler
**Contracts**:
POST /api/payments/apple-pay
Request: { token: string, orderId: string }
Response: { paymentId, status, receiptUrl }
Errors: INVALID_TOKEN, ORDER_NOT_FOUND,
PAYMENT_DECLINED, STRIPE_UNAVAILABLE
**State machine**:
idle → validating → processing → completed
idle → validating → failed
processing → failed (timeout after 30s)
**Boundary schema** (Stripe webhook):
{ type: "payment_intent.succeeded",
data: { object: { id, amount, metadata } } }
**Test expectations**:
- Expired token → INVALID_TOKEN, no Stripe call
- Stripe timeout → STRIPE_UNAVAILABLE after 30s
- Duplicate submission → idempotent (same paymentId)
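The state machine in the spec above can be encoded as a transition table, so that an illegal transition fails loudly instead of slipping through review. This is a sketch only; the names `TRANSITIONS` and `advance` are illustrative and not part of the spec.

```python
# Legal transitions, copied from the spec's state machine.
TRANSITIONS = {
    "idle": {"validating"},
    "validating": {"processing", "failed"},
    "processing": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def advance(current: str, target: str) -> str:
    """Return `target` if the spec allows current -> target, else raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

A reviewer can then check the table against the spec line by line, which is a narrower question than whether the handler "looks right".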
Update cadence
Created during Monday spec sessions. Updated if builders discover spec gaps during Tuesday/Wednesday build (via the spec-break rule: 3+ gaps = stop and re-spec). Finalized during Thursday hardening.
What breaks without it
Without tech specs, reviews become subjective. The reviewer checks whether the code “looks right” rather than whether it matches a defined contract. Edge cases are discovered in production, not in spec sessions. The AI generates plausible code with no verifiable reference point.
Defensive Checklist
Weekly quality scorecard across 8 categories. Tracks hardening over time.
Purpose
The defensive checklist converts “code quality” from an opinion into a measurable score. Each category (input validation, error handling, idempotency, timeouts, logging, state management, security, testing) is scored 0–3. The scores trend over time, making hardening progress visible to the team and to stakeholders.
Structure
## Defensive Checklist — Week of 2026-03-24
| Category | Score | Notes |
|------------------|-------|--------------------------|
| Input validation | 3 | All boundaries covered |
| Error handling | 2 | 2 bare catches remaining |
| Idempotency | 2 | Payment endpoint done |
| Timeouts | 1 | 4 calls missing timeout |
| Logging | 3 | Structured, correlation |
| State mgmt | 2 | Order FSM complete |
| Security | 2 | Rate limiting pending |
| Testing | 2 | 78% logic coverage |
**Overall: 17/24 (Level 2)**
Target: Level 2+ (16/24) by Thursday ✓
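The overall score is plain arithmetic over the eight categories. The sketch below reproduces the 17/24 total from the example week and checks it against the 16/24 target. The function names are illustrative, and the code deliberately does not define how a total maps to a "Level", since the checklist above does not spell that out.

```python
TARGET = 16  # "Level 2+ (16/24)" from the example checklist

# Scores copied from the example week above.
scores = {
    "Input validation": 3,
    "Error handling": 2,
    "Idempotency": 2,
    "Timeouts": 1,
    "Logging": 3,
    "State mgmt": 2,
    "Security": 2,
    "Testing": 2,
}


def overall(category_scores):
    """Sum the eight category scores after checking each is in 0..3."""
    if len(category_scores) != 8:
        raise ValueError("expected exactly 8 categories")
    for category, score in category_scores.items():
        if not 0 <= score <= 3:
            raise ValueError(f"{category}: score {score} outside 0-3")
    return sum(category_scores.values())


def weakest_first(category_scores):
    """Category names ordered from lowest score to highest."""
    return sorted(category_scores, key=category_scores.get)
```

Here `overall(scores)` is 17, which clears the target, and `weakest_first(scores)[0]` is "Timeouts", the category to harden next.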
Update cadence
Scored by the Lead Architect every Thursday during hardening. Reviewed at Friday retro. The trend line (not the absolute score) is the metric that matters.
What breaks without it
Without the checklist, “hardening” is undefined. The team ships when they feel done, not when they’ve met a standard. Technical debt accumulates invisibly. Stakeholders cannot assess production readiness because there is no shared definition of what “ready” means.
Prompt Library
Effective AI agent prompts. The pod’s institutional memory for agentic work.
Purpose
The prompt library captures what works. When a senior discovers that a specific prompt structure produces better test coverage, or that loading the standards document in a particular order reduces hallucination, that knowledge belongs to the team — not to one person’s clipboard. The library is the pod’s institutional memory for agentic work.
Structure
## Prompt: Build from Spec (Standard)
**When to use**: Building any story from a tech spec
**Load order**:
1. Standards document (full)
2. Technical spec for this story
3. Relevant existing code (interfaces only)
**Prompt template**:
"Implement [story] per the attached spec.
Follow the standards document for all patterns.
Write tests before implementation.
Flag any spec gaps — do not guess."
**What it prevents**: AI inventing patterns not
in the standards. Building without tests. Silently
filling spec gaps with assumptions.
**Contributed by**: Senior A (2026-03-10)
**Validated by**: 12 stories, 0 spec-gap misses
Update cadence
Contributed by anyone in the pod when they find an effective prompt pattern. Reviewed at Friday retro. Pruned quarterly — prompts that haven’t been used in 6 weeks are archived, not deleted.
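The quarterly pruning rule is simple enough to automate. A minimal sketch, assuming the pod records a last-used date per prompt; that record, the function name, and the sample prompt names in the usage note are assumptions, not part of the methodology.

```python
from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(weeks=6)  # from the pruning rule above


def split_for_pruning(last_used, today):
    """Return (keep, archive): prompt names split by the 6-week rule.

    `last_used` maps a prompt name to the date it was last used.
    Archived prompts are moved aside, never deleted.
    """
    keep, archive = [], []
    for name, used in sorted(last_used.items()):
        if today - used >= ARCHIVE_AFTER:
            archive.append(name)
        else:
            keep.append(name)
    return keep, archive
```

For example, with `today = date(2026, 3, 24)`, a prompt last used on 2026-03-10 stays in the library, while one last used on 2026-01-05 is archived.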
What breaks without it
Without a prompt library, each team member discovers effective prompts independently. The senior who leaves takes their prompt knowledge with them. Juniors struggle with AI tools because nobody shared what works. The team never compounds its agentic skills.
Review Findings Log
All review findings, spec gaps, and spec-break triggers. Source of retro data.
Purpose
The review findings log is the feedback loop that improves specification quality over time. Every review finding, spec gap, and spec-break trigger is recorded with enough context to spot patterns. If the same kind of gap appears three weeks in a row, the spec process has a hole — and the log makes that visible before it becomes a production incident.
Structure
## Review Findings — Week of 2026-03-24
| Story | Finding | Category | Root Cause |
|---------|----------------------|------------|---------------|
| PAY-41 | Missing timeout on | Spec gap | Spec didn't |
| | Stripe call | | cover timeouts|
| PAY-42 | Error swallowed in | Code issue | Standards doc |
| | webhook handler | | was not loaded|
| USR-18 | Spec-break triggered | Spec gap | Auth edge |
| | (4 gaps found) | | cases missing |
## Patterns This Week
- 2 of 3 gaps were timeout-related → add timeout
section to spec template
- 1 spec-break from auth complexity → flag auth
stories for Lead Arch spec session
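The "three weeks in a row" signal described under Purpose can be computed from the log once each gap carries a short kind label. A sketch under that assumption; the labels (`"timeout"`, `"auth"`, and so on) and the function name are hypothetical.

```python
def recurring_kinds(weeks, run=3):
    """Return the kinds of gap that appear in each of the last `run` weeks.

    `weeks` is a chronological list; each entry is the set of gap kinds
    logged that week.
    """
    if len(weeks) < run:
        return set()
    return set.intersection(*weeks[-run:])
```

Given three weeks logged as `{"timeout", "auth"}`, `{"timeout"}`, and `{"timeout", "logging"}`, only `"timeout"` recurs, which is the cue to change the spec template rather than fix one more story.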
Update cadence
Updated continuously during Tuesday/Wednesday reviews. Patterns section written Thursday. Discussed at Friday retro to drive spec template improvements for next week.
What breaks without it
Without the findings log, retros are opinion-based: “I feel like our specs are getting better.” With it, retros are data-driven: “Timeout-related spec gaps dropped from 4/week to 0/week after we added the timeout section to the template.” The log turns the methodology into a learning system instead of a static process.
Where Artifacts Live
Artifacts are version-controlled, not scattered across wikis and Slack threads. They live in git, get reviewed like code, and have blame history. A concrete convention:
specs/
  PAY-41/
    brief.md # PO-owned
    spec.md # Architect-owned
    tasks.md # Builder-owned
  USR-18/
    brief.md
    spec.md
    tasks.md
Each story gets a directory named by its ticket ID. The brief, spec, and task breakdown are co-located. Spec PRs are reviewed and merged before implementation PRs.
docs/
  standards.md # Lead Architect
  adr/
    001-event-bus.md # Architect + Lead
    002-auth-pattern.md
  checklist/
    week-2026-03-24.md
  prompts/
    build-from-spec.md
    review-against-spec.md
  findings/
    week-2026-03-24.md
Governance artifacts live in docs/. The standards document is the root. ADRs, checklists, prompts, and findings are subdirectories with their own cadence.
No single artifact is sufficient. The standards document without ADRs loses its rationale. Briefs without tech specs produce vague AI output. Tech specs without review findings never improve. The seven artifacts form a closed loop that lives in git, not in a wiki. Treat these documents the way you treat a CI pipeline: they run on every build, they gate every merge, and when they break, the system breaks.
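If the artifacts are meant to gate merges, the `specs/` convention can be checked mechanically. A minimal sketch, assuming the per-story layout shown above; the function name is illustrative, and how the check is wired into CI is left open.

```python
from pathlib import Path

# Files every story directory is expected to contain, per the convention above.
REQUIRED = ("brief.md", "spec.md", "tasks.md")


def missing_artifacts(specs_root):
    """Map each story directory under `specs_root` to the required files it lacks."""
    problems = {}
    story_dirs = sorted(p for p in Path(specs_root).iterdir() if p.is_dir())
    for story_dir in story_dirs:
        missing = [name for name in REQUIRED if not (story_dir / name).is_file()]
        if missing:
            problems[story_dir.name] = missing
    return problems
```

An empty result means every story has its brief, spec, and task breakdown; anything else names the story and the missing file.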
Metrics
Datum produces specific, measurable signals. Not vanity metrics — diagnostic indicators that reveal whether the methodology is working or decaying. Track these five. Ignore everything else.
- Spec gap rate: source is the review findings log, measured per story
- Review turnaround: exceeding the target triggers reallocation, not escalation
- Checklist score: trend matters more than absolute score
- Carryover: if consistently high, the Monday capacity check is dishonest
- Rework: cross-reference with spec gap rate to distinguish spec problems from build problems
What Good Looks Like
Expected metric ranges across the first 4-week adoption cycle.
Week 1
- Spec gaps: 0.8–1.0
- Review: 6–8 hours
- Checklist: 8–12
- Carryover: 30–50%
- Rework: 60–80%
Gaps, review time, carryover, and rework are all high, and the checklist score is low. This is normal. You are calibrating.
Week 2
- Spec gaps: 0.5–0.7
- Review: 4–6 hours
- Checklist: 12–16
- Carryover: 20–35%
- Rework: 40–60%
Specs are improving. The team finds its rhythm.
Week 3
- Spec gaps: 0.3–0.5
- Review: 2–4 hours
- Checklist: 14–18
- Carryover: 15–25%
- Rework: 30–45%
Pattern recognition forming. Reviews become routine.
Week 4
- Spec gaps: < 0.3
- Review: < 4 hours
- Checklist: 16–24
- Carryover: < 20%
- Rework: < 30%
Sustained. The team stops wanting to go back.
Metric Traps
Metrics that look healthy in isolation can mask dysfunction when read together.
Green turnaround + high rework. Reviews are fast because they are shallow. PRs pass review and fail in integration. The turnaround metric is green. The quality is not.
Turnaround under 2h while rework stays above 40%. Findings per review declining while post-merge defects rise.
Low spec gaps + high rework. Builders guess instead of flagging. Spec gap rate is low because gaps go unreported, not because they don’t exist. The spec-break rule never fires.
Spec gaps below 0.2 while rework above 40%. Review findings reference requirements not in the spec. Builders never invoke spec-break.
High scores + production incidents. Checklist scored to pass, not to verify. “We have tests” vs. “our tests cover the failure modes in the spec.”
Scores 18+ while incidents occur in categories scored Level 2+. Post-mortem root causes map to “passing” categories.
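The three traps above are defined by pairs of metrics, so they can be checked together rather than one dashboard tile at a time. A sketch using the thresholds quoted above; the dictionary keys and the short trap labels are assumptions made for illustration.

```python
def detect_traps(m):
    """Return labels for the metric traps whose signals are present.

    `m` holds one week's metrics under hypothetical key names.
    """
    traps = []
    # Fast reviews alongside heavy rework suggest shallow review.
    if m["turnaround_hours"] < 2 and m["rework_pct"] > 40:
        traps.append("shallow review")
    # Few reported gaps alongside heavy rework suggest builders are guessing.
    if m["spec_gap_rate"] < 0.2 and m["rework_pct"] > 40:
        traps.append("unreported gaps")
    # High checklist scores alongside incidents in "passing" categories.
    if m["checklist_score"] >= 18 and m["incidents_in_passing_categories"] > 0:
        traps.append("scored to pass")
    return traps
```

A week with 1.5-hour turnaround, 45% rework, a 0.1 spec gap rate, a checklist score of 19, and one incident in a passing category trips all three.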
Agent Governance
Every AI coding session is a new employee who has never seen your codebase, has no memory of yesterday’s decisions, and will confidently generate plausible-looking code that violates your architecture. The standards document is the onboarding packet. Agent governance is the HR policy.
Adapted from Fuller, “Create an Onboarding Plan for AI Agents,” Harvard Business Review, March 2026.
- The role is narrow by design
- The agent builds
- It does not design, review, or decide
- An employee who redesigns the org chart on day one has exceeded their scope
- So has an AI that generates auth nobody asked for
- Accountability is structural
- The agent has no feelings to hurt
- Verify behavior, not vibes
- Standards updated when patterns emerge
- Prompt library captures what works
- Findings log drives upstream fixes
The Fresh-Context Problem
- Monday’s code: snake_case
- Tuesday’s code: camelCase
- Monday’s errors: raw strings
- Tuesday’s errors: structured objects
The agent is not inconsistent — it never saw Monday. Each session is a different employee with the same title but no shared memory.
- Naming: from the standards document
- Error contracts: from the standards document
- Architecture boundaries: from the tech spec
- Test expectations: from the tech spec
Consistency is a property of the document, not of the agent’s memory. The governance artifacts are the institutional memory the agent lacks.
What Goes Wrong Without It
Without standards loaded, 3+ error handling patterns and 2+ naming conventions emerge within a month. Each reviewer catches different issues — no shared reference, only individual preference. The codebase degrades through a thousand plausible-looking commits.
Grep for error return shapes. String returns + object returns + thrown exceptions in the same service = standards not loaded.
The AI “helpfully” generates auth middleware, logging infrastructure, migrations nobody asked for. Looks good. Passes tests. Introduces unreviewed architectural decisions that compound — each one small enough to approve, collectively large enough to reshape the system.
Compare PR diff against the tech spec. Any file not named in the spec is a scope violation. Track violations per sprint.
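The scope check described above reduces to a set difference between the files a PR touches and the files the tech spec names. A minimal sketch, assuming both lists are already available (for instance, the changed files from the PR diff); the function name and the paths in the usage note are hypothetical.

```python
def scope_violations(changed_files, spec_files):
    """Return changed files that the tech spec does not name, sorted."""
    return sorted(set(changed_files) - set(spec_files))
```

If a PR touches `src/payments/apple_pay.py` and `src/auth/middleware.py` while the spec names only the payments file and its test, the result is `["src/auth/middleware.py"]`: one scope violation to discuss before merge.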
The Volume–Risk Spectrum
Not all agent work carries the same risk. A 2026 Accenture/Wharton study argues that governance intensity should scale with where tasks fall on a volume–risk spectrum: high-volume, high-risk domains need more rigorous controls than low-volume, low-risk ones. Uniform governance wastes architect attention on config changes while under-governing payment integrations.
Datum applies this principle at story level. The spec session is where risk is assessed, and the governance response is calibrated accordingly.
- Spec: Brief only — no tech spec required
- Review: Single reviewer, checklist optional
- Builder level: Any stage can own end-to-end
Low-risk stories flow fast. The governance overhead is minimal because the blast radius is small.
- Spec: Full tech spec with acceptance criteria
- Review: Spec-based review, checklist scored
- Builder level: Co-Specifier+ for spec, any for build
The standard Datum flow. Most stories land here. Governance is proportional to the decision surface.
- Spec: Full tech spec + ADR for architectural decisions
- Review: Multi-lens review (Lead Architect + domain expert), checklist mandatory
- Builder level: Specifier+ for spec, architect signs off before build begins
- Additional: Spec-break rule enforced; any ambiguity halts the build
The Accenture/Wharton study found that the function with the largest revenue opportunity (Sales) was also the area with the highest volume of risk-sensitive decisions. In Datum, the same pattern holds: the highest-value stories need the strongest governance. Architect time spent here, not on config changes, is where governance pays off.
Risk classification happens in the spec session, not after the build. The Lead Architect or spec author tags each story as low, medium, or high risk based on three questions: How many systems does this touch? What happens if it’s wrong? Can it be rolled back? The answers determine which governance track the story follows.
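The three questions and the three governance tracks can be captured as data, which keeps the classification visible in the spec session. The track contents below are taken from the tiers described above; the decision rule in `classify` is purely an assumption for illustration, since the text does not define thresholds, and a pod would tune it to its own systems.

```python
# Governance response per risk tier, summarized from the tiers above.
TRACKS = {
    "low": {"spec": "brief only", "review": "single reviewer, checklist optional"},
    "medium": {"spec": "full tech spec", "review": "spec-based review, checklist scored"},
    "high": {"spec": "full tech spec + ADR", "review": "multi-lens review, checklist mandatory"},
}


def classify(systems_touched, severe_if_wrong, can_roll_back):
    """Hypothetical rule mapping the three questions to a risk tier."""
    if severe_if_wrong and not can_roll_back:
        return "high"
    if severe_if_wrong or not can_roll_back or systems_touched > 1:
        return "medium"
    return "low"
```

Under this rule, a payment integration that touches three systems, is costly when wrong, and cannot be rolled back lands on the high track, while a reversible single-system config change stays low.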
The Accountability Asymmetry
“Intelligence may be scalable, but accountability is not.” That line from the 2026 Accenture/Wharton study of AI agents across 18 industries is the core of why governance is structural.
The study found that 50%+ of U.S. working hours are subject to reshaping by AI agents. By 2027, half of business decisions will be augmented or automated, with 15% fully autonomous by 2028. Customer operations are trending toward 80% autonomous resolution by 2029. Agents are spreading “rapidly across the enterprise value chain, often ahead of formal strategy and governance.”
The report’s sharpest line: “Capability parity does not imply responsibility parity.” AI generates PhD-level reasoning but carries no moral weight, no institutional accountability, no long-term obligation. Those stay human. When intelligence distributes across humans and AI, responsibility does not.
- Code generation: near-infinite
- Analysis and recommendation: near-infinite
- Execution of defined tasks: near-infinite
- Pattern recognition: near-infinite
Everything the agent does well, it does at scale. This is the promise.
- Deciding what to build: human
- Owning the outcome: human
- Setting architectural intent: human
- Accepting the risk: human
Everything the agent cannot do, humans do at the same pace they always have. This is the constraint.
The study’s case modeling revealed a pattern that Datum teams will recognize: the function with the largest revenue opportunity (Sales) was also the area with the highest volume of risk-sensitive decisions. Value and risk scale together. Governance must be designed before agents touch high-value systems. You do not get to learn from failure at this scale.
The Accenture/Wharton report proposes “humans in the lead, not in the loop.” Datum operationalizes the same principle: the Lead Architect does not review every line — but they set the standards document that governs every line. The spec author does not write the code — but they define the contracts the code must satisfy. Human authority is exercised through artifacts, not through direct supervision of every keystroke.
Anything else breaks. When intelligence scales and accountability does not, governance must be structural: documents, checklists, and review gates that run regardless of volume. Any model that depends on a human personally reviewing every output will collapse at the speed AI enables.
Agent governance is not about controlling AI. It is about keeping humans accountable when direct oversight is no longer feasible. The standards document, the tech spec, the review checklist: not bureaucracy. The only way accountability keeps pace with intelligence.
Source: Accenture & Wharton School, The Age of Co-Intelligence, March 2026. Fortune coverage.
When It Breaks
Every methodology has characteristic failure modes. These are the five ways Datum collapses when teams adopt the rituals without the substance. Each violates a specific axiom. Trace the tag to find where it breaks.
Spec sessions happen because the process requires them, but the documents are copy-pasted templates. “It works correctly” is not an acceptance criterion. The AI produces plausible code. Reviews pass because the spec is too vague to fail against.
Spec gap rate stays high but spec-break never fires. Builders guess instead of flagging.
The checklist is scored to pass, not to verify. “We have tests” scores Level 2. “Our tests cover the failure modes in the spec” is the actual standard. When production incidents hit, the relevant category was scored high.
Scores 18+ while incidents occur in categories marked Level 2+. Scores and reality have diverged.
AI-assisted juniors produce PRs faster than seniors can review. WIP limit fires constantly. Seniors rubber-stamp to clear the queue. Turnaround metric is green. Review quality collapses. Defects reach production.
Persistent WIP breaches. Rework rises while turnaround stays green. Reviews ship defects.
The Architect writes specs alone, hands them to builders. Builders treat specs as requirements, not shared design. Questioning the spec feels like insubordination. Spec gaps accumulate silently.
High rework + low spec gaps = builders absorbing ambiguity. Juniors never progress past Builder stage.
Standards written in week 1, never updated. By week 8 the codebase has evolved — new ADRs, new patterns — but the standards document is frozen. New code follows patterns the AI invents. The codebase develops two dialects: “standards-era code” and “post-standards code.”
Checklist consistency scores decline. Review findings reference patterns not in the standards doc. The doc’s last-modified date is more than 2 weeks old.
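The staleness signal for the standards document is easy to check on a schedule. A sketch, assuming the last-modified date is obtained elsewhere (for example, from version-control history); the function name is illustrative.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(weeks=2)  # from the signal described above


def standards_stale(last_modified, today):
    """True when the standards document was last changed more than 2 weeks ago."""
    return today - last_modified > MAX_AGE
```

A stale result is a prompt to ask whether new patterns have emerged, not proof that the document is wrong.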
A Week in Datum
The best way to understand Datum is to watch one story move through it. This is PAY-41: Add Apple Pay to Checkout. Five days, from brief to production.
The EM presents the brief: 23% mobile abandonment at the payment step. Apple Pay eliminates manual card input. Scope: payment step only. Existing Stripe integration, iOS Safari only, fallback to standard card entry.
PAY-41: Add Apple Pay to Checkout
Brief
Mobile abandonment at payment step: 23%
Root cause: manual card input on small screens
Proposed fix: Apple Pay via existing Stripe integration
Scope
- Payment step only (not cart, not confirmation)
- iOS Safari only (non-iOS browsers fall back to card)
- Uses existing Stripe payment intent flow
Acceptance Criteria
[ ] User sees Apple Pay button on iOS Safari
[ ] Tapping Apple Pay completes payment without card input
[ ] Non-iOS browsers show standard card form (no error)
[ ] Duplicate submissions return original result, not double charge
The Architect drafts the tech spec: a new POST /api/payments/apple-pay endpoint, a state machine (idle → validating → processing → completed | failed), Stripe webhook schema for async confirmation, and a 30-second timeout. A Builder asks about duplicate submissions. An idempotency key goes into the spec before anyone writes code.
Tech Spec — PAY-41
Endpoint
POST /api/payments/apple-pay
Headers: Idempotency-Key (required, UUID)
Body: { cartId, applePayToken }
State Machine
idle → validating → processing → completed | failed
External Calls
Stripe PaymentIntent.create — timeout: 30s
Stripe webhook: payment_intent.succeeded | payment_intent.payment_failed
Error Contract
{ code: "PAYMENT_DECLINED", message: "...", correlationId: "..." }
{ code: "APPLE_PAY_UNAVAILABLE", message: "...", correlationId: "..." }
Estimate: AI build ~2h × 2.5x = 5h total
The Builder loads the project standards document and the PAY-41 tech spec into the AI session. The AI generates the endpoint, state machine, Stripe integration, and tests. Self-verification against acceptance criteria: 3 of 4 pass. The fourth (non-iOS browser behavior) exposes a spec gap: the spec says “fallback to card” but does not specify whether the Apple Pay button is hidden or shown-but-disabled. The Builder logs gap #1 and continues.
A senior reviews the PR against the tech spec. Two rejections: the Stripe call uses a 60-second timeout, not the 30 seconds the spec requires; and there is no structured error response for PAYMENT_DECLINED, just a generic 500. Both are spec violations, not style disagreements.
During the fix, the Builder flags two more gaps: the spec does not define retry behavior on Stripe timeouts, and webhook signature validation is unspecified. Three gaps total. The spec-break threshold. A 30-minute re-spec session patches the spec with retry policy (exponential backoff, 3 attempts, 90s total cap) and webhook HMAC verification. Build continues with the amended spec.
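The amended retry policy (exponential backoff, 3 attempts, 90s total cap) might look like the sketch below. The 1-second base delay and the injectable `sleep` and `clock` parameters are assumptions added so the policy can be tested without waiting. The cap is checked before each backoff sleep; this sketch does not interrupt a call that is already in flight.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=1.0, total_cap=90.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `fn`, retrying on TimeoutError with exponential backoff.

    Gives up after `attempts` tries, or earlier if the next backoff
    would push total elapsed time past `total_cap` seconds.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    start = clock()
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            is_last = attempt == attempts - 1
            if is_last or (clock() - start) + delay > total_cap:
                raise  # surface the timeout to the caller
            sleep(delay)
```

The caller still decides what a final timeout means; in PAY-41 that would be the structured `STRIPE_UNAVAILABLE` response.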
PAY-41 is scored against the defensive checklist. Input validation: present. Error handling: structured with correlation IDs. Idempotency: enforced via the idempotency key. Timeout: fixed to 30 seconds per spec. One gap remains: structured logging lacks a correlation ID on the webhook handler. Fixed in 10 minutes.
Defensive Checklist — PAY-41
Input Validation ✓ Level 2 (schema + boundary checks)
Error Handling ✓ Level 2 (structured, correlation IDs)
Idempotency ✓ Level 3 (key-based, tested)
Timeout / Retry ✓ Level 2 (30s timeout, 3 retries, backoff)
State Management ✓ Level 2 (enum states, transition map)
Logging ✓ Level 2 (structured, correlation ID added)
Auth / Permissions ✓ Level 2 (existing auth middleware)
Test Coverage ✓ Level 2 (happy + sad paths covered)
Score: 17/24 (Level 2+ across all categories)
The Lead Architect notes that the Stripe webhook verification pattern is reusable. It goes into the standards document as the canonical webhook integration pattern.
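As an illustration of what a canonical webhook verification pattern could contain, here is a generic HMAC-SHA256 check. It is a sketch of the general technique only: it does not reproduce Stripe's actual signature header format or signed-payload layout, and a real integration should follow the provider's documented scheme. The function names are illustrative.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, payload)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The constant-time comparison matters: a plain `==` can leak how many leading characters matched through timing differences.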
Demo to the product stakeholder: Apple Pay works on iOS Safari. Card fallback on Chrome. No double charges. The stakeholder sees exactly what the acceptance criteria described. No surprises. The spec was the contract.
Retro surfaces a cross-story pattern: timeout behavior was underspecified on 2 of 3 stories this week, not just PAY-41. The team adds a mandatory “External Call Timeouts” section to the spec template. The review findings log is updated. Next week’s specs will be better because this week’s gaps were captured.
The story took 5 days. The code took 2 hours. The other 38 hours were specification, review, hardening, and learning. That ratio is the methodology working, not a sign of overhead. AI makes code cheap. Datum invests the surplus in the work that actually prevents production incidents: getting the spec right, verifying against it, and feeding every lesson back into the system that writes next week’s specs.
Getting Started
The smallest thing a team can do to experience Datum: run one spec-first cycle on one story. Write a brief (problem, acceptance criteria, constraints). Write a lightweight tech spec (contracts touched, boundary schemas). Build from the spec with AI. Review against the spec. One story, start to finish. If the team cannot see the value after one story — if the spec did not prevent at least one mistake the AI would have otherwise made — the model needs adjustment before scaling.
| Week | Focus | Success Signal |
|---|---|---|
| Week 1 | Learn the model. Read the methodology. EM/PO and Lead Arch write 2–3 briefs as a live exercise. Run spec sessions on those briefs. Builders build from the specs. | Every pod member can explain the spec-to-merge cycle without looking at the document. |
| Week 2 | First real sprint at 50–60% capacity. Full Monday-to-Friday cycle with real work. Accept lower throughput. | At least one story ships through the full cycle: spec → build → review → verify → demo. |
| Weeks 3–4 | Full capacity. Run the complete model. Track all metrics. | Spec gap rate is decreasing. Review turnaround is under 4 hours. No stories enter build without a spec. |
| Week 5 | 4-week health check. Five diagnostic questions. | 3+ of 5 health check questions answered positively. If not, diagnose before continuing. |
What to Expect Emotionally
Feels slow and bureaucratic. Writing specs before building feels like overhead when the team is used to jumping straight to code. This is normal. The discomfort is not a signal that the process is wrong — it is the feeling of shifting the bottleneck.
Feels like reduced output. The team ships fewer features than under the old process. This is the investment period. The team is building the muscle memory for specification, not yet seeing the return.
Rework reduction becomes visible. Stories that went through spec sessions have fewer review findings and fewer integration surprises. The Monday investment starts paying off.
The team stops wanting to go back. The evidence is in the review findings log: fewer recurring issues, faster review turnaround, less Thursday firefighting.
Do Not
Migration
Getting Started covers your first cycle. This chapter covers what happens after — when you need to transition a team, convince a manager, or migrate from an existing methodology.
Every team arrives from somewhere. The three most common starting points each have different friction profiles. What maps cleanly, what changes, and where the resistance concentrates.
Sprint planning becomes Monday spec sessions. Sprint review becomes Friday demo. Retrospective becomes Friday retro. The cadence is familiar. What disappears: story points (replaced by the 2.5x multiplier), the Scrum Master role (the Lead Architect absorbs process ownership), daily standup (replaced by async status on the board), and the locked sprint backlog (carryover is explicit, not a failure).
The hard part: Teams expect estimation ceremonies. The 2.5x multiplier feels like guessing to people trained on planning poker. Show them: track estimation accuracy across four weeks. The multiplier calibrates faster than story points ever did because it is based on a single observable ratio, not consensus fiction.
What maps cleanly: Sprint planning → Monday spec sessions. Sprint review → Friday demo. Retrospective → Friday retro. Weekly cadence → weekly cadence.
What changes: No story points. No Scrum Master. No daily standup. Sprint backlog not locked; carryover is explicit, not punished.
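The 2.5x multiplier and the four-week accuracy check mentioned above come down to two small calculations. A sketch: the function names are illustrative, and the sample figures in the usage note are hypothetical apart from the 2h-to-5h PAY-41 estimate.

```python
MULTIPLIER = 2.5


def total_estimate(ai_build_hours):
    """Total effort: the AI build estimate scaled by the multiplier."""
    return ai_build_hours * MULTIPLIER


def observed_multiplier(stories):
    """Actual total hours divided by AI build hours, across finished stories.

    `stories` is a list of (ai_build_hours, actual_total_hours) pairs.
    """
    build = sum(b for b, _ in stories)
    actual = sum(a for _, a in stories)
    return actual / build
```

`total_estimate(2.0)` gives the 5 hours estimated for PAY-41. If the observed ratio drifts away from 2.5 over four weeks, that drift is the calibration signal to discuss at retro.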
The board stays. WIP limits stay (and tighten). Continuous flow is preserved. What changes: a spec gate appears before "Ready." Nothing enters the build column without a brief and tech spec. Pull-based work selection continues, but each item has a verification contract before anyone touches it.
The hard part: Minimal-process teams see specs as bureaucracy — overhead imposed by people who do not write code. Counter: the spec replaces the ad-hoc Slack thread where requirements emerge mid-build. It is not new work. It is the same communication made explicit, written once instead of scattered across twelve messages and a call.
What maps cleanly: Board → Datum Kanban style. WIP limits → same, tighter. Continuous flow → preserved.
What changes: Spec gate before “Ready.” Nothing enters build without a brief and tech spec.
No existing ceremonies map. No existing artifacts carry over. Everything changes. The only viable approach: start with one story, one cycle, exactly as described in Getting Started. Do not attempt a full-team rollout. Do not announce a process change. Run one story through spec → build → verify → retro. Then run another.
The hard part: Every previous process was imposed from above and abandoned within months. The team is inoculated against methodology. Datum must be demonstrated, not mandated. Run one story. Let the evidence argue. If the findings log shows fewer defects and faster delivery on that one story, the team will ask questions. That is the opening.
What maps cleanly: Nothing.
What changes: Everything. Start with one story, one cycle. Expand only after evidence accumulates.
Common Objections
You do not have time not to. The 2.5x multiplier accounts for spec time. Teams that skip specs do not go faster — they move rework from the front of the cycle to the back, where it costs more. The spec is not overhead on top of building. It is the activity that prevents rebuilding.
Review in Datum is verification against a spec, not judgment of competence. The reviewer checks whether the output matches the contract. It is closer to QC inspection than code critique. When the spec is clear, review is fast and impersonal. Resistance comes from ambiguous specs that turn review into a design debate.
Vague input produces vague output. The spec is what makes AI output verifiable. Without a spec, you cannot distinguish correct code from plausible code. AI does not remove the need for specification — it raises the cost of skipping it, because the volume of unverified output increases.
If your system has contracts (APIs, interfaces, data schemas) and state (persistence, user sessions, workflows), Datum applies. Frontend, backend, mobile, infrastructure, data pipelines. The artifacts differ. The cycle does not.
Every methodology transition fails when imposed. Every one succeeds when a small team demonstrates value and others ask to join. Start with one pod. Run one cycle. Let the findings log speak.
Glossary
- Accountability Asymmetry
- Intelligence scales; accountability does not. AI can generate output at near-infinite volume, but humans still own every outcome. Governance must be structural (documents, checklists, gates) because direct supervision cannot keep pace. See Ch. 13.
- 2.5x Multiplier
- AI code estimate × 2.5 = total effort including hardening, testing, and verification. The 60% that isn’t code generation takes the same time regardless of how the code was produced.
- Builder
- Stage 1 of Datum’s growth path. Builds from specs with AI, self-verifies, flags gaps instead of guessing. See Ch. 10.
- Brief
- Business intent document written by EM/PO. Contains: problem statement, scope, constraints, testable acceptance criteria. The upstream input to the spec session.
- Capacity Trap
- AI acceleration frees hours, but freed hours do not become growth automatically. Without deliberate redeployment into specification, verification, and governance, the capacity evaporates. See Ch. 3.
- Carryover
- Stories not completed in the current week, explicitly moved to next. Distinguished from hidden scope creep by being visible at Monday alignment.
- Datum
- The fixed reference point. In CNC machining, every measurement is relative to the datum. In this methodology, the spec is the datum — the reference for building and reviewing.
- Defensive Checklist
- 8-category quality scorecard (0–3 per category, 24 max). Categories: input validation, error handling, idempotency, timeouts, logging, state management, security, testing.
- Co-Specifier
- Stage 2 of Datum’s growth path. Drafts specs under senior review. The gap between draft and revision shrinks over time. Gate: senior revisions are cosmetic, not structural. See Ch. 10.
- Delivery Style
- The cadence pattern: Sprint (weekly batch), Kanban (pick-and-go), or Continuous (per-story flow). The quality model is invariant across styles.
- EM/PO
- Engineering Manager / Product Owner. Writes briefs with business intent, scope, constraints, and acceptance criteria. In small pods, may be part-time or combined with other responsibilities.
- Fresh-Context Problem
- Every AI session starts from zero. Without a standards document loaded as context, each session invents its own patterns. The standards document solves this.
- Governor
- Stage 4 of Datum’s growth path. Multi-lens review, standards contributions, mentors builders. The shift from individual contributor to force multiplier. See Ch. 10.
- Hardening
- Thursday quality verification. Code is checked against the defensive checklist, acceptance criteria are verified, integration testing runs. Required before Friday demo and release.
- Lead Architect
- The single point of architectural accountability in a pod. Owns the standards document, leads spec sessions for high-risk stories, reviews complex work. Two functions: governance (setting standards) and coaching (growing juniors via the spec revision loop). See Ch. 7.
- Multi-lens Review
- Review approach checking multiple dimensions: correctness, security, performance, maintainability, ops readiness. Not just “does it work.”
- Pod
- A self-contained team of 2–7 people running Datum. Scales by replication (more pods), not growth (bigger pods).
- Prompt Library
- Shared, curated collection of effective AI agent prompts. Captures what works for the pod. Reviewed and updated at Friday retro.
- Re-spec Session
- 30-minute session triggered by the spec-break rule. Patches the spec so the build does not continue from a broken reference.
- Review Findings Log
- Running record of all review findings, spec gaps, and spec-break triggers. Source data for Friday retro. Turns the methodology into a learning system.
- Shadow AI
- AI agents deployed without formal governance. 75% of knowledge workers use AI tools, often unsanctioned. In Datum terms: builders generating code without a standards document loaded. See Ch. 3.
- Spec Gap
- An ambiguity, missing detail, or unspecified behavior in a technical specification discovered during implementation. Three or more spec gaps in a single story trigger the spec-break rule.
- Spec Revision Loop
- Datum’s coaching mechanism. A Co-Specifier drafts a spec and the Lead Architect revises it. The delta between draft and final spec is the teaching artifact. See Ch. 10.
- Spec Session
- Collaborative design meeting where a brief becomes a technical specification with contracts, state machines, boundary schemas, and acceptance criteria. Held Mondays (sprint) or on demand (kanban/continuous). See Ch. 5.
- Spec-break Rule
- If a builder discovers 3+ spec gaps in a single story, building stops. A re-spec session is held. The cost of re-speccing is always lower than building from a broken spec.
- Specifier
- Stage 3 of Datum’s growth path. Writes specs independently, leads spec sessions, reviews juniors’ code. Gate: low rework rate on own specs. See Ch. 10.
- Standards Document
- Single source of architectural truth loaded into every AI session. The primary control mechanism for AI output quality.
- Technical Spec
- Contract-level specification: API contracts, state machines, boundary schemas, test expectations. The artifact the AI builds against and the reviewer verifies against.
- Volume-Risk Spectrum
- Governance intensity scales with risk. Low-risk stories (config changes) need minimal oversight. High-risk stories (payments, auth) need full tech spec, ADR, multi-lens review, and spec-break enforcement. See Ch. 13.
- WIP Limit
- Maximum concurrent PRs per pod. Typically 2 for a 7-person pod, 1 for a 2-person pod. When the limit is hit, review open PRs before pulling new work.
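Three of the glossary entries above state numeric rules: the 8-category quality scorecard (0–3 per category, 24 max), the spec-break rule (3+ spec gaps stop the build), and the WIP limit (review before pulling new work once the limit is hit). A minimal Python sketch of those rules follows. Every function, constant, and identifier name here is hypothetical and chosen for illustration; Datum does not prescribe any tooling, and a pod could just as well track these by hand.

```python
from __future__ import annotations

# Category names come from the scorecard entry; the identifiers are hypothetical.
SCORECARD_CATEGORIES = (
    "input_validation", "error_handling", "idempotency", "timeouts",
    "logging", "state_management", "security", "testing",
)
MAX_PER_CATEGORY = 3       # each category is scored 0-3
SPEC_BREAK_THRESHOLD = 3   # 3+ spec gaps in one story stop the build


def scorecard_total(scores: dict[str, int]) -> int:
    """Sum a scorecard after checking it covers exactly the 8 categories."""
    if set(scores) != set(SCORECARD_CATEGORIES):
        raise ValueError("scorecard must score exactly the 8 categories")
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{name}: score {value} is outside 0-{MAX_PER_CATEGORY}")
    return sum(scores.values())


def spec_break_triggered(spec_gaps: list[str]) -> bool:
    """True when a single story has accumulated 3 or more spec gaps."""
    return len(spec_gaps) >= SPEC_BREAK_THRESHOLD


def can_pull_new_work(open_prs: int, wip_limit: int) -> bool:
    """False once the pod's concurrent PR count has reached its WIP limit."""
    return open_prs < wip_limit


# Example: a perfect scorecard reaches the 24-point maximum.
perfect = {category: MAX_PER_CATEGORY for category in SCORECARD_CATEGORIES}
best_possible = scorecard_total(perfect)
```

The validation in `scorecard_total` is a design choice of this sketch: rejecting a partial scorecard keeps a skipped category from silently counting as a zero.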
References
- Crowley, J., Close, K., Munie, K. and Karaca-Griffin, S. The Age of Co-Intelligence: How Humans, AI Agents and Robots Are Redefining Value. Accenture Global Products Practice & Wharton AI and Analytics Initiative, March 2026.
- Boehm, B. Software Engineering Economics. Prentice Hall, 1981. See also Boehm, B. and Basili, V. “Software Defect Reduction Top 10 List.” IEEE Computer 34(1), January 2001.
- Jones, C. Applied Software Measurement. McGraw-Hill, 3rd edition, 2008. Approximately 40% of all defects traced to requirements errors across 12,000+ projects.
- NASA JPL. Software defect cost studies, 1990s–2000s. 70–85% of rework costs traced to requirements errors. Validated Boehm’s defect cost escalation curve.
- IBM Systems Sciences Institute. Relative cost of fixing defects by phase. Cited in Boehm (1981) and widely replicated.
- Faros AI. Developer telemetry study: 10,000+ developers, 1,255 teams. 2025. +98% PRs merged, +21% tasks completed in high-AI-adoption teams.
- Stern, L. (Agoda). Developer productivity data: +91% increase in PR review time under AI-assisted workflows. Summarized in Stiller, E. InfoQ, March 2026.
- Stiller, E. “AI Coding Assistants Haven’t Sped up Delivery Because Coding Was Never the Bottleneck.” InfoQ, March 2026.
- Griffin, L. and Carroll, R. “Spec-Driven Development.” InfoQ, 2025.
- Fuller, J. “Create an Onboarding Plan for AI Agents.” Harvard Business Review, March 2026.
- Brooks, F. The Mythical Man-Month. Addison-Wesley, 1975 (Anniversary Edition 1995). Chapter 4, “Aristocracy, Democracy, and System Design,” on conceptual integrity.
- Brooks, F. “No Silver Bullet: Essence and Accident in Software Engineering.” 1986.
- Singer, R. Shape Up: Stop Running in Circles and Ship Work that Matters. Basecamp, 2019.
- Anderson, D. Kanban: Successful Evolutionary Change for Your Technology Business. Blue Hole Press, 2010.
- Cagan, M. Inspired: How to Create Tech Products Customers Love. Wiley, 2018.