Datum

A Software Delivery Methodology for Agentic Powered Teams

In CNC machining, the datum is the fixed reference point — every measurement, every cut, every inspection is relative to it. If the datum is wrong, every part comes out wrong. In software, the spec is the datum.

Figure: Datum sprint cycle topology. An engineering drawing showing the Specify, Build, Verify, and Learn phases with the datum crosshair, retro loop, and spec-break feedback.

Set the reference. Then cut.

First Edition · 2026

The Problem

Scrum, Kanban, XP, Shape Up: all designed for teams where coding was the slowest thing in the room. AI broke that assumption in 2023. The bottleneck moved. The process did not.

+21%

More tasks completed by high-AI-adoption teams

+98%

More pull requests merged

+91%

Increase in PR review time

Faros AI · 10,000+ developers · 1,255 teams · 2025

Before AI

Coding was the bottleneck.

Most engineer-hours went to writing code. Spec took a meeting. Review took a morning. The constraint was typing speed and domain knowledge, so methodologies optimized for build throughput: sprints, velocity, story points.

Coding 65% · Review 20% · Spec 15%

After AI (Unstructured)

Review became the bottleneck.

AI accelerated the build. 98% more PRs landed in the queue. Human review capacity didn’t double. It can’t. So the pipeline jammed at verification, and review became rubber-stamping.

Coding 30% · Review 38% · Waiting 32%

The Bottleneck Shift

Faster coding relocated pressure to the two activities that still require a human brain: specification (what should the system do?) and verification (did it do it correctly?). Teams produce nearly twice the code. Reviewers still have the same 8-hour day. Something has to give, and what gives is rigor.

The CNC Analogy

This shift is not new. Manufacturing went through it in the 1970s. Before CNC machines, a machinist’s value was in their hands: steady cuts, feel for the metal, years of muscle memory. CNC automated the cutting. It didn’t replace machinists. It changed what “machinist” meant.

The Choice

The CNC shops that bolted machines onto manual workflows kept machinists hand-programming G-code from memory instead of from CAD drawings. No standardized toolpaths. No updated inspection protocols. The machines cut faster. They produced more out-of-spec parts faster. Scrap rates went up. QC couldn’t keep pace with 3× the output volume. The bottleneck moved from the spindle to the inspection bench, and nobody moved with it.

The same thing is happening now. AI produces 98% more PRs. Review time increases 91%. Delivery speed: unchanged. The bottleneck moved from coding to verification, and the process did not move with it. Datum moves the process.

CNC Machining	Datum
CAD design: what to make	Brief + tech spec: what to build
G-code programming: how to make it	Standards doc + prompt: how the AI builds it
Machine setup: load, calibrate	Session setup: load standards + spec
Machine runs autonomously	AI generates code
Operator monitors for deviation	Builder self-verifies against criteria
QC inspection against tolerances	Review against spec: does the output match the contract?
Scrap / rework if out of spec	Spec-break rule: stop, re-spec, resume

CNC Career Path

Stage 1

Machine Operator

Loads programs, runs parts, checks output against prints.

Stage 2

Setup Technician

Writes programs for standard parts. Selects tooling and fixtures.

Stage 3

CNC Programmer

Designs complex toolpaths. Selects cutting strategies for new geometries.

Stage 4

Master Machinist / Foreman

Governs quality. Trains operators. Owns the standards library.

Datum Career Path

Stage 1

Builder

Builds from specs with AI. Self-verifies. Flags gaps instead of guessing.

Stage 2

Co-Specifier

Drafts specs for standard stories. Learns from the gap between draft and revision.

Stage 3

Specifier

Owns specs end-to-end. Leads spec sessions. Reviews juniors’ code.

Stage 4

Governor

Multi-lens review. Standards contributions. Mentors builders. Multiplies the team.

The Axioms

These are the non-negotiable principles the methodology is built on. A team that accepts the axioms can adapt the practices. A team that rejects the axioms should use a different methodology.

A1

Coding Is Not the Bottleneck

Specification and verification are. Organize around them.

A2

Specs Are Design Work

A spec session is a design meeting. Its output — briefs, technical specs, ADRs — is the primary design artifact. Treating it as process friction destroys the methodology.

A3

Vague Spec, Vague Output

AI agents amplify input quality. A precise spec produces precise, verifiable output. A vague spec produces output that passes the happy path and fails everywhere else.

A4

Judgment at Two Points

Upstream: what should the system do? Downstream: does it do it correctly? Everything between those two points can be AI-assisted or AI-generated.

A5

Verify Against the Spec

Reviews that rely on “this doesn’t feel right” don’t scale. Reviews against a written spec are systematic and teachable, so new reviewers become effective faster.

A6

Governed, Not Freed

A junior with a tight spec and review structure produces more verified output than a senior building from memory. The inversion is real and must be designed for.

The Economics

Datum front-loads effort into specification because defects caught upstream cost orders of magnitude less than defects caught downstream. This is not opinion. It is the most replicated finding in software engineering economics.

60–100×

Cost to fix a defect in production vs. requirements

70–85%

Of rework costs traced to requirements errors (NASA JPL)

$1 → $100

Every $1 in specification saves $10–$100 in failure costs

Boehm’s Defect Cost Curve

Barry Boehm’s research,^[1] validated by IBM Systems Sciences Institute^[2] and NASA JPL^[3] across decades, quantifies the cost escalation. A missing transaction wrapper costs nothing to add to a spec. It costs 15 minutes during coding, 2–4 hours during testing, and days-to-weeks plus thousands of dollars in production.

Requirements / Design

1×

Monday spec sessions. The cheapest place to find a defect. A 30-minute conversation prevents a 30-hour production incident.

Implementation / Coding

5–10×

Tuesday–Wednesday build. The defect is in the code now. Finding it requires reading, testing, reverting. Still cheap relative to production.

Integration / Testing

15–40×

Thursday hardening. The defect interacts with other systems. Finding it requires integration tests, environment debugging, cross-team coordination.

Production

60–100×

Incident response, data repair, customer communication, regulatory reporting, reputation damage. The defect now has a dollar cost attached to it.
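The curve above can be sketched as a small lookup. This is an illustration only: `PHASE_MULTIPLIERS` and `relative_cost` are hypothetical names, and the ranges are the ones quoted in this chapter, not measurements from any particular project.

```python
# Relative defect-cost ranges by phase, as quoted in this chapter.
# The names and structure are illustrative assumptions, not part of Datum.
PHASE_MULTIPLIERS = {
    "requirements": (1, 1),
    "implementation": (5, 10),
    "integration": (15, 40),
    "production": (60, 100),
}


def relative_cost(base_cost: float, phase: str) -> tuple[float, float]:
    """Return the (low, high) cost of fixing a defect found in `phase`,
    given what the same fix would have cost at requirements time."""
    low, high = PHASE_MULTIPLIERS[phase]
    return base_cost * low, base_cost * high


if __name__ == "__main__":
    # A gap that a 0.5-hour spec conversation would have closed.
    for phase in PHASE_MULTIPLIERS:
        low, high = relative_cost(0.5, phase)
        print(f"{phase}: {low:g} to {high:g} hours")
```

With a base cost of 0.5 hours, the low end of the production range comes out at 30 hours, which is the 30-minute conversation versus 30-hour incident comparison made above.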

The 40/60 Reality

Where delivery effort actually goes, and why AI only accelerates part of it.

60%

not code

40% Feature code (AI-accelerated)

20% Testing & verification

20% Monitoring, retry, resilience

10% Operational readiness

10% Security & integration

When Specification Was Skipped

These are not hypotheticals. These are the dollar costs of missing specs.

HealthCare.gov · No shared spec

$93.7M budgeted. $1.7B actual. An 18× cost overrun. Multiple contractors built to different assumptions. The defects were not in the code. They were in the missing contracts between systems.

What a spec would have caught

Shared API contracts, agreed data schemas, and integration test criteria defined before any contractor wrote a line of code. Estimated spec cost: $2M. Estimated savings: $1.6B.

Knight Capital · No deployment spec

$440M lost in 45 minutes. Old code was reactivated because the deployment specification did not account for it. The code worked exactly as written. The specification of what should be deployed did not exist.

What a spec would have caught

A deployment checklist with explicit state: which services are active, which are deprecated, what happens on rollback. The spec-break rule would have triggered before the first trade executed.

The AI Amplification Effect

Without specification discipline, AI makes the economics worse, not better.

+98%

More PRs generated by AI-assisted developers (Faros AI, 2025)

+91%

Increase in review time to handle the volume (Faros AI, 2025)

0%

Improvement in actual delivery speed (InfoQ, 2026)

AI-generated code that matches a spec is verifiable. The reviewer checks behavior against written criteria. AI-generated code without a spec requires full white-box review, which does not scale. The 98% increase in PRs flows into a review pipeline that is 91% slower, producing no delivery speed gain and introducing quality risk.

Capers Jones’s research across thousands of projects:^[4] poor requirements are the single largest source of project failure, responsible for approximately 40% of all defects. AI does not fix requirements. It amplifies them, for better or worse.

The Capacity Trap

A 2026 joint study by Accenture and the Wharton School^[5] analyzed task-level data across 18 industries covering more than 120 million workers. For a modeled $60 billion company, they estimated $6 billion in potential annual revenue growth from agentic AI at full maturity, alongside $1.7 billion in annual productivity gains.

The catch: roughly two-thirds of those productivity gains materialized as direct cost savings, but the remaining third appeared as cost avoidance, meaning capacity freed for different, higher-value work. Without intentional redeployment, that freed capacity does not become growth. It evaporates.

50%+

Of U.S. working hours subject to reshaping by AI agents

⅓

Of productivity gains that are “freed capacity,” not savings

75%

Of knowledge workers already using AI — often unsanctioned

“Productivity becomes growth only through redeployment,” the report warns. “Unless leaders deliberately redeploy that capacity toward higher-value work, productivity gains stall at efficiency and fail to translate into growth.”

This is the 40/60 reality at enterprise scale. AI accelerates the 40% (code, content, analysis). The 60% (specification, verification, architecture, governance) does not compress. Teams that bank the freed hours as headcount reduction instead of reinvesting them into the 60% get the worst of both outcomes: more AI output, less human oversight, faster accumulation of defects that cost 60–100× to fix in production.

The Bottom Line

The Monday investment in specification is not overhead. It is the cheapest defect prevention the industry has ever measured. Every $1 spent on specification saves $10–$100 in failure costs. No AI model, no testing framework, no deployment pipeline achieves that ROI. And as the Accenture/Wharton data shows, the freed capacity from AI acceleration is only valuable if it flows back into the work AI cannot do: specifying, verifying, and governing.

When to Use It

Before investing in the mechanics, check the fit. Datum is designed for a specific context. If your team matches these four signals, the methodology will produce results within one cycle. If it doesn’t match, save yourself the friction.

✓

AI in the Loop

The team uses AI coding assistants for meaningful portions of code generation — not just autocomplete.

✓

Contracts Exist

The codebase has or is developing defined contracts, APIs, and architectural standards that can be specified.

✓

Mixed Seniority

The team includes seniors who can write specs and juniors who benefit from building against them.

✓

Quality Required

Code quality and production readiness are non-negotiable. The work ships to real users with real consequences.

Context Variants

The full model assumes a 7-person pod with weekly cadence. These three contexts require adaptation, but the core holds.

Solo or Pair · 1–2 Engineers

The spec session becomes a document, not a meeting.

How it works

Before each AI-assisted build, write a lightweight brief: problem statement, acceptance criteria, contracts touched, boundary constraints. Self-review against the brief before merging. Load the standards document into every AI session.

What compresses

Spec sessions → written documents
Peer review → self-review with stricter criteria
Defensive checklist scored on every merge

Exploratory / R&D · Hypothesis-Driven

Replace the brief with a hypothesis brief.

How it works

Problem statement, hypothesis, experiment design, success/failure criteria, and a time-box. “We will know whether approach X is viable by measuring Y within Z hours” is a spec. “Just start coding and see what happens” is vibe coding.

What compresses

Brief → hypothesis brief with time-box
Spec is shorter and more uncertain
But it exists. That’s the difference.

Firefighting / Incident Mode

5-minute incident brief before any fix.

How it works

Symptoms, root cause hypothesis, fix scope, blast radius, rollback plan. The fix is verified against the brief. Full specification and hardening happen in the next regular cycle. Firefighting without specification is how incidents recur.

What compresses

Weekly cadence → 5–10 minute incident brief
Full hardening deferred to next cycle
Post-incident: spec the fix properly

Genuine Poor Fit

Three conditions where Datum is the wrong choice. Not “hard to adopt” — genuinely wrong.

Axiom Rejection

The team believes coding is the bottleneck, or that specification is overhead rather than design work. The practices will be treated as bureaucracy and abandoned within weeks.

The test

Ask: “Would you spend 30 minutes specifying a story to save 4 hours of rework?” If the answer is no, fix the belief first.

No Customer Access

The customer cannot participate in specification. Very rare, and usually a relationship problem. But if it cannot be resolved, the upstream specification chain is broken.

The test

Can someone write acceptance criteria that the customer would agree with? If yes, that person is the proxy PO. If no, the chain is broken.

No AI in Use

The methodology is designed for the agentic context. Without AI leverage, the bottleneck assumptions do not hold and traditional methodologies are better calibrated.

The test

Is AI generating >20% of your code? If not, the 2.5× multiplier and the review-volume economics don’t apply. Use Shape Up or Kanban instead.

What Datum Is Not

For the skeptic in the room.

Not Waterfall

Specs live in git, get reviewed like code, and can change. We front-load clarity, not commitment. Iteration continues once implementation starts.

Not Replacing Your Tools

Jira, Linear, Shortcut: whatever you use for tracking stays. Datum changes the output of planning, not the workflow management layer.

Not Bureaucracy

The refinement ceremony stays. What changes is the output: a structured spec alongside or instead of a traditional story description.

Not For Every Task

Bug fixes, config changes, and simple tasks may stay lightweight. The spec flow applies to features complex enough to benefit from it.

Not Anti-Agile

Sprint ceremonies, iterative delivery, and continuous feedback remain. Datum restructures what happens inside the sprint, not around it.

How It Works

Datum runs on a weekly cadence. The cadence is a rhythm, not a commitment. Stories carry over when they must. The carryover is explicit and communicated.

Day	Primary Activity	Deliverable
Monday	Alignment + collaborative spec sessions	Every story has a brief, tech spec, and estimate before anyone builds
Tue–Wed	AI-assisted build + continuous review	Code merged only after verification against spec
Thursday	Integration verification + hardening	Everything passes the defensive checklist at level 2+
Friday	Demo + retro + next week prep	Stakeholders have seen the work; next Monday's briefs are ready

graph LR
  subgraph Monday["Monday: Specify"]
    direction TB
    M1["Capacity check"]
    M2["Spec sessions"]
    M3["Estimates ready"]
    M1 --> M2 --> M3
  end

  subgraph TueWed["Tue-Wed: Build"]
    direction TB
    T1["AI-assisted build"]
    T2["Self-verify vs spec"]
    T3["PR submitted"]
    T4["Continuous review"]
    T1 --> T2 --> T3 --> T4
  end

  subgraph Thursday["Thursday: Harden"]
    direction TB
    H1["Integration test"]
    H2["Defensive checklist"]
    H3["Standards update"]
    H1 --> H2 --> H3
  end

  subgraph Friday["Friday: Ship"]
    direction TB
    F1["Demo to stakeholders"]
    F2["Retro"]
    F3["Prep next briefs"]
    F1 --> F2 --> F3
  end

  Monday --> TueWed --> Thursday --> Friday

  SB["Spec-Break"]
  T2 -.->|"3+ gaps"| SB
  SB -.->|"re-spec"| M2

  style Monday fill:#D6DFE8,stroke:#4A6B8A,color:#2A2A26
  style TueWed fill:#F0E8D0,stroke:#B8860B,color:#2A2A26
  style Thursday fill:#D8E4D8,stroke:#5B7A5B,color:#2A2A26
  style Friday fill:#E4DCE8,stroke:#6B5B7B,color:#2A2A26
  style SB fill:#E8D0D0,stroke:#8B3A3A,color:#2A2A26

Weekly time allocation by activity

Specification: 25%
AI-assisted build: 30%
Verification: 25%
Alignment: 15%
Learning: 5%

1 Spec Before Build

No story enters the build queue without a brief (business intent, acceptance criteria) and a technical spec (contracts, state machines, boundary schemas). This is not a best practice. It is the load-bearing rule. Removing it collapses the methodology back to vibe coding.

Spec sessions are collaborative design conversations — 30 to 60 minutes per story — involving everyone who will build, verify, and accept the work. The conversation is the design work. The document is the record. Specs are not written by one person and handed to another.
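The rule can be expressed as a gate on the build queue. A minimal sketch, assuming a story record with the fields named above; the `Brief`, `TechSpec`, and `Story` classes and the `ready_to_build` function are hypothetical, not a prescribed Datum schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Brief:
    business_intent: str
    acceptance_criteria: List[str] = field(default_factory=list)


@dataclass
class TechSpec:
    contracts: List[str] = field(default_factory=list)
    state_machines: List[str] = field(default_factory=list)
    boundary_schemas: List[str] = field(default_factory=list)


@dataclass
class Story:
    key: str
    brief: Optional[Brief] = None
    tech_spec: Optional[TechSpec] = None


def ready_to_build(story: Story) -> bool:
    """A story may enter the build queue only with a brief and a tech spec."""
    if story.brief is None or story.tech_spec is None:
        return False
    # Assumption: a brief with no intent or no acceptance criteria counts as missing.
    has_intent = bool(story.brief.business_intent.strip())
    has_criteria = bool(story.brief.acceptance_criteria)
    return has_intent and has_criteria
```

The check is deliberately shallow. It confirms the artifacts exist; whether they are good enough is still decided in the spec session.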

2 Continuous Review with WIP Limits

Builders may have at most 2 open PRs awaiting review at any time. AI-assisted builders produce code faster than humans can review it. Without a WIP limit, “continuous review” becomes “batched review on Thursday”: Scrum with extra steps.

2

max open PRs

4h

review turnaround
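Both numbers are easy to enforce mechanically. A sketch under stated assumptions: the function names are hypothetical, and the 4-hour turnaround is treated as wall-clock time, which the text does not specify.

```python
from datetime import datetime, timedelta

MAX_OPEN_PRS = 2                         # per builder, from the rule above
REVIEW_TURNAROUND = timedelta(hours=4)   # assumed to be wall-clock hours


def may_open_pr(open_prs_awaiting_review: int) -> bool:
    """True if the builder is still under the WIP limit."""
    return open_prs_awaiting_review < MAX_OPEN_PRS


def overdue_reviews(submitted_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return the PRs that have waited longer than the turnaround target."""
    return sorted(
        pr for pr, submitted in submitted_at.items()
        if now - submitted > REVIEW_TURNAROUND
    )
```

A builder at the limit reviews or waits. A non-empty overdue list is the early signal that review is sliding toward a Thursday batch.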

3 The Spec-Break Rule

If a builder discovers 3 or more spec gaps in a single story, building stops. The builder, reviewer, and spec author hold a 30-minute re-spec session before more code is generated. Building from a broken spec compounds waste. The re-spec cost is always lower than the rework cost.

3+

gaps = stop

30m

re-spec session
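The rule reduces to a counter with a threshold. A minimal sketch, assuming gaps are logged per story as the builder finds them; `GapLog` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field

SPEC_BREAK_THRESHOLD = 3   # "3+ gaps = stop"


@dataclass
class GapLog:
    story_key: str
    gaps: list[str] = field(default_factory=list)

    @property
    def spec_break(self) -> bool:
        """True once the story has reached the spec-break threshold."""
        return len(self.gaps) >= SPEC_BREAK_THRESHOLD

    def flag(self, description: str) -> bool:
        """Record a gap and report whether building must stop for a re-spec."""
        self.gaps.append(description)
        return self.spec_break
```

The first two flagged gaps return `False` and building continues. The third returns `True`, and the 30-minute re-spec session is scheduled before more code is generated.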

4 Estimation by Spec Complexity

Estimates are produced during spec sessions, not in separate planning ceremonies. Four dimensions drive the estimate:

1. Contracts and interfaces touched

2. State machine transitions

3. External dependencies

4. Pattern novelty

The formula

AI code estimate × 2.5 = total effort

AI accelerates the 40% (code). The 60% (testing, monitoring, resilience) takes the same time.
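One reading of the 2.5 factor: if feature code is 40% of delivery effort, total effort is the code estimate divided by 0.40, which is 2.5 times the code estimate. That derivation is an interpretation of the 40/60 split, not something the formula states outright. The function names below are hypothetical.

```python
CODE_SHARE = 0.40   # feature-code share from "The 40/60 Reality"
MULTIPLIER = 2.5    # the formula's factor; equals 1 / CODE_SHARE


def total_effort(ai_code_estimate: float) -> float:
    """Apply the formula: AI code estimate x 2.5 = total effort."""
    return ai_code_estimate * MULTIPLIER


def non_code_effort(ai_code_estimate: float) -> float:
    """Effort left for testing, monitoring, resilience, and the rest of the 60%."""
    return total_effort(ai_code_estimate) - ai_code_estimate
```

A story with a 4-hour AI code estimate is planned at 10 hours total, of which 6 hours are the work AI does not compress.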

5 Standards as AI Governance

Every AI session loads the standards document as context. This solves the “fresh-context problem”: without shared context, each AI session invents its own patterns, producing inconsistency across the codebase. The standards document is the primary control mechanism for AI output quality — more effective than prompt tuning or post-hoc review alone.
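Loading the standards can be as simple as assembling every session's context in a fixed order. A sketch only: `build_session_context` and the section titles are assumptions, and how the resulting text reaches a particular AI tool is left out because it depends on the tool.

```python
def build_session_context(standards: str, spec: str, task: str) -> str:
    """Assemble session context with the standards first, then the spec, then the task."""
    if not standards.strip():
        # Assumption: a session without standards should fail loudly, not run ungoverned.
        raise ValueError("standards document is empty; refusing to start a session without it")
    sections = [
        ("TEAM STANDARDS", standards),
        ("STORY SPEC", spec),
        ("TASK", task),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)
```

Putting the standards first and refusing to run without them makes the fresh-context problem a hard failure instead of a silent inconsistency.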

CHAPTER SIX

Delivery Styles

The previous chapter describes the weekly cadence — Monday spec, Tuesday build, Thursday harden, Friday retro. That is one delivery style. Datum’s core principles (spec before build, review against spec, standards as AI governance) are delivery-agnostic. They work in at least three distinct rhythms, each suited to different team shapes and business contexts.

Sprint

Batch specs Monday. Build all week. Ship Friday.

The rhythm

Weekly cadence with named days: Monday alignment + spec sessions, Tuesday–Wednesday build, Thursday hardening, Friday demo + retro. Stories are batched into a weekly scope. Carryover is explicit.

Best for

Product teams with stakeholders who expect weekly visibility
Teams new to Datum — the structure makes habits visible
Regulated environments that need ceremony documentation

Feels like

Shape Up meets Scrum. Fixed time, variable scope. The spec session replaces sprint planning. The Friday retro takes the place of the sprint retrospective. There is no separate standup; Monday alignment covers the week.

Spec

→

Build

→

Harden

→

Ship

Weekly batch · Mon–Fri

Kanban

Board of pre-written specs. Pick and go.

The rhythm

No fixed cadence. Specs are written continuously by the EM/PO and architects and placed in a “Ready” column. Engineers pull the next spec when they finish their current work. WIP limit of 1 per engineer. Review is continuous; every PR is reviewed before the next spec is pulled.

The board

Backlog	Spec Ready	Building	In Review	Done
Idea, Idea	PAY-41, USR-18, INV-09	PAY-40	USR-17	PAY-39

Engineers never enter the Backlog column. They pick from “Spec Ready”: specs with a brief, tech spec, and acceptance criteria. The board makes WIP visible. If “In Review” is full, review before pulling new work.
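The pull rule can be sketched as a function over the board. The column names come from the board above. The `review_limit` parameter is an assumption, since the text says “In Review” can be full but does not give its capacity, and `pull_next_spec` is a hypothetical name.

```python
from typing import Optional

WIP_LIMIT_PER_ENGINEER = 1


def pull_next_spec(board: dict[str, list[str]], engineer_wip: int,
                   review_limit: int) -> Optional[str]:
    """Move the next card from "Spec Ready" to "Building", or return None if blocked."""
    if engineer_wip >= WIP_LIMIT_PER_ENGINEER:
        return None                      # finish current work first
    if len(board["In Review"]) >= review_limit:
        return None                      # review before pulling new work
    if not board["Spec Ready"]:
        return None                      # nothing specced; Backlog is off limits
    card = board["Spec Ready"].pop(0)
    board["Building"].append(card)
    return card


board = {
    "Backlog": ["Idea", "Idea"],
    "Spec Ready": ["PAY-41", "USR-18", "INV-09"],
    "Building": ["PAY-40"],
    "In Review": ["USR-17"],
    "Done": ["PAY-39"],
}
```

The function never reads from “Backlog”, which is the point: an empty “Spec Ready” column means someone needs to write specs, not that engineers should start on ideas.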

Best for

Mature teams that have internalized spec discipline
Support/ops teams with unpredictable inflow
Teams where the PO writes specs ahead of the build capacity

Feels like

Classic Kanban with one critical addition: nothing enters “Spec Ready” without a complete brief + tech spec. The board is the backlog, the spec, and the status tracker. No ceremonies except a weekly retro (30 min) to review the findings log and tune the process.

Pick

→

Build

→

Review

→

Merge

Continuous · WIP = 1

Continuous

Spec → build → ship in one flow. Multiple times a day.

The rhythm

Each story is specced, built, reviewed, and shipped as a single atomic flow. No batching. No waiting for Thursday to harden. The defensive checklist is a CI gate, not a human ceremony. Each spec covers one concern: one endpoint, one state machine, one contract change. The 2.5× multiplier is baked into the estimate, not into a separate hardening day.
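A checklist that runs as a CI gate needs a machine-readable score and an exit code. A sketch under assumptions: the item names are invented for illustration, levels are taken to be integers where higher is better, and the level 2 minimum is borrowed from the Thursday deliverable in the weekly cadence.

```python
MIN_LEVEL = 2   # "level 2+" from the weekly cadence; assumed to apply to the CI gate too


def checklist_failures(scores: dict[str, int], min_level: int = MIN_LEVEL) -> list[str]:
    """Return the checklist items scored below the minimum level."""
    return sorted(item for item, level in scores.items() if level < min_level)


def gate_exit_code(scores: dict[str, int]) -> int:
    """Print each failing item and return a process exit code (0 = pass, 1 = fail)."""
    failures = checklist_failures(scores)
    for item in failures:
        print(f"FAIL {item}: level {scores[item]} is below {MIN_LEVEL}")
    return 1 if failures else 0


# Hypothetical checklist items, for illustration only.
example_scores = {"retries": 2, "timeouts": 1, "structured_logging": 3}
```

A pipeline step would pass the result of `gate_exit_code` to `sys.exit`, so a single item below the minimum blocks the merge.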

Best for

High-trust teams with strong CI/CD pipelines
Infrastructure and platform teams
Teams where the Lead Architect is also building

Feels like

Trunk-based development with spec gates. Every merge is a mini-release. The spec is a PR description that follows a template, not a separate document. Review is synchronous; the reviewer is tagged at PR time, not at end-of-day. Friday retro still happens, but it’s the only ceremony.

Spec

→

Build

→

CI Gate

→

Deploy

Per-story · Multiple/day

The Team

Datum is designed for a 7-person pod. Five personas, each with a distinct function. The composition is deliberate: enough senior capacity to write specs and review, enough junior capacity to leverage AI for high-volume building, and a single point of architectural accountability.

1

The PO

EM / Product Owner

“What does the customer need and how will we know we built it?”

Writes briefs with testable acceptance criteria
Owns stakeholder alignment
Monday spec sessions: brings business intent
Friday demos: presents to stakeholders

1

The Architect

Lead Architect

“The vision codified, not carried in my head.”

Owns the standards document
Writes ADRs
Multi-lens review on complex changes
Scores defensive checklist weekly
Brooks’s “single architect”

1

The Specifier

Architect / Very Senior

“Contracts, state machines, boundary schemas. The datum.”

Translates briefs into technical specs
Defines contracts and state machines
Builds cross-cutting concerns
Unblocks when others hit ambiguity

2

The Reviewer

Senior Engineer × 2

“Less code, more value. My specs and reviews prevent rework.”

Co-writes specs with architects
Reviews junior output against the spec
AI-assisted building on complex stories
Mentors juniors toward spec-writing

2

The Builder

Junior Engineer × 2

“Highest AI leverage. Tight specs, loaded standards, full speed.”

Builds from specs with AI assistance
Self-verifies against acceptance criteria
Flags spec gaps — never guesses
More code volume, under tighter governance

Pod composition — 7 people, 5 roles

EM/PO

Lead Arch

Arch

Senior × 2

Junior × 2

Specification upstream

Conceptual integrity

Technical specification

Grey-box engineering

AI-assisted building

CHAPTER EIGHT

RACI Matrix

Responsibility assignment across pod roles for every major activity.

Specification

Activity	EM/PO	Lead Arch	Arch	Senior	Junior
Write brief	R/A	C	—	C	I
Write technical spec	C	C	R/A	R	C
Lead spec session (high-risk)	R	A	R	C	C
Lead spec session (standard)	C	I	C	R/A	C
Write ADR	I	A	R	C	—
Estimate story	C	C	R	R/A	C

Build

Activity	EM/PO	Lead Arch	Arch	Senior	Junior
AI-assisted build from spec	—	—	R	R	R
Self-verify vs acceptance criteria	—	—	R	R	R/A
Flag spec gaps during build	—	I	I	I	R
Trigger spec-break (3+ gaps)	—	I	C	A	R
Rapid re-spec session	—	I	R/A	R	C

Review

Activity	EM/PO	Lead Arch	Arch	Senior	Junior
Review junior PRs against spec	—	—	—	R/A	—
Review complex PRs	—	I	R/A	—	—
Multi-lens review	—	R/A	C	—	—
Spot-check vs brief	R/A	—	—	—	—
Peer review (junior→junior)	—	—	—	C	R

Governance

Activity	EM/PO	Lead Arch	Arch	Senior	Junior
Own standards document	—	R/A	C	C	I
Update standards (new patterns)	—	A	R	C	—
Score defensive checklist	—	R/A	R	—	—
Maintain prompt library	C	C	C	R	R
Run Friday retro	C	A	C	R	R
Agent governance	—	R/A	C	C	I

Operational

Activity	EM/PO	Lead Arch	Arch	Senior	Junior
On-call (primary)	—	—	R	R	—
On-call (shadow)	—	—	—	—	R
On-call (escalation)	—	R/A	—	—	—
Incident communication	R/A	C	—	—	—
Post-mortem facilitation	I	R/A	C	C	I
Thursday integration verify	—	R	R/A	R	—
Thursday hardening	—	—	—	R/A	R

Growth

Activity	EM/PO	Lead Arch	Arch	Senior	Junior
Assess junior gate criteria	—	A	—	R	—
Mentor junior (Builder stage)	—	—	—	R/A	—
Write promotion case	—	C	—	R	—
Diagnose stalled engineer	—	A	C	R	—

Smaller Teams — Role Mapping

7-Person Role	Team of 3	Team of 2
EM/PO	Person A (part-time)	Shared
Lead Architect	Person A	Engineer A
Architect	Person B	Engineer A
Senior Engineer	Person B	Engineer A
Junior Engineer	Person C	Engineer B

All RACI assignments for collapsed roles merge onto the absorbing person. Where one person holds both R and A, compensate with stricter acceptance criteria and written self-review.
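The merge can be sketched as a fold of role columns onto people. The mapping below is the Team of 3 column from the table above, and the example row is “Lead spec session (high-risk)” from the Specification matrix. Roles with no assignment are omitted from the row, and `merge_raci` and `holds_r_and_a` are hypothetical names.

```python
TEAM_OF_3 = {
    "EM/PO": "Person A",
    "Lead Arch": "Person A",
    "Arch": "Person B",
    "Senior": "Person B",
    "Junior": "Person C",
}


def merge_raci(row: dict[str, str], role_to_person: dict[str, str]) -> dict[str, set[str]]:
    """Merge one activity's RACI codes from roles onto the people absorbing them."""
    merged: dict[str, set[str]] = {}
    for role, codes in row.items():
        person = role_to_person[role]
        merged.setdefault(person, set()).update(codes.split("/"))
    return merged


def holds_r_and_a(merged: dict[str, set[str]]) -> list[str]:
    """People who end up both Responsible and Accountable for the activity."""
    return sorted(person for person, codes in merged.items() if {"R", "A"} <= codes)


high_risk_session = {"EM/PO": "R", "Lead Arch": "A", "Arch": "R", "Senior": "C", "Junior": "C"}
```

For this row, Person A absorbs both the EM/PO's R and the Lead Architect's A, so the activity needs the stricter acceptance criteria and written self-review described above.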

CHAPTER NINE

Team Sizing

The 7-person pod is the full model. Not every team starts there. The axioms hold at any size. The practices compress.

How the pod compresses

7-PERSON POD

EM/PO

Lead Arch

Arch

Senior ×2

Junior ×2

TEAM OF 3

A: Spec

B: Build+Review

C: Build

TEAM OF 2

A: Spec+Review

B: Build

Team of 2

Two engineers, no dedicated PO or architect.

The people

Engineer A (more senior): Lead Arch + Spec Author + Reviewer. Writes briefs and tech specs. Reviews all of B’s output against the spec. Owns the standards document.

Engineer B (more junior or equal): Builder + Co-Specifier. Builds from specs with AI. Self-verifies before submitting. Drafts specs as they grow.

What compresses

Spec sessions: 15–30 min conversation, not a room
WIP limit drops to 1 PR (one reviewer)
Standards doc: single page, not comprehensive
Thursday hardening: 2-hour block, not full day
PO function: shared between both

Team of 3

One spec owner, one build+review, one builder.

The people

Person A: PO/EM or spec-heavy senior. Writes briefs, leads spec sessions, reviews complex work.

Person B: Senior engineer. Writes tech specs, builds, reviews C’s output.

Person C: Engineer (builder). Builds from specs, self-verifies, flags gaps.

The key constraint

Someone must own brief quality. If no one does, the team drifts back to vibe coding. Person A can be a part-time PO who also codes, or a full-time engineer who owns specification.

Team of 4–5

One dedicated spec author, one dedicated reviewer, 2–3 builders.

The people

Person A: PO/EM or Lead Architect. Briefs, standards, architectural decisions.

Person B: Senior. Tech specs, builds, reviews C and D’s output.

Person C–D: Engineers. Build from specs, peer-review each other, flag gaps.

Person E (if 5): Junior. Builds under supervision.

What compresses

No separate Architect role. Lead Arch/Senior absorbs it
Review concentrated on one person — mitigate with stricter WIP limits
At 5 people, the team is two hires short of the full 7-person pod

Growing Engineers

Datum requires a structured growth path because the methodology's leverage depends on juniors who can eventually write specs, not just build from them. The risk without an explicit path: permanent junior executors who produce high-volume, low-accountability output indefinitely.

The four stages are performance-gated, not time-gated. Transitions happen when the engineer demonstrates the gate criteria, not when a calendar threshold passes.

Stage 1

Builder

Execute from specs

Builds from specs with AI. Participates in spec sessions to learn what specs prevent.

Spec: 10% · Build: 70%
Verify: 20% · Govern: 0%

Gate: Builds pass first review. Explains why the spec requires what it requires. Flags gaps instead of guessing.

Stage 2

Co-Specifier

Draft specs under review

Writes first-draft specs, reviewed by a senior. The gap between draft and revision shrinks over time.

Spec: 30% · Build: 40%
Verify: 25% · Govern: 5%

Gate: Senior revisions are cosmetic, not structural. Explains trade-offs in spec decisions when challenged.

Stage 3

Specifier

Own the spec end-to-end

Writes specs independently. Leads spec sessions. Reviews juniors’ code. Their specs are the acceptance authority.

Spec: 40% · Build: 20%
Verify: 30% · Govern: 10%

Gate: Low rework rate on own specs. Identifies cross-service implications without prompting.

Stage 4

Governor

Multiply the team

Multi-lens review. Standards contributions. Mentors new juniors. The shift from IC to force multiplier.

Spec: 30% · Build: 5%
Verify: 35% · Govern: 30%

Gate: Governs others’ output consistently. Runs spec sessions for high-risk stories end-to-end.

The inversion — how effort shifts across stages

Stage	Specification	Building	Verification	Governance
Builder	10%	70%	15%	unlabeled
Co-Specifier	30%	40%	25%	unlabeled
Specifier	40%	20%	30%	10%
Governor	30%	unlabeled	35%	30%

Skills Are the Currency, Not Titles

A 2026 Accenture/Wharton study developed the WAsX (Wharton–Accenture Skills Index) to measure how skills translate into economic value in an AI-enabled economy. The finding: as AI automates routine cognitive work, the market increasingly rewards judgment, coordination, and domain-specific execution. Exactly what Datum’s growth path develops.

Declining Premium

Routine cognitive skills. The work AI does.

Code generation from requirements
Data formatting and transformation
Template-based documentation
Standard test case creation

These are Builder-stage skills. They are necessary but no longer scarce. AI performs them at scale, and the market prices them accordingly.

Rising Premium

Judgment, coordination, governance. The work AI cannot do.

Specification: translating ambiguity into contracts
Verification: evaluating output against intent
Architecture: making trade-offs with incomplete information
Governance: maintaining system coherence at scale

These are Specifier-and-above skills. The WAsX data shows the market assigns increasing monetary value to capabilities that complement AI rather than compete with it.

The study also found a persistent signaling gap: workers overwhelmingly signal broad, generalist traits, while employers pay for specialized, execution-oriented capabilities. In Datum terms: calling yourself a “senior engineer” signals nothing. Demonstrating that your specs produce low-rework implementations and that your reviews catch drift is what the market pays for.

This is why the growth path is performance-gated, not time-gated. Gate criteria like “senior revisions are cosmetic, not structural” (Co-Specifiers) and “low rework rate on own specs” (Specifiers) measure skills that carry economic value, not years spent.

The Spec Revision Is the Coaching Session

Accenture/Wharton studied the organizations they call “Talent Reinventors.” Leaders were 1.3× more likely to delegate and coach, even when it slowed execution. These organizations grew revenue 1.8 percentage points faster, strengthened culture (7× more likely), and increased adaptability (4× more likely).

In Datum, this coaching is the spec revision loop.

How It Works

A Co-Specifier drafts a spec. The Lead Architect revises it. That revision is not rework. It is the most direct way to improve the junior’s judgment.

The delta between draft and final spec shows exactly where the junior’s judgment fell short. Not abstract feedback. Not “think more carefully.” Just: you wrote X, the spec needs Y, here’s why.

The Trade-Off

The architect spends time revising a spec the junior wrote poorly instead of writing it correctly themselves in half the time. This feels like a productivity loss. It is a resilience investment.

Why It Pays Off

A pod that never promotes Builders to Co-Specifiers has a single point of failure at the architect. A pod that coaches has a pipeline. When the architect is overloaded or gone, a Co-Specifier steps in at reduced quality rather than no quality.

Connection

This is the structural fix for the Review Death Spiral (Ch. 14). One of its root causes is an architect with no bench depth. Coaching builds that bench.

Governor, Not Just Gatekeeper

The Lead Architect has two jobs. Governance: setting the standards document, owning consistency, reviewing against specs. Coaching: letting juniors attempt work above their level, then using the gap as feedback. Organizations that invest in both build stronger benches, lower failure risk, and grow engineer judgment faster because the feedback loop is embedded in real work.

Talent Reinventors data: Accenture & Wharton School, The Age of Co-Intelligence, March 2026.

CHAPTER ELEVEN

The Artifacts

Documentation is infrastructure. Not a nice-to-have, not a chore for after the sprint. Infrastructure, the same way a CI pipeline is infrastructure. Every AI session loads these documents. Every review verifies against them. Every retro improves them. When the documentation is wrong, every AI session produces wrong output at scale. When it is precise, every session inherits the team’s accumulated decisions.

The methodology produces seven named artifacts, each with a clear owner and update cadence. Each artifact is a control surface: one person’s decisions become another’s constraints. Each entry below gives the artifact’s purpose, its structure with an example, its update cadence, and the failure mode it prevents.

Lead Architect

Standards Document

Single source of architectural truth. Loaded into every AI session.

Purpose

The standards document is the primary governance mechanism for AI output quality. Every AI coding session loads it as context, solving the “fresh-context problem”: without shared context, each session invents its own patterns. The standards document makes architectural decisions portable and enforceable without the Lead Architect being present.

Structure

# Standards Document — [Project Name]

## Architecture
- Three-tier: presentation → logic → data
- Logic tier has zero I/O knowledge
- All external calls go through client wrappers

## Naming
- Services: PascalCase (UserService)
- Endpoints: kebab-case (/user-profile)
- Database tables: snake_case (user_profile)

## Error Contract
All errors return: { code, message, correlationId }

## Non-Negotiable Rules
- No raw SQL — use query builder
- No silent catch — log and re-raise
- No business logic in controllers

Update cadence

Updated by the Lead Architect when Friday retros surface gaps, or when new patterns emerge during Thursday hardening. Reviewed at Monday alignment.

What breaks without it

Without a standards document, a 5-person pod using AI generates code in 5 different styles. Error handling is inconsistent. Naming conventions drift. Each PR review becomes a style debate. The Lead Architect becomes a bottleneck because their knowledge is in their head, not in a document the AI can read.

Lead Arch + Arch

Architecture Decision Records

Context, decision, and consequences for every architectural choice.

Purpose

ADRs capture the why: the context at the time, the options considered, and the trade-offs accepted. Six months later, when someone asks “why did we use message queues instead of direct API calls?”, the ADR answers without needing the original architect in the room.

Structure

# ADR-007: Use event-driven architecture for billing

## Status: Accepted (2026-03-15)

## Context
Billing calculations depend on data from 3 services.
Synchronous calls create a cascade failure risk:
if Inventory is down, Billing cannot process orders.

## Decision
We will use an event bus (RabbitMQ) for inter-service
billing communication.

## Consequences
+ Services are decoupled — Billing processes events
  when Inventory recovers
+ Easier to add new billing triggers
- Eventual consistency — billing may lag by seconds
- Need dead-letter queue for failed events

Update cadence

Created during spec sessions when architectural decisions are made. Drafted by the Architect, reviewed by the Lead Architect. Updated when decisions are revisited or superseded.

What breaks without it

Without ADRs, teams relitigate the same decisions every quarter. New team members reverse architectural choices because they don’t know why they were made. The standards document says what to do. ADRs provide the why, preventing refactors from undoing deliberate trade-offs.

EM / PO

Briefs

Business intent, scope, constraints, and testable acceptance criteria per story.

Purpose

The brief is the upstream input that determines everything downstream. It translates stakeholder needs into a form engineers can spec against. A brief that says “improve the checkout flow” produces vague specs and vague AI output. A brief that says “reduce checkout abandonment at the payment step by adding Apple Pay, constrained to the existing Stripe integration” produces a spec that an AI can build from.

Structure

## Brief: Add Apple Pay to Checkout

**Business intent**: 23% of mobile users abandon at
payment entry. Apple Pay eliminates manual card input.

**Scope**: Payment step only. No changes to cart,
shipping, or order confirmation.

**Constraints**:
- Must use existing Stripe integration
- iOS Safari and Chrome on iOS only
- Fallback to card entry if Apple Pay unavailable

**Acceptance criteria**:
- [ ] Apple Pay button appears on iOS Safari
- [ ] Successful payment creates order in Stripe
- [ ] Non-iOS browsers see no change
- [ ] Failed Apple Pay falls back to card form

Precision upgrade: EARS notation

For teams that want machine-parsable requirements, use EARS (Easy Approach to Requirements Syntax). Each requirement follows a pattern that an AI can extract, test against, and verify automatically:

WHEN a user submits a payment with Apple Pay
THE SYSTEM SHALL create a Stripe payment intent
  with idempotency key derived from order ID
SO THAT duplicate submissions do not produce
  double charges

WHILE the payment intent status is "processing"
THE SYSTEM SHALL return the existing payment ID
  on subsequent requests for the same order
SO THAT concurrent requests are handled safely

IF the Stripe API does not respond within 30 seconds
THEN THE SYSTEM SHALL return STRIPE_UNAVAILABLE
  and log the timeout with correlation ID
SO THAT the failure is visible and the user can retry

EARS is a precision upgrade, not a replacement for the standard brief format. Use it when requirements must be unambiguous enough for an AI to generate test cases directly from the spec.
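
As one illustration of what machine-parsable means here, the sketch below splits a requirement written in the keyword layout above into its condition, response, and rationale. It is a minimal sketch rather than a full EARS parser: the `Requirement` type, the `parse_requirement` function, and the regular expression are assumptions made for this example, and only the WHEN / WHILE / IF forms shown above are recognised.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern covering only the three keyword forms used above.
_REQUIREMENT = re.compile(
    r"^(?P<keyword>WHEN|WHILE|IF)\s+(?P<condition>.+?)\s+"
    r"(?:THEN\s+)?THE SYSTEM SHALL\s+(?P<response>.+?)"
    r"(?:\s+SO THAT\s+(?P<rationale>.+))?$"
)


@dataclass(frozen=True)
class Requirement:
    keyword: str              # WHEN, WHILE, or IF
    condition: str            # trigger or state that activates the requirement
    response: str             # what the system shall do
    rationale: Optional[str]  # text after SO THAT, if present


def parse_requirement(text: str) -> Requirement:
    """Parse one requirement; raise ValueError if it does not fit the pattern."""
    normalized = " ".join(text.split())  # collapse line breaks and indentation
    match = _REQUIREMENT.match(normalized)
    if match is None:
        raise ValueError(f"not in the expected form: {normalized[:60]!r}")
    return Requirement(**match.groupdict())
```

A requirement that fails to parse is a useful signal in its own right: it usually means the sentence is missing a trigger or a concrete system response.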

Update cadence

Written by EM/PO before Monday spec sessions. Pre-drafted on Fridays using stakeholder feedback from demos. Refined during the spec session based on technical constraints surfaced by the architect.

What breaks without it

Without briefs, engineers spec against their assumptions about what the business wants. The AI builds what the spec says, which is what the engineer assumed, which may not be what the customer needs. The gap is invisible until demo day. Briefs force the business intent to be explicit before any code is generated.

Arch / Seniors

Technical Specs

Contracts, state machines, boundary schemas, test expectations per story.

Purpose

The technical spec is the datum — the fixed reference point that the AI builds against and the reviewer verifies against. It defines the contracts (what goes in, what comes out), the state transitions (what’s legal, what’s not), and the boundary schemas (what external data looks like). The spec is what makes AI output verifiable instead of vibes-based.

Structure

## Spec: Apple Pay Payment Handler

**Contracts**:
  POST /api/payments/apple-pay
  Request: { token: string, orderId: string }
  Response: { paymentId, status, receiptUrl }
  Errors: INVALID_TOKEN, ORDER_NOT_FOUND,
          PAYMENT_DECLINED, STRIPE_UNAVAILABLE

**State machine**:
  idle → validating → processing → completed
  idle → validating → failed
  processing → failed (timeout after 30s)

**Boundary schema** (Stripe webhook):
  { type: "payment_intent.succeeded",
    data: { object: { id, amount, metadata } } }

**Test expectations**:
  - Expired token → INVALID_TOKEN, no Stripe call
  - Stripe timeout → STRIPE_UNAVAILABLE after 30s
  - Duplicate submission → idempotent (same paymentId)
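
The state machine in this spec can be turned into something a reviewer or a test can check mechanically. The sketch below encodes the listed transitions as a lookup table and rejects anything else. The names `PaymentState`, `TRANSITIONS`, `IllegalTransition`, and `advance` are hypothetical and chosen for this example; the spec only defines the states and arrows.

```python
from enum import Enum


class PaymentState(Enum):
    IDLE = "idle"
    VALIDATING = "validating"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Legal transitions, copied from the spec's state machine section.
TRANSITIONS: dict[PaymentState, frozenset[PaymentState]] = {
    PaymentState.IDLE: frozenset({PaymentState.VALIDATING}),
    PaymentState.VALIDATING: frozenset({PaymentState.PROCESSING, PaymentState.FAILED}),
    PaymentState.PROCESSING: frozenset({PaymentState.COMPLETED, PaymentState.FAILED}),
    PaymentState.COMPLETED: frozenset(),  # terminal
    PaymentState.FAILED: frozenset(),     # terminal
}


class IllegalTransition(Exception):
    """Raised when code attempts a transition the spec does not list."""


def advance(current: PaymentState, target: PaymentState) -> PaymentState:
    """Return `target` if the spec allows current -> target, otherwise raise."""
    if target not in TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not in the spec")
    return target
```

Keeping the table next to the spec makes drift visible: a new arrow in the code without a matching arrow in the spec is a review finding, not a judgment call.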

Update cadence

Created during Monday spec sessions. Updated if builders discover spec gaps during Tuesday/Wednesday build (via the spec-break rule: 3+ gaps = stop and re-spec). Finalized during Thursday hardening.

What breaks without it

Without tech specs, reviews become subjective. The reviewer checks whether the code “looks right” rather than whether it matches a defined contract. Edge cases are discovered in production, not in spec sessions. The AI generates plausible code with no verifiable reference point.

Lead Architect

Defensive Checklist Scores

Weekly quality scorecard across 8 categories. Tracks hardening over time.

Purpose

The defensive checklist converts “code quality” from an opinion into a measurable score. Each category (input validation, error handling, idempotency, timeouts, logging, state management, security, testing) is scored 0–3. The scores trend over time, making hardening progress visible to the team and to stakeholders.

Structure

## Defensive Checklist — Week of 2026-03-24

| Category         | Score | Notes                    |
|------------------|-------|--------------------------|
| Input validation |   3   | All boundaries covered   |
| Error handling   |   2   | 2 bare catches remaining |
| Idempotency      |   2   | Payment endpoint done    |
| Timeouts         |   1   | 4 calls missing timeout  |
| Logging          |   3   | Structured, correlation  |
| State mgmt       |   2   | Order FSM complete       |
| Security         |   2   | Rate limiting pending    |
| Testing          |   2   | 78% logic coverage       |

**Overall: 17/24 (Level 2)**
Target: Level 2+ (16/24) by Thursday ✓
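
The arithmetic behind the scorecard is simple enough to automate. The sketch below validates a week's scores, totals them, compares the total with the 16/24 target, and lists the categories still below Level 2. Function and constant names are hypothetical; the scores themselves are still assigned by a person.

```python
CATEGORIES = (
    "Input validation", "Error handling", "Idempotency", "Timeouts",
    "Logging", "State mgmt", "Security", "Testing",
)
MAX_PER_CATEGORY = 3
TARGET_TOTAL = 16  # "Level 2+ (16/24)" from the scorecard above


def total_score(scores: dict[str, int]) -> int:
    """Sum the eight category scores after checking names and ranges."""
    missing = set(CATEGORIES) - set(scores)
    extra = set(scores) - set(CATEGORIES)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{name}: score {value} is outside 0-{MAX_PER_CATEGORY}")
    return sum(scores.values())


def meets_target(scores: dict[str, int]) -> bool:
    """True when the weekly total reaches the 16/24 target."""
    return total_score(scores) >= TARGET_TOTAL


def below_level_2(scores: dict[str, int]) -> list[str]:
    """Categories that still need hardening attention, in scorecard order."""
    return [name for name in CATEGORIES if scores[name] < 2]


# The week shown in the example scorecard above.
WEEK_2026_03_24 = {
    "Input validation": 3, "Error handling": 2, "Idempotency": 2, "Timeouts": 1,
    "Logging": 3, "State mgmt": 2, "Security": 2, "Testing": 2,
}
```

For the example week this gives a total of 17, which meets the target, with Timeouts as the only category below Level 2.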

Update cadence

Scored by the Lead Architect every Thursday during hardening. Reviewed at Friday retro. The trend line (not the absolute score) is the metric that matters.

What breaks without it

Without the checklist, “hardening” is undefined. The team ships when they feel done, not when they’ve met a standard. Technical debt accumulates invisibly. Stakeholders cannot assess production readiness because there is no shared definition of what “ready” means.

Full Pod

Prompt Library

Effective AI agent prompts. The pod’s institutional memory for agentic work.

Purpose

The prompt library captures what works. When a senior discovers that a specific prompt structure produces better test coverage, or that loading the standards document in a particular order reduces hallucination, that knowledge belongs to the team — not to one person’s clipboard. The library is the pod’s institutional memory for agentic work.

Structure

## Prompt: Build from Spec (Standard)

**When to use**: Building any story from a tech spec
**Load order**:
  1. Standards document (full)
  2. Technical spec for this story
  3. Relevant existing code (interfaces only)

**Prompt template**:
  "Implement [story] per the attached spec.
   Follow the standards document for all patterns.
   Write tests before implementation.
   Flag any spec gaps — do not guess."

**What it prevents**: AI inventing patterns not
in the standards. Building without tests. Silently
filling spec gaps with assumptions.

**Contributed by**: Senior A (2026-03-10)
**Validated by**: 12 stories, 0 spec-gap misses

Update cadence

Contributed by anyone in the pod when they find an effective prompt pattern. Reviewed at Friday retro. Pruned quarterly — prompts that haven’t been used in 6 weeks are archived, not deleted.

What breaks without it

Without a prompt library, each team member discovers effective prompts independently. The senior who leaves takes their prompt knowledge with them. Juniors struggle with AI tools because nobody shared what works. The team never compounds its agentic skills.

Arch / Seniors

Review Findings Log

All review findings, spec gaps, and spec-break triggers. Source of retro data.

Purpose

The review findings log is the feedback loop that improves specification quality over time. Every review finding, spec gap, and spec-break trigger is recorded with enough context to spot patterns. If the same kind of gap appears three weeks in a row, the spec process has a hole — and the log makes that visible before it becomes a production incident.

Structure

## Review Findings — Week of 2026-03-24

| Story   | Finding              | Category   | Root Cause    |
|---------|----------------------|------------|---------------|
| PAY-41  | Missing timeout on   | Spec gap   | Spec didn't   |
|         | Stripe call          |            | cover timeouts|
| PAY-42  | Error swallowed in   | Code issue | Standards doc |
|         | webhook handler      |            | was not loaded|
| USR-18  | Spec-break triggered | Spec gap   | Auth edge     |
|         | (4 gaps found)       |            | cases missing |

## Patterns This Week
- 2 of 3 gaps were timeout-related → add timeout
  section to spec template
- 1 spec-break from auth complexity → flag auth
  stories for Lead Arch spec session

Update cadence

Updated continuously during Tuesday/Wednesday reviews. Patterns section written Thursday. Discussed at Friday retro to drive spec template improvements for next week.

What breaks without it

Without the findings log, retros are opinion-based: “I feel like our specs are getting better.” With it, retros are data-driven: “Timeout-related spec gaps dropped from 4/week to 0/week after we added the timeout section to the template.” The log turns the methodology into a learning system instead of a static process.
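
The three-weeks-in-a-row rule from the Purpose section is easy to check once each finding carries a root-cause tag. The sketch below assumes a hypothetical representation, one set of tags per week in chronological order, and returns the tags present in every one of the most recent weeks. The tag strings and the `recurring_gaps` name are illustrative only.

```python
def recurring_gaps(weekly_tags: list[set[str]], run_length: int = 3) -> set[str]:
    """Return tags that appear in each of the most recent `run_length` weeks."""
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    if len(weekly_tags) < run_length:
        return set()  # not enough history yet to call anything a pattern
    recent = weekly_tags[-run_length:]
    return set.intersection(*recent)
```

Anything this returns is a candidate for a spec template change at the next Friday retro.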

Where Artifacts Live

Artifacts are version-controlled, not scattered across wikis and Slack threads. They live in git, get reviewed like code, and have blame history. A concrete convention:

Per-Story Specs

specs/
  PAY-41/
    brief.md          # PO-owned
    spec.md           # Architect-owned
    tasks.md          # Builder-owned
  USR-18/
    brief.md
    spec.md
    tasks.md

Each story gets a directory named by its ticket ID. The brief, spec, and task breakdown are co-located. Spec PRs are reviewed and merged before implementation PRs.
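
A convention like this is easiest to keep when something checks it. The sketch below walks a `specs/` directory and reports any story directory that lacks one of the three files. It assumes the layout shown above; `missing_story_files` is a hypothetical helper, not part of any existing tool.

```python
from pathlib import Path

REQUIRED_FILES = ("brief.md", "spec.md", "tasks.md")


def missing_story_files(specs_root: Path) -> dict[str, list[str]]:
    """Map each story directory name to the required files it lacks."""
    problems: dict[str, list[str]] = {}
    story_dirs = sorted(p for p in specs_root.iterdir() if p.is_dir())
    for story_dir in story_dirs:
        missing = [name for name in REQUIRED_FILES if not (story_dir / name).is_file()]
        if missing:
            problems[story_dir.name] = missing
    return problems
```

Run in CI, an empty result means every story has its brief, spec, and task breakdown in place before an implementation PR is opened.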

Shared Governance

docs/
  standards.md        # Lead Architect
  adr/
    001-event-bus.md  # Architect + Lead
    002-auth-pattern.md
  checklist/
    week-2026-03-24.md
  prompts/
    build-from-spec.md
    review-against-spec.md
  findings/
    week-2026-03-24.md

Governance artifacts live in docs/. The standards document is the root. ADRs, checklists, prompts, and findings are subdirectories with their own cadence.

Documentation Is Infrastructure

No single artifact is sufficient. The standards document without ADRs loses its rationale. Briefs without tech specs produce vague AI output. Tech specs without review findings never improve. The seven artifacts form a closed loop that lives in git, not in a wiki. Treat these documents the way you treat a CI pipeline: they run on every build, they gate every merge, and when they break, the system breaks.

Metrics

Datum produces specific, measurable signals. Not vanity metrics — diagnostic indicators that reveal whether the methodology is working or decaying. Track these five. Ignore everything else.

↓ 0

Spec Gap Rate

Gaps found during build ÷ stories built. Target: trending toward zero.

Source: review findings log, measured per story

< 4h

Review Turnaround

PR open to review complete. Target: under 4 hours.

Exceeding triggers reallocation, not escalation

16+

Checklist Score

Weekly 0–24 (8 × 0–3). Target: Level 2+ (16/24).

Trend matters more than absolute score

< 20%

Carryover Rate

Stories carried to next week ÷ stories planned.

If consistently high, Monday capacity check is dishonest

< 30%

Rework Rate

PRs requiring revision after review ÷ total PRs.

Cross-reference with spec gap rate to distinguish spec vs build problems
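
Four of the five metrics are plain ratios over weekly counts, so they can be computed directly from the findings log and PR history. The sketch below shows the arithmetic. The `WeekCounts` fields are hypothetical names for those counts, review turnaround is taken as the mean over PRs (an assumption, since the definition above does not say mean or median), and the checklist score is left out because it is scored by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeekCounts:
    stories_planned: int
    stories_built: int
    stories_carried_over: int
    spec_gaps_found: int
    prs_total: int
    prs_revised_after_review: int
    review_hours: tuple[float, ...]  # PR open to review complete, one entry per PR


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, treating an empty denominator as zero rather than raising."""
    return numerator / denominator if denominator else 0.0


def weekly_metrics(c: WeekCounts) -> dict[str, float]:
    """Compute the ratio-based metrics for one week."""
    return {
        "spec_gap_rate": _ratio(c.spec_gaps_found, c.stories_built),
        "review_turnaround_h": _ratio(sum(c.review_hours), len(c.review_hours)),
        "carryover_rate": _ratio(c.stories_carried_over, c.stories_planned),
        "rework_rate": _ratio(c.prs_revised_after_review, c.prs_total),
    }
```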

What Good Looks Like

Expected metric ranges across the first 4-week adoption cycle.

Week 1

Spec gaps: 0.8–1.0
Review: 6–8 hours
Checklist: 8–12
Carryover: 30–50%
Rework: 60–80%

Every metric is off target. This is normal. You are calibrating.

Week 2

Spec gaps: 0.5–0.7
Review: 4–6 hours
Checklist: 12–16
Carryover: 20–35%
Rework: 40–60%

Specs are improving. The team finds its rhythm.

Week 3

Spec gaps: 0.3–0.5
Review: 2–4 hours
Checklist: 14–18
Carryover: 15–25%
Rework: 30–45%

Pattern recognition forming. Reviews become routine.

Week 4+

Spec gaps: < 0.3
Review: < 4 hours
Checklist: 16–24
Carryover: < 20%
Rework: < 30%

Sustained. The team stops wanting to go back.

Metric Traps

Metrics that look healthy in isolation can mask dysfunction when read together.

Rubber-Stamping A4

Green turnaround + high rework. Reviews are fast because they are shallow. PRs pass review and fail in integration. The turnaround metric is green. The quality is not.

Detection

Turnaround under 2h while rework stays above 40%. Findings per review declining while post-merge defects rise.

Absorbing Ambiguity A2

Low spec gaps + high rework. Builders guess instead of flagging. Spec gap rate is low because gaps go unreported, not because they don’t exist. The spec-break rule never fires.

Detection

Spec gaps below 0.2 while rework above 40%. Review findings reference requirements not in the spec. Builders never invoke spec-break.

Checklist Theater A5

High scores + production incidents. Checklist scored to pass, not to verify. “We have tests” vs. “our tests cover the failure modes in the spec.”

Detection

Scores 18+ while incidents occur in categories scored Level 2+. Post-mortem root causes map to “passing” categories.
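
The three detection rules above combine metrics that are normally read one at a time, which makes them a natural fit for a small automated check. The sketch below applies the thresholds exactly as stated. The function name, its parameters, and the representation of incidents (the checklist score of each category in which a production incident occurred) are assumptions for this example.

```python
def detect_traps(
    turnaround_h: float,
    rework_rate: float,
    spec_gap_rate: float,
    checklist_total: int,
    incident_category_scores: list[int],
) -> list[str]:
    """Return the names of the metric traps whose thresholds are met."""
    traps: list[str] = []
    # Fast reviews combined with high rework: reviews are shallow.
    if turnaround_h < 2 and rework_rate > 0.40:
        traps.append("rubber-stamping")
    # Few reported gaps combined with high rework: gaps go unreported.
    if spec_gap_rate < 0.2 and rework_rate > 0.40:
        traps.append("absorbing ambiguity")
    # High checklist total while incidents hit categories scored Level 2+.
    if checklist_total >= 18 and any(score >= 2 for score in incident_category_scores):
        traps.append("checklist theater")
    return traps
```

The numeric rules are only the first half of each detection. The qualitative signals, such as findings that reference requirements missing from the spec, still need a human reading the log.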

Agent Governance

Every AI coding session is a new employee who has never seen your codebase, has no memory of yesterday’s decisions, and will confidently generate plausible-looking code that violates your architecture. The standards document is the onboarding packet. Agent governance is the HR policy.

Adapted from Fuller, “Create an Onboarding Plan for AI Agents,” Harvard Business Review, March 2026.

R

Role Definition

What the agent is

“Build from the spec. Follow the standards. Flag gaps — don’t guess.”

The role is narrow by design
The agent builds
It does not design, review, or decide

Owner: Lead Architect via the standards document

S

Scope of Action

Bounded by the spec

The spec names contracts, state machine, schemas. Anything outside is out of scope.

An employee who redesigns the org chart on day one has exceeded their scope
So has an AI that generates auth nobody asked for

Owner: Spec author via the tech spec per story

A

Accountability

Structural, not personal

Review against the spec, not intuition. The checklist scores output. The log tracks failures.

Accountability is structural
The agent has no feelings to hurt
Verify behavior, not vibes

Owner: Reviewer via review against spec

F

Feedback Loops

The system learns

The agent doesn’t learn. But the system around it does.

Standards updated when patterns emerge
Prompt library captures what works
Findings log drives upstream fixes

Owner: Full pod via prompt library + findings log

The Fresh-Context Problem

Without Governance

Every session invents its own patterns.

Monday’s code: snake_case
Tuesday’s code: camelCase
Monday’s errors: raw strings
Tuesday’s errors: structured objects

The agent is not inconsistent — it never saw Monday. Each session is a different employee with the same title but no shared memory.

With Governance

Every session inherits the team’s decisions.

Naming: from the standards document
Error contracts: from the standards document
Architecture boundaries: from the tech spec
Test expectations: from the tech spec

Consistency is a property of the document, not of the agent’s memory. The governance artifacts are the institutional memory the agent lacks.

What Goes Wrong Without It

Pattern Drift Violates A5

Without standards loaded, 3+ error handling patterns and 2+ naming conventions emerge within a month. Each reviewer catches different issues — no shared reference, only individual preference. The codebase degrades through a thousand plausible-looking commits.

Detection

Grep for error return shapes. String returns + object returns + thrown exceptions in the same service = standards not loaded.

Scope Creep Violates A2

The AI “helpfully” generates auth middleware, logging infrastructure, migrations nobody asked for. Looks good. Passes tests. Introduces unreviewed architectural decisions that compound — each one small enough to approve, collectively large enough to reshape the system.

Detection

Compare PR diff against the tech spec. Any file not named in the spec is a scope violation. Track violations per sprint.
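
The comparison itself is a set difference. The sketch below assumes the spec's file list and the PR's changed paths are both available as lists of strings (for example, the output of `git diff --name-only`); `scope_violations` and the paths in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def scope_violations(changed_files: list[str], spec_files: list[str]) -> list[str]:
    """Return changed paths that the spec does not name, sorted for stable output."""
    # Normalise so that "./src/a.py" and "src/a.py" compare as equal.
    allowed = {str(PurePosixPath(path)) for path in spec_files}
    changed = {str(PurePosixPath(path)) for path in changed_files}
    return sorted(changed - allowed)
```

For instance, a PR that touches `src/auth/middleware.py` when the spec names only `src/payments/apple_pay.py` yields one violation. Counting the length of this list per PR gives the per-sprint number the detection rule asks for.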

The Volume–Risk Spectrum

Not all agent work carries the same risk. A 2026 Accenture/Wharton study argues that governance intensity should scale with where tasks fall on a volume–risk spectrum: high-volume, high-risk domains need more rigorous controls than low-volume, low-risk ones. Uniform governance wastes architect attention on config changes while under-governing payment integrations.

Datum applies this principle at story level. The spec session is where risk is assessed, and the governance response is calibrated accordingly.

Low Risk

Config changes, copy updates, internal tooling.

Spec: Brief only — no tech spec required
Review: Single reviewer, checklist optional
Builder level: Any stage can own end-to-end

Low-risk stories flow fast. The governance overhead is minimal because the blast radius is small.

Medium Risk

New features, API changes, schema migrations.

Spec: Full tech spec with acceptance criteria
Review: Spec-based review, checklist scored
Builder level: Co-Specifier+ for spec, any for build

The standard Datum flow. Most stories land here. Governance is proportional to the decision surface.

High Risk

Payment flows, auth changes, data model restructuring, cross-service contracts.

Spec: Full tech spec + ADR for architectural decisions
Review: Multi-lens review (Lead Architect + domain expert), checklist mandatory
Builder level: Specifier+ for spec, architect signs off before build begins
Additional: Spec-break rule enforced; any ambiguity halts the build

The Accenture/Wharton study found that the function with the largest revenue opportunity (Sales) was also the area with the highest volume of risk-sensitive decisions. In Datum, the same pattern holds: the highest-value stories need the strongest governance. Architect time pays off when it is spent here, not on config changes.

Risk classification happens in the spec session, not after the build. The Lead Architect or spec author tags each story as low, medium, or high risk based on three questions: How many systems does this touch? What happens if it’s wrong? Can it be rolled back? The answers determine which governance track the story follows.
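
One way to make the tagging repeatable is to record the three answers and look up the governance track from them. In the sketch below, the track contents are taken from the three tiers above, but the decision rule in `classify` is purely an assumption for illustration: the text does not say how the answers combine, and a real pod would set its own thresholds in the spec session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskAnswers:
    systems_touched: int     # How many systems does this touch?
    failure_is_severe: bool  # What happens if it's wrong?
    can_roll_back: bool      # Can it be rolled back?


# Governance tracks, summarised from the three tiers above.
TRACKS = {
    "low": {"spec": "Brief only", "review": "Single reviewer, checklist optional"},
    "medium": {"spec": "Full tech spec with acceptance criteria",
               "review": "Spec-based review, checklist scored"},
    "high": {"spec": "Full tech spec + ADR",
             "review": "Multi-lens review, checklist mandatory"},
}


def classify(answers: RiskAnswers) -> str:
    """Hypothetical rule: severity or irreversibility dominates, then breadth."""
    if answers.failure_is_severe or not answers.can_roll_back:
        return "high"
    if answers.systems_touched > 1:
        return "medium"
    return "low"
```

Writing the rule down, whatever form it takes, keeps two spec authors from tagging the same story differently.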

The Accountability Asymmetry

“Intelligence may be scalable, but accountability is not.” That line from the 2026 Accenture/Wharton study of AI agents across 18 industries is the core of why governance is structural.

The study found that more than 50% of U.S. working hours are subject to reshaping by AI agents. It projects that by 2027 half of business decisions will be augmented or automated, with 15% fully autonomous by 2028, and that customer operations are trending toward 80% autonomous resolution by 2029. Agents are spreading “rapidly across the enterprise value chain, often ahead of formal strategy and governance.”

The report’s sharpest line: “Capability parity does not imply responsibility parity.” AI generates PhD-level reasoning but carries no moral weight, no institutional accountability, no long-term obligation. Those stay human. When intelligence distributes across humans and AI, responsibility does not.

What Scales

Intelligence, output volume, decision speed.

Code generation: near-infinite
Analysis and recommendation: near-infinite
Execution of defined tasks: near-infinite
Pattern recognition: near-infinite

Everything the agent does well, it does at scale. This is the promise.

What Doesn’t

Judgment, ownership, consequence.

Deciding what to build: human
Owning the outcome: human
Setting architectural intent: human
Accepting the risk: human

Everything the agent cannot do, humans do at the same pace they always have. This is the constraint.

The study’s case modeling revealed a pattern that Datum teams will recognize: the function with the largest revenue opportunity (Sales) was also the area with the highest volume of risk-sensitive decisions. Value and risk scale together. Governance must be designed before agents touch high-value systems. You do not get to learn from failure at this scale.

The Accenture/Wharton report proposes “humans in the lead, not in the loop.” Datum operationalizes the same principle: the Lead Architect does not review every line — but they set the standards document that governs every line. The spec author does not write the code — but they define the contracts the code must satisfy. Human authority is exercised through artifacts, not through direct supervision of every keystroke.

Anything else breaks. When intelligence scales and accountability does not, governance must be structural: documents, checklists, and review gates that run regardless of volume. Any model that depends on a human personally reviewing every output will collapse at the speed AI enables.

The Design Principle

Agent governance is not about controlling AI. It is about keeping humans accountable when direct oversight is no longer feasible. The standards document, the tech spec, the review checklist: not bureaucracy. The only way accountability keeps pace with intelligence.

Source: Accenture & Wharton School, The Age of Co-Intelligence, March 2026. Fortune coverage.

When It Breaks

Every methodology has characteristic failure modes. These are the five ways Datum collapses when teams adopt the rituals without the substance. Each violates a specific axiom. Trace the tag to find where it breaks.

Cargo Cult Specs A2 + A3

Spec sessions happen because the process requires them, but the documents are copy-pasted templates. “It works correctly” is not an acceptance criterion. The AI produces plausible code. Reviews pass because the spec is too vague to fail against.

Detection

Spec gap rate stays high but spec-break never fires. Builders guess instead of flagging.

Checklist Theater A5

The checklist is scored to pass, not to verify. “We have tests” scores Level 2. “Our tests cover the failure modes in the spec” is the actual standard. When production incidents hit, the relevant category was scored high.

Detection

Scores 18+ while incidents occur in categories marked Level 2+. Scores and reality have diverged.

Review Death Spiral A4

AI-assisted juniors produce PRs faster than seniors can review. WIP limit fires constantly. Seniors rubber-stamp to clear the queue. Turnaround metric is green. Review quality collapses. Defects reach production.

Detection

Persistent WIP breaches. Rework rises while turnaround stays green. Reviews ship defects.

Spec Silo A2

The Architect writes specs alone, hands them to builders. Builders treat specs as requirements, not shared design. Questioning the spec feels like insubordination. Spec gaps accumulate silently.

Detection

High rework + low spec gaps = builders absorbing ambiguity. Juniors never progress past Builder stage.

Standards Drift A3

Standards written in week 1, never updated. By week 8 the codebase has evolved — new ADRs, new patterns — but the standards document is frozen. New code follows patterns the AI invents. The codebase develops two dialects: “standards-era code” and “post-standards code.”

Detection

Checklist consistency scores decline. Review findings reference patterns not in the standards doc. The doc’s last-modified date is more than 2 weeks old.

A Week in Datum

The best way to understand Datum is to watch one story move through it. This is PAY-41: Add Apple Pay to Checkout. Five days, from brief to production.

Monday — Spec Session

Brief to spec in one session. Every question now saves a day of rework later.

The EM presents the brief: 23% mobile abandonment at the payment step. Apple Pay eliminates manual card input. Scope: payment step only. Existing Stripe integration, iOS Safari only, fallback to standard card entry.

PAY-41: Add Apple Pay to Checkout

Brief
  Mobile abandonment at payment step: 23%
  Root cause: manual card input on small screens
  Proposed fix: Apple Pay via existing Stripe integration

Scope
  - Payment step only (not cart, not confirmation)
  - iOS Safari only (non-iOS browsers fall back to card)
  - Uses existing Stripe payment intent flow

Acceptance Criteria
  [ ] User sees Apple Pay button on iOS Safari
  [ ] Tapping Apple Pay completes payment without card input
  [ ] Non-iOS browsers show standard card form (no error)
  [ ] Duplicate submissions return original result, not double charge

The Architect drafts the tech spec: a new POST /api/payments/apple-pay endpoint, a state machine (idle → validating → processing → completed | failed), Stripe webhook schema for async confirmation, and a 30-second timeout. A Builder asks about duplicate submissions. An idempotency key goes into the spec before anyone writes code.

Tech Spec — PAY-41

Endpoint
  POST /api/payments/apple-pay
  Headers: Idempotency-Key (required, UUID)
  Body: { cartId, applePayToken }

State Machine
  idle → validating → processing → completed
                                  → failed

External Calls
  Stripe PaymentIntent.create — timeout: 30s
  Stripe webhook: payment_intent.succeeded | payment_intent.payment_failed

Error Contract
  { code: "PAYMENT_DECLINED", message: "...", correlationId: "..." }
  { code: "APPLE_PAY_UNAVAILABLE", message: "...", correlationId: "..." }

Estimate: AI build ~2h × 2.5x = 5h total
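
The idempotency requirement in this spec can be illustrated with a small sketch: the first submission for a given Idempotency-Key performs the charge, and any repeat returns the stored result. Everything here is an assumption for illustration, including the `IdempotentPayments` class, the `PaymentResult` type, and the injected `charge` callable. The in-memory dictionary stands in for a durable store, and the sketch does not handle concurrent requests.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PaymentResult:
    payment_id: str
    status: str


class IdempotentPayments:
    """Return the original result when the same Idempotency-Key is submitted again."""

    def __init__(self, charge: Callable[[str, str], PaymentResult]) -> None:
        self._charge = charge  # stand-in for the real payment call
        self._results: dict[str, PaymentResult] = {}  # in-memory only

    def submit(self, idempotency_key: str, cart_id: str, apple_pay_token: str) -> PaymentResult:
        uuid.UUID(idempotency_key)  # the spec requires a UUID; raises ValueError otherwise
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # duplicate: no second charge
        result = self._charge(cart_id, apple_pay_token)
        self._results[idempotency_key] = result
        return result
```

This is the behaviour the fourth acceptance criterion describes: a duplicate submission returns the original result, not a double charge.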

Tuesday — Build

Standards doc loaded. Spec loaded. The AI builds against both.

The Builder loads the project standards document and the PAY-41 tech spec into the AI session. The AI generates the endpoint, state machine, Stripe integration, and tests. Self-verification against acceptance criteria: 3 of 4 pass. The remaining criterion (non-iOS browser behavior) exposes a spec gap: the spec says “fallback to card” but does not specify whether the Apple Pay button is hidden or shown-but-disabled. The Builder logs gap #1 and continues.

Wednesday — Review + Spec-Break

Review against the spec, not against taste. The spec is the contract.

A senior reviews the PR against the tech spec. Two rejections: the Stripe call uses a 60-second timeout, not the 30 seconds the spec requires; and there is no structured error response for PAYMENT_DECLINED, just a generic 500. Both are spec violations, not style disagreements.

During the fix, the Builder flags two more gaps: the spec does not define retry behavior on Stripe timeouts, and webhook signature validation is unspecified. Three gaps total. The spec-break threshold. A 30-minute re-spec session patches the spec with retry policy (exponential backoff, 3 attempts, 90s total cap) and webhook HMAC verification. Build continues with the amended spec.
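
The amended retry policy can be sketched as a small wrapper: retry on timeout, double the wait between attempts, stop after three attempts, and never wait past the 90-second cap. This is a minimal sketch under stated assumptions: the 1-second base delay is not in the spec, `call_with_retry` is a hypothetical helper, only `TimeoutError` is retried, and the 30-second per-call timeout is left to the `operation` being wrapped.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    *,
    max_attempts: int = 3,
    total_cap_s: float = 90.0,
    base_delay_s: float = 1.0,  # assumed; the spec does not give a base delay
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `operation`, retrying on TimeoutError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = clock() + total_cap_s
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            delay = base_delay_s * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            out_of_attempts = attempt == max_attempts
            out_of_time = clock() + delay >= deadline
            if out_of_attempts or out_of_time:
                raise  # surface the timeout to the caller
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` and `clock` keeps the policy testable without real waiting, which is what lets a test expectation like "3 attempts, then fail" live in the spec.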

Thursday — Hardening

The defensive checklist is not a formality. It is where the real quality lives.

PAY-41 is scored against the defensive checklist. Input validation: present. Error handling: structured with correlation IDs. Idempotency: enforced via the idempotency key. Timeout: fixed to 30 seconds per spec. One gap remains: structured logging lacks a correlation ID on the webhook handler. Fixed in 10 minutes.
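
One way the idempotency item might be enforced is sketched below. It is an in-memory illustration only: IdempotencyStore and run_once are hypothetical names, and a real service would need a shared, persistent store with an atomic insert to be safe across processes and concurrent requests.

```python
import uuid


class IdempotencyStore:
    """In-memory map from Idempotency-Key to the first response produced."""

    def __init__(self):
        self._responses = {}

    def run_once(self, key: str, handler):
        """Run `handler` for a new key; replay the stored response otherwise."""
        uuid.UUID(key)  # the spec requires a UUID; raises ValueError if malformed
        if key not in self._responses:
            self._responses[key] = handler()
        return self._responses[key]
```

A duplicate submission with the same key gets the original response back and the handler, which is where the charge would happen, does not run a second time.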

Defensive Checklist — PAY-41

Input Validation        ✓  Level 2  (schema + boundary checks)
Error Handling          ✓  Level 2  (structured, correlation IDs)
Idempotency             ✓  Level 3  (key-based, tested)
Timeout / Retry         ✓  Level 2  (30s timeout, 3 attempts, backoff)
State Management        ✓  Level 2  (enum states, transition map)
Logging                 ✓  Level 2  (structured, correlation ID added)
Auth / Permissions      ✓  Level 2  (existing auth middleware)
Test Coverage           ✓  Level 2  (happy + sad paths covered)

Score: 17/24  (Level 2+ across all categories)

The Lead Architect notes that the Stripe webhook verification pattern is reusable. It goes into the standards document as the canonical webhook integration pattern.
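
A sketch of what such a canonical pattern might contain, modeled on the timestamped HMAC-SHA256 scheme Stripe documents for its Stripe-Signature header (a t= timestamp and one or more v1= signatures over "timestamp.payload"). The function name verify_webhook and the 300-second tolerance are assumptions; production code would normally call the official Stripe library rather than hand-roll this.

```python
import hashlib
import hmac
import time


def verify_webhook(payload: bytes, header: str, secret: str,
                   tolerance_s: int = 300, now=None) -> bool:
    """Return True if `header` carries a valid, fresh signature for `payload`."""
    parts = [item.strip().split("=", 1) for item in header.split(",") if "=" in item]
    timestamps = [value for key, value in parts if key == "t"]
    candidates = [value for key, value in parts if key == "v1"]
    if not timestamps or not candidates:
        return False
    timestamp = timestamps[0]
    if not timestamp.isdigit():
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False  # outside the tolerance window: possible replay
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
```

The signature must be computed over the raw request bytes. Verifying against a re-serialized JSON body is a common way for this check to fail on valid requests.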

Friday — Demo + Retro

Ship, reflect, improve the machine that builds the machine.

Demo to the product stakeholder: Apple Pay works on iOS Safari. Card fallback on Chrome. No double charges. The stakeholder sees exactly what the acceptance criteria described. No surprises. The spec was the contract.

Retro surfaces a cross-story pattern: timeout behavior was underspecified on 2 of 3 stories this week, not just PAY-41. The team adds a mandatory “External Call Timeouts” section to the spec template. The review findings log is updated. Next week’s specs will be better because this week’s gaps were captured.
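
Spotting a cross-story pattern is a small grouping exercise once the findings log is structured. The sketch below assumes each entry is a (story ID, gap category) pair; the category labels and the story IDs PAY-42 and PAY-43 are invented for illustration.

```python
from collections import defaultdict

# Hypothetical findings-log entries: (story ID, gap category).
findings = [
    ("PAY-41", "timeout"),
    ("PAY-41", "ui-fallback"),
    ("PAY-41", "webhook-signature"),
    ("PAY-42", "timeout"),
    ("PAY-43", "pagination"),
]


def cross_story_patterns(entries, min_stories=2):
    """Return categories that appear in at least `min_stories` distinct stories."""
    stories_by_category = defaultdict(set)
    for story_id, category in entries:
        stories_by_category[category].add(story_id)
    return {
        category: sorted(stories)
        for category, stories in stories_by_category.items()
        if len(stories) >= min_stories
    }
```

With the sample entries, cross_story_patterns(findings) returns {"timeout": ["PAY-41", "PAY-42"]}: the one category worth a template change.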

The Ratio Is the Point

The story took 5 days. The code took 2 hours. The other 38 hours were specification, review, hardening, and learning. That ratio is the methodology working, not a sign of overhead. AI makes code cheap. Datum invests the surplus in the work that actually prevents production incidents: getting the spec right, verifying against it, and feeding every lesson back into the system that writes next week’s specs.

Getting Started

The smallest thing a team can do to experience Datum: run one spec-first cycle on one story. Write a brief (problem, acceptance criteria, constraints). Write a lightweight tech spec (contracts touched, boundary schemas). Build from the spec with AI. Review against the spec. One story, start to finish. If the team cannot see the value after one story — if the spec did not prevent at least one mistake the AI would have otherwise made — the model needs adjustment before scaling.

Week	Focus	Success Signal
Week 1	Learn the model. Read the methodology. EM/PO and Lead Architect write 2–3 briefs as a live exercise. Run spec sessions on those briefs. Builders build from the specs.	Every pod member can explain the spec-to-merge cycle without looking at the document.
Week 2	First real sprint at 50–60% capacity. Full Monday-to-Friday cycle with real work. Accept lower throughput.	At least one story ships through the full cycle: spec → build → review → verify → demo.
Weeks 3–4	Full capacity. Run the complete model. Track all metrics.	Spec gap rate is decreasing. Review turnaround is under 4 hours. No stories enter build without a spec.
Week 5	4-week health check. Five diagnostic questions.	3+ of 5 health check questions answered positively. If not, diagnose before continuing.

What to Expect Emotionally

Week 1

Feels slow and bureaucratic. Writing specs before building feels like overhead when the team is used to jumping straight to code. This is normal. The discomfort is not a signal that the process is wrong — it is the feeling of shifting the bottleneck.

Week 2

Feels like reduced output. The team ships fewer features than under the old process. This is the investment period. The team is building the muscle memory for specification, not yet seeing the return.

Week 3

Rework reduction becomes visible. Stories that went through spec sessions have fewer review findings and fewer integration surprises. The Monday investment starts paying off.

Week 4

The team stops wanting to go back. The evidence is in the review findings log: fewer recurring issues, faster review turnaround, less Thursday firefighting.

Do Not

Do not skip spec sessions because they feel slow. The spec session is the load-bearing rule.

Do not let seniors bypass review because they are trusted. Axiom 5 applies to everyone.

Do not reduce the Monday alignment to a status update. It is a design session.

Do not adapt the model before completing one full 4-week cycle. Follow it exactly first, then adjust from evidence.

Migration

Getting Started covers your first cycle. This chapter covers what happens after — when you need to transition a team, convince a manager, or migrate from an existing methodology.

Every team arrives from somewhere. The three most common starting points each have different friction profiles. For each one, the sections below cover what maps cleanly, what changes, and where the resistance concentrates.

From Scrum

Ceremonies map. Estimation does not.

Sprint planning becomes Monday spec sessions. Sprint review becomes Friday demo. Retrospective becomes Friday retro. The cadence is familiar. What disappears: story points (replaced by the 2.5x multiplier), the Scrum Master role (the Lead Architect absorbs process ownership), daily standup (replaced by async status on the board), and the locked sprint backlog (carryover is explicit, not a failure).

The hard part: Teams expect estimation ceremonies. The 2.5x multiplier feels like guessing to people trained on planning poker. Show them: track estimation accuracy across four weeks. The multiplier calibrates faster than story points ever did because it is based on a single observable ratio, not consensus fiction.
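
Tracking that ratio takes very little machinery. A minimal sketch, assuming the team records the AI build estimate and the actual total hours for each finished story; the function names and the sample figures in the usage note are hypothetical.

```python
def estimate_total_hours(ai_build_hours: float, multiplier: float = 2.5) -> float:
    """Total effort estimate: AI build estimate times the multiplier."""
    return ai_build_hours * multiplier


def observed_multiplier(stories: list[tuple[float, float]]) -> float:
    """Ratio of actual total hours to AI build hours across finished stories.

    Each story is a pair: (ai_build_hours, actual_total_hours).
    """
    build = sum(b for b, _ in stories)
    total = sum(t for _, t in stories)
    if build <= 0:
        raise ValueError("need at least one story with non-zero build hours")
    return total / build
```

For a 2-hour AI build, estimate_total_hours(2) gives 5.0. If three finished stories came in at (2, 5), (4, 11), and (1, 3), observed_multiplier returns 19/7, about 2.7, which suggests the pod's real multiplier sits slightly above the default.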

What maps cleanly

Sprint planning → Monday spec sessions. Sprint review → Friday demo. Retrospective → Friday retro. Weekly cadence → weekly cadence.

What changes

No story points. No Scrum Master. No daily standup. Sprint backlog not locked — carryover is explicit, not punished.

From Kanban

Flow preserved. Spec gate added.

The board stays. WIP limits stay (and tighten). Continuous flow is preserved. What changes: a spec gate appears before “Ready.” Nothing enters the build column without a brief and tech spec. Pull-based work selection continues, but each item has a verification contract before anyone touches it.
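
The gate is simple enough to state as a check. The sketch below assumes a board item records whether its brief and tech spec exist and that the pod tracks a count of work in progress; WorkItem and can_enter_build are hypothetical names, not a real board API.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    story_id: str
    has_brief: bool
    has_tech_spec: bool


def can_enter_build(item: WorkItem, in_progress: int, wip_limit: int) -> tuple[bool, str]:
    """Apply the spec gate first, then the WIP limit."""
    if not (item.has_brief and item.has_tech_spec):
        return False, "spec gate: brief and tech spec required"
    if in_progress >= wip_limit:
        return False, "WIP limit reached: review before pulling new work"
    return True, "ok"
```

The order of the checks is deliberate: an item without a spec is rejected even when the pod has spare capacity.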

The hard part: Minimal-process teams see specs as bureaucracy — overhead imposed by people who do not write code. Counter: the spec replaces the ad-hoc Slack thread where requirements emerge mid-build. It is not new work. It is the same communication made explicit, written once instead of scattered across twelve messages and a call.

What maps cleanly

Board → Datum’s Kanban delivery style. WIP limits → same, tighter. Continuous flow → preserved.

What changes

Spec gate before “Ready.” Nothing enters build without a brief and tech spec.

From Chaos

Nothing maps. Start with one story.

No existing ceremonies map. No existing artifacts carry over. Everything changes. The only viable approach: start with one story, one cycle, exactly as described in Getting Started. Do not attempt a full-team rollout. Do not announce a process change. Run one story through spec → build → verify → retro. Then run another.

The hard part: Every previous process was imposed from above and abandoned within months. The team is inoculated against methodology. Datum must be demonstrated, not mandated. Run one story. Let the evidence argue. If the findings log shows fewer defects and faster delivery on that one story, the team will ask questions. That is the opening.

What maps cleanly

Nothing.

What changes

Everything. Start with one story, one cycle. Expand only after evidence accumulates.

Common Objections

“We don’t have time for specs.”

You do not have time not to. The 2.5x multiplier accounts for spec time. Teams that skip specs do not go faster — they move rework from the front of the cycle to the back, where it costs more. The spec is not overhead on top of building. It is the activity that prevents rebuilding.

“Seniors will resist review.”

Review in Datum is verification against a spec, not judgment of competence. The reviewer checks whether the output matches the contract. It is closer to QC inspection than code critique. When the spec is clear, review is fast and impersonal. Resistance comes from ambiguous specs that turn review into a design debate.

“AI makes specs obsolete.”

Vague input produces vague output. The spec is what makes AI output verifiable. Without a spec, you cannot distinguish correct code from plausible code. AI does not remove the need for specification — it raises the cost of skipping it, because the volume of unverified output increases.

“This only works for backend.”

If your system has contracts (APIs, interfaces, data schemas) and state (persistence, user sessions, workflows), Datum applies. Frontend, backend, mobile, infrastructure, data pipelines. The artifacts differ. The cycle does not.

Key Insight

Every methodology transition fails when imposed. Every one succeeds when a small team demonstrates value and others ask to join. Start with one pod. Run one cycle. Let the findings log speak.

Glossary

Accountability Asymmetry: Intelligence scales; accountability does not. AI can generate output at effectively unlimited volume, but humans still own every outcome. Governance must be structural (documents, checklists, gates) because direct supervision cannot keep pace. See Ch. 13.
2.5x Multiplier: AI code estimate × 2.5 = total effort including hardening, testing, and verification. The 60% that isn’t code generation takes the same time regardless of how the code was produced.
Builder: Stage 1 of Datum’s growth path. Builds from specs with AI, self-verifies, flags gaps instead of guessing. See Ch. 10.
Brief: Business intent document written by EM/PO. Contains: problem statement, scope, constraints, testable acceptance criteria. The upstream input to the spec session.
Capacity Trap: AI acceleration frees hours, but freed hours do not become growth automatically. Without deliberate redeployment into specification, verification, and governance, the capacity evaporates. See Ch. 3.
Carryover: Stories not completed in the current week, explicitly moved to next. Distinguished from hidden scope creep by being visible at Monday alignment.
Datum: The fixed reference point. In CNC machining, every measurement is relative to the datum. In this methodology, the spec is the datum — the reference for building and reviewing.
Defensive Checklist: 8-category quality scorecard (0–3 per category, 24 max). Categories: input validation, error handling, idempotency, timeouts, logging, state management, security, testing.
Co-Specifier: Stage 2 of Datum’s growth path. Drafts specs under senior review. The gap between draft and revision shrinks over time. Gate: senior revisions are cosmetic, not structural. See Ch. 10.
Delivery Style: The cadence pattern: Sprint (weekly batch), Kanban (pick-and-go), or Continuous (per-story flow). The quality model is invariant across styles.
EM/PO: Engineering Manager / Product Owner. Writes briefs with business intent, scope, constraints, and acceptance criteria. In small pods, may be part-time or combined with other responsibilities.
Fresh-Context Problem: Every AI session starts from zero. Without a standards document loaded as context, each session invents its own patterns. The standards document solves this.
Governor: Stage 4 of Datum’s growth path. Multi-lens review, standards contributions, mentors builders. The shift from individual contributor to force multiplier. See Ch. 10.
Hardening: Thursday quality verification. Code is checked against the defensive checklist, acceptance criteria are verified, integration testing runs. Required before Friday demo and release.
Lead Architect: The single point of architectural accountability in a pod. Owns the standards document, leads spec sessions for high-risk stories, reviews complex work. Two functions: governance (setting standards) and coaching (growing juniors via the spec revision loop). See Ch. 7.
Multi-lens Review: Review approach checking multiple dimensions: correctness, security, performance, maintainability, ops readiness. Not just “does it work.”
Pod: A self-contained team of 2–7 people running Datum. Scales by replication (more pods), not growth (bigger pods).
Shadow AI: AI agents deployed without formal governance. 75% of knowledge workers use AI tools, often unsanctioned. In Datum terms: builders generating code without a standards document loaded. See Ch. 3.
Prompt Library: Shared, curated collection of effective AI agent prompts. Captures what works for the pod. Reviewed and updated at Friday retro.
Re-spec Session: 30-minute session triggered by the spec-break rule. Patches the spec before continuing the build from a broken reference.
Review Findings Log: Running record of all review findings, spec gaps, and spec-break triggers. Source data for Friday retro. Turns the methodology into a learning system.
Spec Gap: An ambiguity, missing detail, or unspecified behavior in a technical specification discovered during implementation. Three or more spec gaps in a single story trigger the spec-break rule.
Spec Session: Collaborative design meeting where a brief becomes a technical specification with contracts, state machines, boundary schemas, and acceptance criteria. Held Mondays (sprint) or on-demand (kanban/continuous). See Ch. 5.
Spec-break Rule: If a builder discovers 3+ spec gaps in a single story, building stops. A re-spec session is held. The cost of re-speccing is always lower than building from a broken spec.
Specifier: Stage 3 of Datum’s growth path. Writes specs independently, leads spec sessions, reviews juniors’ code. Gate: low rework rate on own specs. See Ch. 10.
Standards Document: Single source of architectural truth loaded into every AI session. The primary control mechanism for AI output quality.
Spec Revision Loop: Datum’s coaching mechanism. A Co-Specifier drafts a spec, the Lead Architect revises it. The delta between draft and final spec is the teaching artifact. See Ch. 10.
Technical Spec: Contract-level specification: API contracts, state machines, boundary schemas, test expectations. The artifact the AI builds against and the reviewer verifies against.
Volume-Risk Spectrum: Governance intensity scales with risk. Low-risk stories (config changes) need minimal oversight. High-risk stories (payments, auth) need full tech spec, ADR, multi-lens review, and spec-break enforcement. See Ch. 13.
WIP Limit: Maximum concurrent PRs per pod. Typically 2 for a 7-person pod, 1 for a 2-person pod. When hit: review before pulling new work.

References

Crowley, J., Close, K., Munie, K. and Karaca-Griffin, S. The Age of Co-Intelligence: How Humans, AI Agents and Robots Are Redefining Value. Accenture Global Products Practice & Wharton AI and Analytics Initiative, March 2026.
Boehm, B. Software Engineering Economics. Prentice Hall, 1981. See also Boehm, B. and Basili, V. “Software Defect Reduction Top 10 List.” IEEE Computer 34(1), January 2001.
Jones, C. Applied Software Measurement. McGraw-Hill, 3rd edition, 2008. Approximately 40% of all defects traced to requirements errors across 12,000+ projects.
NASA JPL. Software defect cost studies, 1990s–2000s. 70–85% of rework costs traced to requirements errors. Validated Boehm’s defect cost escalation curve.
IBM Systems Sciences Institute. Relative cost of fixing defects by phase. Cited in Boehm (1981) and widely replicated.
Faros AI. Developer telemetry study: 10,000+ developers, 1,255 teams. 2025. +98% PRs merged, +21% tasks completed in high-AI-adoption teams.
Stern, L. (Agoda). Developer productivity data: +91% increase in PR review time under AI-assisted workflows. Summarized in Stiller, E. InfoQ, March 2026.
Stiller, E. “AI Coding Assistants Haven’t Sped up Delivery Because Coding Was Never the Bottleneck.” InfoQ, March 2026.
Griffin, L. and Carroll, R. “Spec-Driven Development.” InfoQ, 2025.
Fuller, J. “Create an Onboarding Plan for AI Agents.” Harvard Business Review, March 2026.
Brooks, F. The Mythical Man-Month. Addison-Wesley, 1975 (Anniversary Edition 1995). Chapter 4, on conceptual integrity.
Brooks, F. “No Silver Bullet: Essence and Accident in Software Engineering.” 1986.
Singer, R. Shape Up: Stop Running in Circles and Ship Work that Matters. Basecamp, 2019.
Anderson, D. Kanban: Successful Evolutionary Change for Your Technology Business. Blue Hole Press, 2010.
Cagan, M. Inspired: How to Create Tech Products Customers Love. Wiley, 2018.