Intermediate9 min

Tests as Guardrails for Agents

Tests are not just quality insurance, they are the agent's eyes. A test gives the agent an objective, repeatable signal of whether its change worked, which lets it self-correct without you in the loop. On a real codebase, a good test suite is what makes autonomous work trustworthy.

Step 1: Make the agent write the test first

Ask for the test before the implementation. This forces the agent to define what 'done' means, and it cannot cheat by writing code that merely looks right.

discount.test.ts

import { applyDiscount } from "./discount";

test("applies a percentage discount", () => {
  expect(applyDiscount(100, 0.2)).toBe(80);
});

test("never returns a negative price", () => {
  expect(applyDiscount(10, 2)).toBe(0);
});

Step 2: Let the agent run the loop itself

Tell the agent the test command and let it iterate until green. It will run, read failures, patch, and re-run on its own. Your job becomes reviewing the final diff, not babysitting each attempt.

claude - shop

$> implement applyDiscount so the tests pass. run npm test until green.

FAIL never returns a negative price

Patched: clamp result at 0. Re-running ...

PASS 2/2. Implementation complete.

Step 3: Wire the checks into your memory file

Add the test, type-check, and lint commands to CLAUDE.md and require them to pass before a task is done. Now every task ends with the agent proving its own work.

Green is the finish line

When the definition of done is 'the suite is green and types pass', the agent has a clear target and you have a verifiable result. Vague tasks produce vague work; testable tasks produce shippable work.

Terminal - test run

$ npm test

PASS applies a percentage discount

PASS never returns a negative price

Tests: 2 passed, 0 failed

The agent iterates against the suite until everything is green.

Step 1: Make the agent write the test first

Step 2: Let the agent run the loop itself

Step 3: Wire the checks into your memory file

Hands-on tasks