
The Agentic Engineer [Part 3/3]: Into the Brownfield

Parts 1 and 2 assumed greenfield, where I set the conventions. Most real work isn't that. This is what changes when you point an AI agent at a large codebase you didn't write, where the rules already exist and the blast radius is unknown.

TL;DR
  • The Agentic Engineer · Part 3 of 3. A series on running an AI coding agent as a production engineering discipline. Part 1: Principles & the daily operating model · Part 2: The full pipeline · Part 3: The brownfield.
  • The principles don't change in a codebase you didn't write; the early moves do. In greenfield you set conventions and lock contracts; in the brownfield you have to discover them first. That is the common case, because almost nothing important stays greenfield for long.
  • One shift reshapes everything: understand before you touch, applied to a system too large to read. You're not the founding engineer, you're the new senior hire, and you earn the right to change the system by first proving you understand it.
  • Concretely: keep the map in investigation subagents, reverse-engineer the project config instead of authoring it, capture discovered conventions as skills, open the pipeline with a mapping phase, and characterize behavior with tests before you change it.

Everything in the first two articles assumed I was the architect. Blank repo, clean main, conventions I chose, contracts I locked. That's the easy case, and it's not the case most of us live in. The work that pays the bills happens inside large systems with years of history, decisions no one remembers making, and a blast radius you can't fully see from any single file.

The principles from Part 1 don't change in the brownfield. The early moves do. In greenfield I set conventions; in the brownfield I inherit them. In greenfield I lock contracts; in the brownfield I have to discover the contracts that already exist, often implicit and undocumented. The mental model that keeps me honest: in a new project I'm the founding engineer, but in an existing one I'm the new senior hire. A good senior hire doesn't walk in and start moving things. They earn the right to change the system by first proving they understand it.

That single shift (understand before you touch, applied to a system too large to read) reshapes the whole practice.

Figure: Greenfield versus brownfield: the principles hold, but the early moves invert.
  • GREENFIELD · you are the architect. Set the conventions. Author the config. Lock the contracts you create. Pull knowledge into context.
  • BROWNFIELD · you are the new senior hire. Discover the conventions. Reverse-engineer the config. Find the contracts already crossing boundaries. Keep the map in subagents; pull only the destination into the main session.
Same principles. Inverted opening: recover the structure before you change it.

The discipline is identical in both lanes. What flips is the source of truth: in greenfield it's your intent, in the brownfield it's the code that's already there.

Context scarcity gets worse, so mapping moves to subagents

Part 1's first principle was that context is the only scarce resource. A large existing codebase makes that constraint bite immediately. You cannot read it into the window; there's far too much, and most of it is irrelevant to any given task. If you try, you fill the context with noise and the agent's quality collapses before it writes a line.

So in the brownfield, investigation subagents stop being a nice-to-have and become the core move. "Use subagents to investigate how authentication flows through this service." They read the hundred files; I get a three-paragraph summary and the five file paths that actually matter. This is precisely the read-heavy work whose output compresses to a short report, which is where subagents earn their keep. The orchestrating session stays lean and oriented while the expensive reading happens in isolated contexts I then discard.

The greenfield instinct is to pull knowledge into context. The brownfield instinct is the opposite: keep the map in subagents, and pull only the destination into the main session.

The project config is reverse-engineered, not authored

In greenfield, the project config file is a short list of conventions I decided. In the brownfield, I can't decide them; they're already encoded in the code, and my config has to match them or the agent will fight the codebase on every turn.

So I build it by extraction, not invention. Execute the project's existing test, build, lint, and run commands, and record the exact invocations that actually work (still the highest-ROI section of the config). Have an investigation subagent surface the non-obvious gotchas: the module that must be imported first, the env var with no default, the one directory where the usual pattern doesn't apply. The test for each line is the same as ever (would removing it cause a mistake?), but the source of truth flips from my intent to the codebase's reality.
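A minimal sketch of that extraction step, assuming the candidate commands come from wherever the project already lists them (a README, a CI file, a Makefile). The function name `probe_commands` and the two demo commands are hypothetical stand-ins. The point is only that an invocation goes into the config after it has been observed to exit cleanly, not because it looks plausible.

```python
import shlex
import subprocess
import sys


def probe_commands(candidates, cwd=".", timeout=600):
    """Run each candidate command; keep only those that exit with status 0.

    `candidates` maps a label (e.g. "test") to an argv list. The returned
    dict maps the label to the exact shell-quoted invocation that worked.
    """
    working = {}
    for label, argv in candidates.items():
        try:
            result = subprocess.run(
                argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired):
            # Missing binary or a hung command: not a verified invocation.
            continue
        if result.returncode == 0:
            working[label] = shlex.join(argv)
    return working


# Hypothetical candidates; in a real repo these would be the project's own
# test, build, and lint commands as found in its docs or CI config.
candidates = {
    "smoke": [sys.executable, "-c", "pass"],
    "broken": [sys.executable, "-c", "raise SystemExit(2)"],
}
for label, invocation in probe_commands(candidates).items():
    print(f"{label}: {invocation}")
```

Only the `smoke` entry survives here, since the `broken` command exits non-zero. The surviving lines are what would be pasted into the config verbatim.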

A specific trap: it's tempting to write down the conventions you wish the codebase had. Don't. The config describes the system as it is, not as it should be. Aspirational rules just teach the agent to generate code that doesn't match its neighbors, and inconsistency in a large codebase is its own category of defect.

Skills capture discovered conventions

Greenfield skills encode rules I designed. Brownfield skills encode patterns I found. When an investigation reveals that this codebase has a specific way of handling errors, a particular data-access layer, a house style for a recurring shape, that discovered convention becomes a skill, so the agent reproduces the existing pattern instead of inventing a fresh one that's subtly different.

This matters more than it does in greenfield, because in a new project any consistent choice is fine, but in an existing one there's already a right answer and it's whatever the surrounding code does. The job of a brownfield skill is to make the agent a faithful local, not a clever outsider.


The pipeline still runs, but it opens with a mapping phase

The Large-task pipeline from Part 2 still applies, with one addition bolted to the front and a sharpened attitude toward contracts.

Before plan mode, I run an explicit mapping phase: investigation subagents trace the relevant code paths, identify every module the change will touch, and surface the implicit contracts already crossing those boundaries. In greenfield, Step 6 was "lock the contracts I'm about to create." In the brownfield it becomes "discover the contracts that already exist, decide which I must honor untouched, and only then lock any new ones." Breaking an existing contract you didn't know was load-bearing is the signature brownfield disaster, and mapping is how you avoid signing up to break it.
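One piece of the mapping phase can be sketched mechanically: given a graph of which module imports which, walk it backwards to list everything that depends on the module you intend to change. The graph below is a hypothetical example, and `affected_modules` is an illustrative name. This covers only explicit import edges; implicit contracts (shared tables, message formats, timing assumptions) still need the investigation the paragraph above describes.

```python
from collections import deque


def affected_modules(imports, changed):
    """Return every module that directly or transitively imports a changed one.

    `imports` maps each module to the set of modules it imports.
    """
    # Invert the graph: module -> modules that import it.
    dependents = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk outward from the changed modules.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(changed)


# Hypothetical import graph for illustration only.
imports = {
    "api.handlers": {"auth.session", "billing.invoice"},
    "auth.session": {"auth.tokens"},
    "billing.invoice": {"db.models"},
    "reports.export": {"billing.invoice"},
    "auth.tokens": set(),
    "db.models": set(),
}

print(sorted(affected_modules(imports, {"auth.tokens"})))
# ['api.handlers', 'auth.session']
```

The output is a starting list for the spec's "modules touched" section, not a proof of safety: anything coupled through data or convention rather than imports will not appear in it.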

Plan mode then gets pointed at the real, discovered code paths rather than my assumptions about them. The spec names actual existing interfaces and, critically, states which existing behavior is explicitly out of scope to change. The rest of the pipeline (spec review, decomposition, the gated and arbitrated build/verify loop, the evidence-backed PR) runs as written. The gates matter more here, not less, because the cost of an undetected regression in a system thousands of people depend on dwarfs the cost in a project that doesn't exist yet.

Tests are sparse, so characterize before you change

Part 2 leaned hard on test-driven development: write the failing test, then the code. The brownfield complication is that the code you're about to change often has no tests at all, or has tests that encode behavior nobody has validated in years. You can't write a failing test for a feature when you don't yet know what the current behavior even is.

So the brownfield move is to characterize before you change: write tests that capture what the existing code actually does right now, even the parts that look wrong. Those characterization tests become a safety net. Now any change that alters behavior will trip a test, and you get to decide deliberately whether that change is intended or an accident. Only once the behavior is pinned do I move to the normal write-the-failing-test rhythm for the new behavior. You cannot safely change what you have not first made observable.
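A small illustration of pinning behavior, using a made-up legacy function (`legacy_format_price` is hypothetical and stands in for whatever untested code you are about to change). The tests are written pytest-style but use only plain asserts. Note that the second test deliberately pins output that looks wrong, because callers may already depend on it.

```python
def legacy_format_price(cents):
    """Stand-in for untested legacy code (hypothetical example)."""
    dollars = cents // 100    # floor division: rounds toward negative infinity
    remainder = cents % 100   # always non-negative in Python
    return f"${dollars}.{remainder:02d}"


def test_typical_amount_is_pinned():
    assert legacy_format_price(1999) == "$19.99"


def test_small_amount_is_zero_padded():
    assert legacy_format_price(5) == "$0.05"


def test_negative_amount_quirk_is_pinned():
    # A refund of 150 cents renders as "$-2.50", not "-$1.50".
    # This looks like a bug, but the characterization test records what the
    # code does today; whether to change it is a separate, deliberate decision.
    assert legacy_format_price(-150) == "$-2.50"
```

If a later refactor changes the negative case, the third test fails and forces the question of whether the change was intended, which is the whole purpose of the safety net.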

◆ Background · characterization tests

Pinning behavior before you touch it

The technique has a name and a lineage. Michael Feathers, in Working Effectively with Legacy Code, defines a characterization test as one that documents the actual, current behavior of a piece of code rather than its intended behavior. Unlike a specification test (which asserts what the code should do), a characterization test captures what it does do, including quirks, because that is what callers already depend on (summary of the book's key points). It has become the standard first move for safely modifying untested legacy code: take a snapshot of present behavior, lock it with tests, and only then change anything, so any behavioral drift trips an alarm instead of shipping silently.

Debugging assumes the code is guilty, and history is evidence

Part 1's debugging discipline (root cause before fix, never touch production code until the cause is proven) holds completely. The brownfield sharpens two things.

First, you cannot assume the existing code is correct. In greenfield, a bug is usually in the code you just wrote. In the brownfield, the bug may live in code that's been "working" for years and only now meets an input that exposes it. The systematic debugger has to be willing to indict old code, not just new.

Second, history is first-class evidence. git blame and the commit log aren't a courtesy here; they're often the fastest route to root cause, because the question "what changed, and when did this start" frequently answers itself in the history. The first place the symptom appears is rarely where the defect lives, and in a large codebase the distance between the two can be enormous. Evidence over intuition matters more the larger the system gets, because intuition's hit rate falls as the surface area grows.

◆ Background · history as evidence

git bisect and git blame as root-cause tools

When a symptom appears far from its cause, the commit history is often the shortest path between them. git bisect runs a binary search over history: you mark one known-good and one known-bad commit, and it halves the range at each step until it pinpoints the commit that introduced the regression. That takes about log₂(n) tests, so roughly 15 checks across 20,000 commits (Gun.io, git bisect guide). git blame then names the author and the last commit to touch each line, so you can read the change in context. The combined heuristic: bisect to find which commit changed behavior, then blame to see who changed it and how. Together they answer exactly the "what changed, and when did this start" question that so often resolves a legacy bug.
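The log₂(n) figure can be checked with a toy model of the search. This is a sketch of the idea only, assuming a linear history in which every commit after the first bad one is also bad; real git bisect works over a commit graph and handles skips. The function name and the culprit index are hypothetical.

```python
import math


def find_first_bad(n_commits, is_bad):
    """Binary-search commits 0..n_commits-1 for the first bad one.

    Assumes commit 0 is known good and the last commit is known bad.
    Returns (index_of_first_bad, number_of_commits_tested).
    """
    good, bad = 0, n_commits - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(mid):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression is after mid
    return bad, tests


n = 20_000
culprit = 13_337  # hypothetical commit that introduced the regression
found, tests = find_first_bad(n, lambda i: i >= culprit)

# The number of tests never exceeds ceil(log2(n)), which is 15 for n = 20,000.
print(found, tests, math.ceil(math.log2(n)))
```

Each test is one build-and-check of a single commit, so the bound is what makes bisecting practical even when each check is slow.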

Defense in depth, and leaving the map better than you found it

Part 1's debugging protocol ends with defense in depth: guard the layers so the bug class can't recur, add regression tests at the sibling sites, append a one-line note for the next session. In the brownfield this step compounds. Every investigation you run is reusable knowledge: the proven root cause, the map of how a subsystem actually works, the implicit contract you finally pinned down. Promote it. A finding that proves universally true about this codebase graduates from a one-off session note into a permanent skill or a config rule, exactly as Part 1's reflection loop prescribed.

Done consistently, this is how an agent's effectiveness in a legacy system compounds instead of resetting every session. Each task leaves the map a little more complete, the conventions a little more captured, the next change a little cheaper. In greenfield, the structure you build is the system. In the brownfield, the structure you build is your understanding of a system someone else built, and that understanding is the asset.


The throughline, one last time

Across all three parts, the argument never changed: the agent is capable, but capability without structure doesn't survive contact with production. Greenfield lets you build the structure from scratch. The brownfield forces you to recover it first, from a system that won't explain itself, before you're allowed to change anything safely.

That's the harder discipline, and it's the one that actually matters at scale, because almost nothing important is greenfield for long. The structure is still the job. In the brownfield, the first half of that job is figuring out what structure is already there.

The series: Part 1: Claude Code Is Not a Chatbot. The principles and the daily operating model. Part 2: Shipping a Feature Without Losing the Thread. The full pipeline for large work. Part 3: Into the Brownfield. (You are here.)