Coding Agents in Production

Coding agents ship code in production codebases now. Cursor, Claude Code, Devin, Augment, OpenHands — the field is crowded and the capability is real. The question for engineering leaders isn’t “do we use them” — most of the team already does — but “how do we keep them from quietly accumulating debt the senior engineers will eat in a year.”

The discipline we apply.

Code an agent writes is still code#

Every line the agent commits should pass the same gates as every line a human commits. PR review, tests, lint, type checks, security scans. The agent’s output is faster, not better. Treat it as junior engineer output: useful, requires oversight.

The teams in the worst shape skipped review because “the agent already reviewed it.” The agent did not. The agent generated plausible-looking code that a tired reviewer would have flagged.

The review surface#

Reviewing AI-generated code is different from reviewing human code. The failure modes:

Plausible but wrong. Code that looks correct, runs in the happy path, fails on edge cases the model didn’t simulate.

Hallucinated APIs. Calls to libraries or methods that don’t exist or have different signatures than the model remembered.

Style drift. The agent’s idea of “clean code” diverges from the codebase’s conventions over time.

Test theater. Tests that exercise the code without exercising the behavior. 100% line coverage, 20% logic coverage.

Dependency creep. The agent reaches for a new library because it’s easier than learning the existing one. Three months later you have five JSON parsers.

Our review checklist (in addition to whatever your team already does):

Does this introduce a new dependency? Justify or remove.
Do the tests fail in the absence of the change? Run them on the parent commit.
Does the change touch error handling? Inspect the failure paths explicitly.
Does the change cross a security boundary (auth, encryption, secret handling)? Mandatory senior review.

Ownership#

The most under-engineered part of coding-agent workflows is ownership. Who is on the hook when the agent’s code breaks production?

Our rule: the developer who approved the PR is the owner. Not the agent, not “the AI team,” not “we’ll figure it out.” If you approved it, you own it. This pushes the right amount of skepticism into the review.

The corollary: don’t let agents merge their own PRs. Don’t let one agent review another agent’s PR. Humans approve.

Audit trail#

For any agent-generated code:

The prompt that produced the change, logged
The full diff, logged
The reviewer, logged
The time-to-incident if the change later breaks something, tracked

This isn’t surveillance theater. It’s how you learn whether the agent’s output is improving over time and which prompt patterns lead to which defect rates.

Where coding agents earn their seat#

Boilerplate and scaffolding. New endpoint, new test file, new migration. The pattern is well-understood; the agent saves typing.

Refactoring within a clear pattern. “Rename X to Y across the codebase.” “Replace this deprecated API with that one.” Mechanical changes that humans hate.

Test generation against existing code. Coverage gaps, edge-case tests. Humans review.

Doc generation. Stale-doc updates, README scaffolds, API doc generation from code.

Where they don’t (yet)#

Architecture decisions. The agent will produce a plausible architecture. It will not be the right architecture for your codebase. Senior engineer’s job.

Cross-system debugging. Production incidents that span services, observability tools, and human context. The agent can summarize logs; it can’t replace the diagnostic intuition.

Anything in security-critical paths. Auth, crypto, identity, payment flows. The cost of subtle defects is too high.

What we ship by default#

For engineering teams adopting coding agents via our DevOps practice:

Agent-generated PRs follow the same gates as human PRs
Review checklist tailored for AI failure modes
Ownership convention enforced
Audit log of prompts → diffs → reviewers → outcomes
Quarterly review of defect rate by code-origin (human vs agent vs mixed)

Coding agents are a productivity multiplier on a team that already ships well. They’re a debt multiplier on a team that doesn’t.

The agent doesn’t own the code. The reviewer does. Our team helps engineering orgs adopt coding agents without accumulating silent debt. Tell us about the team.

Code an agent writes is still code#

The review surface#

Ownership#

Audit trail#

Where coding agents earn their seat#

Where they don’t (yet)#

What we ship by default#

Related posts.

Docker in Production: Patterns That Stop Costing You Money

Building Production AI Agents: The Architecture Patterns That Actually Ship

SRE for AI Systems