This is Part 2 of a two-part series. In Part 1, we covered what agent-based development looks like in practice and why accounts payable automation is a particularly good domain for it.
When agents get it wrong
Not every agent run succeeds. Sometimes the agent misunderstands the intent of a ticket and implements something different from what was requested. We had a case where a ticket asked for an AI-based approach to a prediction problem. The agent analyzed the codebase, found a simpler deterministic approach that addressed part of the problem, and implemented that instead — technically sound, well-tested, reviewed cleanly, but not what the ticket was asking for.
The fix was not to make the agent smarter; it was to add a feedback loop. When a run fails, the developer describes what went wrong in plain text, and that description is fed into the next attempt, so the agent does not repeat the same mistake.
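The mechanics of that loop can be sketched roughly as follows. This is an illustrative sketch, not our actual implementation — the `TicketRun` type, field names, and prompt wording are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class TicketRun:
    """Accumulates plain-text failure feedback across retries of one ticket."""
    ticket_text: str
    failure_notes: list[str] = field(default_factory=list)

    def record_failure(self, note: str) -> None:
        # The developer's plain-text description of what the last run got wrong.
        self.failure_notes.append(note)

    def build_prompt(self) -> str:
        # The next attempt sees the ticket plus every prior failure note.
        prompt = self.ticket_text
        if self.failure_notes:
            notes = "\n".join(f"- {n}" for n in self.failure_notes)
            prompt += ("\n\nPrevious attempts failed for these reasons; "
                       "do not repeat them:\n" + notes)
        return prompt

run = TicketRun("Use an AI-based approach for the prediction problem.")
run.record_failure("Implemented a deterministic heuristic instead of an AI model.")
prompt = run.build_prompt()
```

The point is that the feedback is unstructured: the developer writes a sentence or two, and the accumulated notes travel with the ticket across attempts.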
We also found that the most effective way to prevent these misunderstandings is to front-load human judgment. For complex or ambiguous tickets, we now have the agent ask clarifying questions before it starts working. This brief design conversation — five minutes of the developer’s time — eliminates hours of wasted agent work.
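In pseudocode terms, the gate looks something like this. The ambiguity check here is a crude keyword stand-in and the question list is invented for illustration — only the shape of the flow (clarify first, then implement) reflects what the text describes:

```python
AMBIGUITY_MARKERS = ("improve", "better", "somehow", "ai-based", "smarter")

def needs_clarification(ticket_text: str) -> bool:
    # Crude stand-in for a real ambiguity check.
    text = ticket_text.lower()
    return any(marker in text for marker in AMBIGUITY_MARKERS)

def start_run(ticket_text: str, ask_developer) -> str:
    # Front-load human judgment: clarify before any implementation work.
    if needs_clarification(ticket_text):
        answers = ask_developer([
            "What exactly should the output be?",
            "Which existing components are in scope?",
        ])
        ticket_text += "\n\nClarifications:\n" + "\n".join(answers)
    return ticket_text

# The five-minute Q&A round, simulated here with canned answers.
final_ticket = start_run(
    "Somehow improve the duplicate-invoice detection.",
    ask_developer=lambda qs: [f"{q} -> see ticket comments" for q in qs],
)
```

The design choice worth noting is that the clarifications are appended to the ticket itself, so every subsequent agent attempt works from the enriched specification rather than the original ambiguous one.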
What I have learned
After several months of using agent-based development as part of our daily workflow, a few things stand out:
Documentation is the multiplier. The quality of agent output is directly proportional to the quality of your codebase documentation. Schema definitions, architecture descriptions, coding conventions, access control rules — everything that helps a new developer understand the codebase helps an agent understand it too. We have invested heavily in structured documentation, and it has been the single biggest factor in agent effectiveness.
The human role shifts to architecture and review. I spend less time writing code and more time reviewing agent-generated plans and PRs. The quality bar does not change — I still reject PRs that do not meet our standards. But the throughput is different. Where I previously might handle one or two tickets per day alongside other CTO responsibilities, we can now process several tickets in parallel.
Nothing reaches production without human approval. The agents produce code changes as pull requests — that is their only output that matters. Every pull request is reviewed by an engineer before it can be merged. Deployment is always a separate, manual step performed by a person. There is no path from an agent’s output to a production system that does not pass through human review first. This is the same change control process we use for all development work. The agents do not get a shortcut.
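Reduced to its essentials, the merge gate is a single predicate. The types and field names below are hypothetical, but the rule they encode is the one described above: an agent-authored PR merges under exactly the same condition as a human-authored one:

```python
from dataclasses import dataclass

@dataclass
class Review:
    author_is_human: bool
    state: str  # e.g. "APPROVED", "CHANGES_REQUESTED"

@dataclass
class PullRequest:
    author: str            # may be a human or an agent
    reviews: list[Review]

def may_merge(pr: PullRequest) -> bool:
    # Same gate for every PR: at least one human approval. Note that
    # merging is still separate from deployment, which stays manual.
    return any(r.author_is_human and r.state == "APPROVED" for r in pr.reviews)

agent_pr = PullRequest(author="agent", reviews=[Review(True, "APPROVED")])
unreviewed_pr = PullRequest(author="agent", reviews=[])
```

In practice this kind of rule is typically enforced by branch protection in the hosting platform rather than application code, but the invariant is the same: no human approval, no merge.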
Specificity beats thoroughness in automated review. A review finding that says “this function signature doesn’t match the plan” with exact quotes is worth ten findings that say “consider adding error handling.” We spent time tuning our review process to reward evidence over volume, and the improvement in signal-to-noise ratio was dramatic.
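One way to operationalize "evidence over volume" is a scoring function over review findings. The weights and field names here are invented for illustration; the idea is simply that quoted code and exact locations rank a finding up, while generic boilerplate ranks it down:

```python
def finding_score(finding: dict) -> float:
    # Evidence outweighs volume: exact quotes and file:line references
    # score high, generic boilerplate advice scores negative.
    score = 0.0
    if finding.get("quoted_code"):   # exact quote from the diff or plan
        score += 3.0
    if finding.get("location"):      # file:line reference
        score += 2.0
    if finding.get("generic"):       # e.g. "consider adding error handling"
        score -= 2.0
    return score

findings = [
    {"text": "Function signature doesn't match the plan",
     "quoted_code": "def predict(invoice)", "location": "predictor.py:42"},
    {"text": "Consider adding error handling", "generic": True},
]
ranked = sorted(findings, key=finding_score, reverse=True)
```

Sorting by such a score surfaces the specific, verifiable findings first, which is what improves the signal-to-noise ratio for the human reviewer.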
The meta-loop: AI building AI systems
There is something philosophically interesting about using AI agents to develop an AI-powered accounts payable platform. The agents write code that trains AI models, that generates predictions, that gets validated by other code, that feeds back into training. The development tool and the product are converging.
But I want to be careful not to oversell this. The AI agents we use for development (large language models) are fundamentally different from the AI that powers our invoice predictions (supervised learning on customer-specific accounting data). They solve different problems with different approaches. The development agents do not make our invoice predictions better. They make our ability to improve the platform faster.
And that matters. In a domain like AP automation, where every customer has unique requirements, where ERP integrations are complex and varied, and where accuracy directly impacts our customers’ financial operations — the ability to ship improvements faster, with consistent quality, is a genuine competitive advantage.
Where this is heading
We are still early. The agent handles well-defined tickets with clear scope much better than ambiguous feature requests. It works best when the codebase is well-documented and the coding patterns are consistent. It needs human review at every stage.
But the trajectory is clear. The fraction of development work that benefits from agent-based approaches is growing — not because the agents are getting smarter (though they are), but because we keep investing in the structured documentation and coding conventions that make agents effective. Every schema we document, every access control rule we formalize, every coding pattern we standardize makes the next agent run more reliable.
For teams building complex, multi-service platforms — whether in AP automation or any other domain with well-structured data flows and extensive configuration — the practical question is not whether to adopt agent-based development, but how to prepare your codebase for it. Start with documentation. Formalize your conventions. Structure your configuration. The rest follows.