How We Use AI to Build AI for Accounts Payable (Part 1)

Last month, I merged a pull request that touched three repositories, added 44 unit tests, and implemented a new filter condition engine for invoice processing. It went through multiple rounds of automated plan review and code review before reaching my desk. I never wrote a single line of the code myself.

An AI agent did. I reviewed the output, verified the logic, and hit merge.

This is not a story about replacing developers. This is a story about what happens when you spend seven years building an accounts payable automation platform — dozens of microservices, 30+ ERP integrations, 150+ enterprise customers — and then point AI agents at the development work itself.

The problem with scaling a microservices platform

At Snowfox, we predict how purchase invoices should be coded in our customers’ accounting systems. GL accounts, cost centers, project codes, VAT handling — the kind of work that finance teams have done manually for decades. Our AI learns from each customer’s historical data and predicts the correct coding with 85–95% accuracy.

The platform behind this is not a monolith. It is a collection of cloud-native microservices, each handling a specific responsibility: parsing invoices from different formats, running AI predictions, validating results, writing predictions back to ERPs, monitoring accuracy, generating billing. These services communicate through event-driven messaging, and they share data across multiple stores — operational databases, analytics warehouses, configuration systems. When you have 150+ customers, each with their own trained models, their own ERP integration, their own configuration — the surface area is enormous.

Every ticket requires the developer to understand which services are involved, how they communicate, what data they read and write, what the customer configuration looks like, and what the downstream effects of any change will be. We enforce formal access controls on customer configuration — each service is only allowed to read the specific configuration sections it needs. It is a lot to hold in your head.
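A service-to-section access rule like the one described can be sketched as a simple lookup check. The service names, section names, and function below are purely illustrative assumptions, not Snowfox's actual implementation:

```python
# Hypothetical access-control map: which config sections each service may read.
# All names here are made up for illustration.
ALLOWED_SECTIONS = {
    "invoice-parser": {"parsing", "formats"},
    "prediction-service": {"models", "coding_rules"},
    "erp-writer": {"erp_connection"},
}

def read_config(service: str, section: str, config: dict):
    """Return a config section only if this service is allowed to read it."""
    if section not in ALLOWED_SECTIONS.get(service, set()):
        raise PermissionError(f"{service} may not read config section '{section}'")
    return config[section]
```

Making the rule explicit in code means a forbidden read fails loudly at development time instead of silently coupling services to configuration they should not depend on.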

This is exactly the kind of work where AI agents excel — not because the work is simple, but because it is well-structured and deeply documented.

What agent-based development actually looks like

When I say “AI agent,” I do not mean a chatbot you ask questions to. I mean an autonomous process that takes a development task as input and produces reviewed code with tests and pull requests as output. The agent reads the ticket, explores the codebase, writes an implementation plan, gets the plan reviewed, writes the code, gets the code reviewed, and opens PRs — all without human intervention during the run.
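The stages above can be sketched as a linear pipeline with review gates between them. The stage functions below are stubs standing in for LLM calls and tooling; this is a shape sketch, not our actual agent code:

```python
# Illustrative agent pipeline: each stage must pass a review gate before the
# next stage runs. All function bodies are stand-ins for the real work.

def draft_plan(ticket: str) -> str:
    return f"plan for: {ticket}"            # stands in for codebase exploration + planning

def review_plan(plan: str) -> bool:
    return plan.startswith("plan for:")     # stands in for automated plan review

def write_code(plan: str) -> str:
    return f"code implementing [{plan}]"    # stands in for code generation

def review_code(code: str, plan: str) -> bool:
    return plan in code                     # code is checked against the approved plan

def run_agent(ticket: str) -> str:
    plan = draft_plan(ticket)
    if not review_plan(plan):
        raise RuntimeError("plan review failed")
    code = write_code(plan)
    if not review_code(code, plan):
        raise RuntimeError("code review failed")
    return code                             # in the real flow, this becomes a pull request
```

The important structural point is that review is a gate, not an afterthought: code generation never starts from an unreviewed plan.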

The part that took the most iteration to get right was quality control. Early on, the agent would confidently produce plans that had subtle misunderstandings — wrong function signatures, incorrect assumptions about configuration structure, missing edge cases. Letting it go straight from plan to code produced technically plausible but sometimes wrong results.

The breakthrough was building automated review into the process itself. Before the agent writes any code, the plan gets checked against the actual codebase. Before code ships, it gets checked against the approved plan. These review stages filter out the majority of issues before a human ever looks at the output.

What made the reviews actually useful, rather than just noisy, was setting a strict evidence standard: every review finding must cite the exact text it disagrees with and the exact code that contradicts it. Vague concerns and style nitpicks get automatically discarded. This turns out to be a lesson that applies to human code review as well — specificity beats thoroughness.
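The evidence standard can be expressed as a mechanical filter: a finding survives only if both of its quotes literally appear in the sources it claims to dispute. The field names below are assumptions for illustration:

```python
from dataclasses import dataclass

# Hypothetical review-finding structure; field names are illustrative.
@dataclass
class Finding:
    comment: str
    plan_quote: str   # exact text from the plan being disputed
    code_quote: str   # exact code said to contradict it

def keep_finding(finding: Finding, plan: str, code: str) -> bool:
    """Discard vague findings: both citations must appear verbatim in the sources."""
    return finding.plan_quote in plan and finding.code_quote in code
```

A finding that cannot point at concrete text fails the check automatically, which is exactly how style nitpicks and vague concerns get dropped before a human sees them.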

The human still makes the final call. I review every plan and every PR before it merges. But what reaches me is already substantially cleaner than a first draft from a junior developer, because it has been through multiple automated review cycles before I see it.

Why AP is a particularly good domain for this

Not every software domain is equally suited to agent-based development. AP automation turns out to be an unusually good fit, for several reasons.

Highly structured data flows. An invoice enters the system, gets parsed, gets predicted, gets validated, gets written back to an ERP. Each step has clear inputs, outputs, and failure modes. An AI agent can trace the complete lifecycle of an invoice through the codebase and understand exactly what each service does.
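That lifecycle can be modeled as a small state machine with an explicit failure mode at the end. The stage names below are assumptions, not our real service or event names:

```python
# Illustrative invoice lifecycle; stage names are made up for this sketch.
STAGES = ["received", "parsed", "predicted", "validated", "written_back"]

def advance(invoice: dict) -> dict:
    """Move an invoice to the next lifecycle stage; refuse to advance past the end."""
    i = STAGES.index(invoice["stage"])
    if i + 1 >= len(STAGES):
        raise ValueError(f"invoice {invoice['id']} already completed its lifecycle")
    return {**invoice, "stage": STAGES[i + 1]}
```

Because each transition has one clear predecessor and successor, an agent (or a human) can trace any invoice's state through the system without ambiguity.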

Customer-specific configuration. Every customer has a configuration profile with dozens of parameters controlling how their invoices are processed. We maintain formal access control rules mapping which services can read which configuration sections. An agent can check these rules before writing code — something human developers occasionally forget to do.

Per-customer AI models. Each customer has their own trained models because accounting conventions vary by company, industry, and country. This means our codebase handles a lot of configuration-driven behavior. Changes tend to be well-scoped: add a new config option, handle it in the right service, test it with the right mock data.

Mature test infrastructure. Seven years of development means we have comprehensive test suites and well-documented deployment procedures. An agent can run the existing tests to verify it hasn’t broken anything.

Extensive documentation. This might be the most important factor. We built an internal documentation repository that contains architecture descriptions covering our entire service catalog, database schemas, coding conventions, and deployment patterns. When an AI agent needs to understand how a particular subsystem works or what a specific data model looks like, that information is accessible and structured.

This documentation was originally written for human developers during onboarding. It turns out the same documentation makes AI agents dramatically more effective. The investment in documentation pays off twice.

This is Part 1 of a two-part series. In Part 2 (Publishing 18th March 2026), we will cover what happens when agents get it wrong, the lessons we have learned, and where this is heading.

About the author

Markus Paaso

Markus Paaso is the CTO of Snowfox, where he leads product development and drives technical excellence across the platform. With a strategic vision and a passion for innovation, Markus is helping shape the future of finance automation through cutting-edge technology.