The Forty-Dollar Invoice

What AP automation actually costs you before the agents arrive, and why the ROI story is usually three layers deeper than the deck suggests.

Most CFOs count the labour line on their AP stack. That is the smallest number. The real cost is rework, late-payment penalties, missed early-payment discounts, duplicate payments, invoice fraud, and Month 13 cleanup. Agentic AP does not just reduce labour. It collapses the whole stack. But only if deployed with a real threat model.

Most CFOs can quote their cost per invoice to two decimal places. Almost none of them can quote what that invoice actually costs their business.

The number in the deck is usually the labour-line cost. Sometimes with a bit of software and infrastructure rolled in. Ardent Partners puts the industry average at USD 9.40 per invoice across all respondents in 2024, with best-in-class teams at USD 2.78 and bottom-quartile teams at USD 12.88. APQC’s cross-industry median, reported in 2018, lands at USD 5.83, with top performers at USD 2.07 and bottom performers above USD 10.

Those are real numbers. But they are a floor, not the whole story.

The labour line is the smallest line

Start with what finance already tracks. At USD 9.40 per invoice and, say, 150,000 invoices a year, the labour-and-systems cost of AP is about USD 1.4M. That is a real number and worth optimising.

It is also a rounding error on everything else.

Rework and exceptions

Ardent’s 2024 data shows best-in-class AP teams have a 9 per cent invoice exception rate. Everyone else averages 22 per cent. That is the rate at which invoices cannot be processed straight-through because something is wrong. Missing PO number. Line items do not reconcile. Tax code mismatch. Supplier master data out of date.

The headline industry straight-through processing rate is just 32.6 per cent. Two out of three invoices, at the average enterprise, still need human intervention somewhere in the flow. Every one of those humans is doing investigative work: emailing suppliers, chasing approvers, reassigning cost centres. That is not the USD 9.40 line. That is the invisible line.

Late-payment penalties and relationship cost

In the UK, late payments cost the economy almost GBP 11 billion per year, trigger about 38 business closures every day, and drain around GBP 22,000 per year from the average SME. The consequences are asymmetric. Your suppliers wear the cash-flow pain. You wear the relationship cost, the fee-revision negotiation, and increasingly, the regulatory exposure.

In Australia, the Payment Times Reporting Scheme now publishes a register naming the slowest 20 per cent of small-business payers. The Payment Times Reporting Amendment Act 2024 put teeth on the scheme from 1 July 2024. If you are a large business and your AP flow quietly lets invoices age past their due date while exceptions get chased, you now have a public-policy exposure, not just a supplier-relationship one.

Missed early-payment discounts

The industry benchmark is that only about a quarter of enterprises that are offered early-payment discounts capture them consistently. The rest leak the discount every cycle because the AP pipeline is too slow to clear approvals before the discount window closes.

A 2/10 net 30 discount means paying on day 10 rather than day 30 to earn 2 per cent. That is roughly a 36 to 37 per cent annualised return on paying twenty days early. Missing it is not “free.” Missing it systematically on hundreds of thousands of invoices is a material P&L line that simply never shows up on a P&L.
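The annualised figure is simple arithmetic: pay twenty days sooner (day 10 instead of day 30) and earn 2 on every 98 paid. A minimal sketch of that calculation follows. The function name and the 365-day year are our assumptions; a 360-day convention gives a slightly lower number, and neither figure compounds.

```python
def annualised_discount_return(discount_pct: float, discount_days: int,
                               net_days: int, days_in_year: int = 365) -> float:
    """Simple (non-compounded) annualised return from taking an early-payment discount."""
    days_early = net_days - discount_days                   # float given up by paying early
    period_return = discount_pct / (100.0 - discount_pct)   # e.g. 2 earned on 98 paid
    return period_return * days_in_year / days_early


rate = annualised_discount_return(2.0, 10, 30)
print(f"{rate:.1%}")  # prints 37.2%
```

The same function makes it easy to compare terms: a 1/10 net 60 offer, for example, spreads a smaller discount over fifty days and annualises far lower.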

Duplicate payments

APQC’s data on duplicate and erroneous disbursements shows top performers running at 0.8 per cent of annual disbursements. Bottom performers run at 2 per cent. On an AUD 200M annual spend, the bottom-performer number is AUD 4M of overpayments a year. Some of it gets recovered. Most of it does not, because vendors do not volunteer.

Celonis’s process-mining engagement with Deutsche Telekom Services Europe (nine million invoices a year) documented EUR 3 million in annual savings from duplicate-payment prevention alone, on top of EUR 40 million from early-payment discounts. The vendor has obvious interests in those numbers, but the shape is consistent with every AP-forensics engagement we have seen.

Invoice fraud and BEC

This is the line that usually surprises people the most.

The FBI’s IC3 unit reports USD 55.5 billion in exposed global losses from Business Email Compromise between October 2013 and December 2023 across 305,033 incidents. Its 2024 Internet Crime Report logged a further USD 2.77 billion in BEC losses across 21,442 incidents in 2024 alone. BEC is the second-costliest cybercrime category the FBI tracks, and the single most common attack vector is the manipulated supplier invoice.

In Australia, the ACCC reported AUD 227 million in payment-redirection scam losses in 2021 alone, a 77 per cent increase on the prior year. Payment redirection is now the single most financially damaging scam class for Australian businesses.

Then there are the named incidents. Toyota Boshoku, USD 37 million lost to a vendor-impersonation attack in August 2019. The Puerto Rico Industrial Development Company, USD 2.6 million transferred to a fraudulent bank account on the strength of a single spoofed email in January 2020. These are not edge cases. They are what the attack looks like when it lands.

The Australian case worth knowing in detail: Mobius Group v Inoteq, where the Western Australian court effectively ruled that if your supplier’s email is compromised and you pay the fraudulent “new account” details, you can still be legally liable to pay the original invoice again. Your supplier’s security failure becomes your balance sheet problem.

Audit overhead and the Month 13 problem

Every AP team has a Month 13. The quiet four-to-six weeks after financial year-end where someone untangles the exceptions, reclassifies miscoded spend, chases missing POs, writes off disputed amounts, and reconciles the GL to reality.

Month 13 is a cost centre that never appears in the AP cost line. It shows up as “audit fees went up again” or “we had to extend the close window” or “the transformation team is understaffed because they are doing finance cleanup.” The PCAOB’s 2024 inspection cycle noted that testing controls with a review element, and identifying and selecting which controls to test, remain persistent deficiency areas. Which is another way of saying: auditors cannot easily verify that your AP controls actually do what you claim they do. That gap gets paid for in audit hours.

Why most AP automation projects still fail

If the prize is so large, why is the industry straight-through-processing rate still 32.6 per cent?

Because most AP automation projects solved the wrong problem.

The 2010s wave of AP automation was, in substance, OCR plus workflow. Receive the PDF, extract the fields with optical character recognition, route to an approver, book to the GL. On machine-readable invoices with predictable templates it worked reasonably well, hitting 85-95 per cent field-level accuracy. On the messy middle of real enterprise spend, layout variance kills it. Scanned paper with a coffee stain. Email-forwarded screenshots. New supplier with a template the system has never seen. Tax code on a different line than usual.

Every failed extraction becomes a human-touch exception. Every human-touch exception carries the full cost stack: investigation time, late-payment risk, discount-capture risk, approval-chain churn, and an audit trail that has to be patched together after the fact.

Ardent’s best-in-class teams hit 69 per cent straight-through processing, with 2.1 times the industry’s STP adoption and 2.4 times the likelihood of end-to-end automation. Everyone else, on the same data, is not just slow. They are structurally fragile. Their 22 per cent exception rate is roughly 2.4 times the best-in-class 9 per cent. Their 17.4-day cycle time is 5.6 times longer.

The gap is not tooling. It is architecture. Teams that treat AP as a data-quality and exception-handling problem, not a document-processing problem, get the benchmark results.

What agentic AP actually changes

An agent is not an OCR engine. Which is obvious. What is less obvious is what that means in practice.

An agentic AP system has three properties the prior generation did not.

One. It can read context, not just fields. The PO reference is missing, but the line items match a standing order from this supplier for the Chatswood depot and the amount is within the rolling-quarter variance envelope. An agent can make that call, log its reasoning, and flag for human review only if the confidence drops below a threshold. An OCR pipeline cannot.
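The "flag for human review below a threshold" step is the deterministic part of that judgement call, and it is worth keeping outside the model. A minimal sketch follows. The `CodingDecision` fields, the `route` function, and the 0.90 threshold are all hypothetical, chosen for illustration rather than taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingDecision:
    invoice_id: str
    gl_account: str
    confidence: float   # 0.0 to 1.0, as reported by the agent (assumed scale)
    reasoning: str      # kept for the audit trail whatever the routing outcome


def route(decision: CodingDecision, review_threshold: float = 0.90) -> str:
    """Return 'auto_post' or 'human_review' for one coding decision."""
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if decision.confidence >= review_threshold:
        return "auto_post"
    return "human_review"


standing_order = CodingDecision(
    invoice_id="INV-1001",
    gl_account="6100",
    confidence=0.96,
    reasoning="No PO reference; line items match standing order; amount within variance envelope.",
)
print(route(standing_order))  # prints auto_post
```

The threshold is a policy setting, not a model output. Finance owns it, and changing it should leave its own log entry.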

Two. It can operate the tools. It can query the vendor master, cross-reference the invoice number against previously-paid records, check the BSB against the supplier’s historical accounts, run a duplicate check against the last rolling 180 days, confirm the tax line against the general ledger’s tax code map. Without any of that logic being hard-coded in a workflow tool. It is a tool-using agent, not a deterministic pipeline.
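Two of those checks, the rolling 180-day duplicate check and the comparison of bank details against the supplier's historical accounts, are simple enough to sketch as tools an agent might call. This is an illustration under stated assumptions: the `PaidInvoice` record, both function names, and the invoice-number normalisation rule are hypothetical, and a production check would also look at amounts, dates, and near-miss invoice numbers.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PaidInvoice:
    supplier_id: str
    invoice_number: str
    amount_cents: int
    paid_on: date


def normalise(invoice_number: str) -> str:
    """Drop case, spaces and punctuation so 'INV-0042' matches 'inv 0042'."""
    return "".join(ch for ch in invoice_number.upper() if ch.isalnum())


def find_duplicates(supplier_id: str, invoice_number: str,
                    history: list[PaidInvoice], as_of: date,
                    window_days: int = 180) -> list[PaidInvoice]:
    """Return payments to the same supplier with the same invoice number in the window."""
    cutoff = as_of - timedelta(days=window_days)
    key = normalise(invoice_number)
    return [
        paid for paid in history
        if paid.supplier_id == supplier_id
        and paid.paid_on >= cutoff
        and normalise(paid.invoice_number) == key
    ]


def bank_details_are_known(known_accounts: set[tuple[str, str]],
                           bsb: str, account: str) -> bool:
    """True only if this BSB and account pair has been used by the supplier before."""
    return (bsb.replace("-", "").strip(), account.strip()) in known_accounts


history = [PaidInvoice("SUP-7", "INV-0042", 125_000, date(2025, 1, 10))]
print(len(find_duplicates("SUP-7", "inv 0042", history, as_of=date(2025, 3, 1))))  # prints 1
print(bank_details_are_known({("062000", "12345678")}, "062-000", "99999999"))     # prints False
```

An unknown bank account is not proof of fraud, but it is exactly the signal that should stop straight-through payment and trigger an out-of-band confirmation with the supplier.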

Three. It can explain itself. Which is the single most important property for audit. Every decision comes with a reasoning trace. Every tool call is logged. Every confidence score is visible. Every override is attributable. If the auditor asks “how did this invoice end up coded to capex rather than opex,” there is a durable answer.

The shape of the numbers when it works

The real-world case studies are largely vendor-published, so treat the specific numbers with the usual scepticism. But the shape is consistent.

HSB, a Swedish housing cooperative processing 1.5 million invoices a year, moved from two-plus minutes per invoice to 45 seconds, with a 72 per cent no-touch rate, 96 per cent coding accuracy, and 60,000 hours saved in year one against a 25,000-hour annual run-rate target. Vic.ai’s published case.

CNRG, a 145-store retailer running Microsoft Great Plains ERP with an 18-person AP team, moved from five minutes to 1.2 minutes per invoice, with a 60 per cent no-touch rate and 90 per cent coding accuracy.

Purple (the mattress company) cut processing time from eight days to three on 1,100-plus invoices a month, eliminated duplicate payments, and moved documentation completeness from 25 per cent to 100 per cent. Stampli’s published case.

Discount these numbers 30 per cent for vendor selection bias and they still represent a category change. The shape is: cycle time down by half to two-thirds, no-touch rate up into the 60-75 per cent range, exception rate cut by more than half, coding accuracy up into the 90s.

But it only works if the governance layer is serious.

Governance, or the Month 13 problem returns

Two things kill agentic AP deployments on the governance side.

The first is poor ERP integration. An agent that codes invoices beautifully but cannot reliably write them to the ERP, reconcile against the PO, and post to the correct GL accounts is a demo, not a system. Integration work is 60-70 per cent of the effort in a serious agentic AP project. Teams that skip it end up with Month 13 problems worse than before, because the audit trail between the agent’s decision and the ledger entry is not clean.

The second is compliance exposure that was not modelled upfront.

For Australian organisations, that compliance stack now includes:

Internationally, the cross-border payments rail is shifting too. SWIFT’s ISO 20022 deadline lands in November 2025, ending the coexistence period for MT/MX messages on CBPR+ cross-border payment instructions. Richer remittance data is about to become the norm. Agentic AP systems that cannot ingest structured ISO 20022 remittance payloads are already technical debt.

The governance implication is simple. An agentic AP deployment has to make audit easier, not harder. That means immutable decision logs. Deterministic replay. Chain-of-thought traces attached to every exception. Role-based segregation between the agent’s coding authority and its payment-release authority. Human-in-the-loop for any action above a defined threshold. And a clear blast-radius boundary, so that a compromised credential or a poisoned vendor email cannot cascade into a mass payment event.
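One common way to approximate an immutable decision log is a hash chain, where each entry commits to the one before it. The sketch below is an illustration, not a description of any particular product: the entry layout and the function names are our assumptions. A chain like this makes tampering evident rather than impossible, so it still needs write-once storage and an externally held copy of the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a decision record that is chained to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev_hash": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited, removed or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


decisions = []
append_entry(decisions, {"invoice": "INV-1001", "action": "code", "gl": "6100"})
append_entry(decisions, {"invoice": "INV-1001", "action": "hold", "reason": "above threshold"})
print(verify(decisions))  # prints True
```

If an auditor asks whether a coding decision was changed after the fact, re-running `verify` over the stored entries gives a yes-or-no answer that does not depend on trusting the agent.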

The Citibank reminder

In August 2020, an operator at Citibank working on a Revlon loan payment selected the wrong options in Flexcube, the bank’s 1990s-vintage loan-servicing platform. Intended payment: USD 7.8 million in interest. Actual payment: about USD 900 million, representing the entire outstanding principal. A three-person review process failed. Some lenders refused to return around USD 504 million, and Citi spent the next two years in court recovering it. CEO Jane Fraser later described it as a “massive, unforced error.”

The Citibank incident is instructive because it was not an automation failure. It was a brittle-UI failure, where one human action at the critical point produced an irrecoverable outcome. That is exactly the failure mode agentic AP has to be designed against. The agent will make thousands of judgement calls a day. If any one of them can commit the firm to a USD 900M outcome with no deterministic check, the system is not production-ready. It is a Flexcube with a chatbot.
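A deterministic check of that kind sits between the agent and the payment rail and does not depend on anyone's judgement, human or model. A minimal sketch follows. The `PaymentRequest` fields, the rule order, and the limits used in the example are hypothetical, and real limits would come from your delegation-of-authority policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentRequest:
    invoice_id: str
    approved_amount_cents: int   # amount approved against the invoice
    release_amount_cents: int    # amount about to leave the bank account
    coded_by: str                # identity that coded the invoice
    released_by: str             # identity requesting payment release


def release_decision(req: PaymentRequest, human_review_above_cents: int,
                     batch_total_cents: int, batch_cap_cents: int) -> str:
    """Apply fixed rules in order; the first failing rule decides the outcome."""
    if req.release_amount_cents != req.approved_amount_cents:
        return "block: amount differs from approved amount"
    if req.coded_by == req.released_by:
        return "block: same identity coded and released"
    if batch_total_cents + req.release_amount_cents > batch_cap_cents:
        return "block: batch cap exceeded"
    if req.release_amount_cents > human_review_above_cents:
        return "hold: human approval required"
    return "release"


# Interest of USD 7.8M approved, USD 900M about to be released.
wrong_amount = PaymentRequest("LOAN-INT-08", 780_000_000, 90_000_000_000,
                              coded_by="agent-coding", released_by="agent-release")
print(release_decision(wrong_amount, human_review_above_cents=5_000_000,
                       batch_total_cents=0, batch_cap_cents=1_000_000_000))
# prints block: amount differs from approved amount
```

The batch cap is the blast-radius boundary: even if every other control is bypassed, the total that can leave in one run is bounded by a number the agent cannot change.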

The bottom line

The headline number to optimise in AP is not cost per invoice. It is total cost per dollar of spend, including leakage.

Labour at USD 9.40 an invoice is real but small. Exception-handling rework is larger. Late-payment penalties, early-payment discount leakage, and duplicate payments are larger still. Invoice fraud, at industry-benchmark exposure rates, is often the single largest hidden line. Audit remediation and Month 13 tidy-up sit on top of all of it.

Agentic AP collapses the whole stack, because it can read context, operate tools, and explain itself in a way the OCR-plus-workflow generation could not. But it is not a drop-in upgrade. It needs a governance layer that makes audit easier, an integration layer that writes cleanly to the ERP, a threat model that treats the vendor-master-data change process as a critical security surface, and a human-in-the-loop policy sized to your actual risk appetite.

Teams that deploy it well will move from best-in-class being the aspiration to best-in-class being the baseline. Teams that deploy it poorly will discover a new class of Month 13 problem, with a new set of questions from their auditor about how the system made the decisions it made.

The question worth asking at your next budget cycle is not “what would AP automation cost us.” It is “what is our current AP stack actually costing us, fully loaded, including the lines we do not track.” Our experience is that the answer is usually three to five times the number in the deck.

Which means the business case for agentic AP is rarely close. The question is execution, not justification.

Written by Inference AI Consulting. If this problem is one you’re in the middle of, we’d rather hear about it than write about it.