How to evaluate AI AR vendors: a buyer's checklist
A practical checklist for evaluating AI accounts receivable vendors. Learn the questions that separate real agentic AR from rebranded automation, and how to score vendors objectively.
Evaluating AI AR vendors comes down to one test: does the system do the work and own the result, or does it surface work for your team to do and call that AI? Most products marketed as AI accounts receivable software are rule-based dunning and reporting tools with a new label. The few that are genuinely agentic act on accounts on their own and are accountable for an outcome you can measure.
This checklist gives you the questions, proof points, and scoring method to tell the difference. Rex is one option that meets the bar described here, but the point of this guide is to help you judge any vendor honestly, including us.
Separate agentic from assistive claims
Start by sorting every vendor into one of two buckets. Assistive tools make your team faster: they draft the email, suggest the match, flag the dispute, and wait for a person to approve and send. Agentic systems do the work: they send the email, apply the cash, and drive the dispute to resolution, escalating only the cases that need a human decision.
Both are legitimate, but they are not the same purchase. An assistive tool still needs a person for every action, so your cost still scales with your book. An agentic system breaks that link. Decide which one you are actually buying before you compare features, because a cheaper assistive tool and a pricier agentic one are not competing on the same value.
The fastest way to place a vendor is to ask what happens when nobody is watching. If the answer is "the queue fills up for your team," it is assistive. If the answer is "the agent keeps working the ledger and escalates the exceptions," it is agentic.
Watch for the middle ground too, because it is where most of the confusion lives. Plenty of tools automate a single step well, like sending dunning emails on a schedule, then call the whole product AI because that one step runs on its own. That is automation of a fixed task, not an agent reasoning across an account. The test is breadth and judgment: an agent decides what to do next on each account from its full context, where a workflow tool runs the same fixed steps regardless of what the account actually needs.
Questions that expose rebranded automation
Demos are built to impress. These questions pull a vendor off the script.
- What does the system do with no human in the loop? Make them list the specific actions it takes on its own, end to end, on a real account.
- Show me an edge case it was not configured for. For example, a customer reply that mixes a complaint, a dispute, and a partial promise to pay. Watch whether it reasons or asks for a new rule.
- How does it decide the next action on an account? "Rules and templates we set up together" is automation. "It reads the account state and chooses" is agentic, and they should be able to show the reasoning.
- What happens when a customer replies? Does the agent read the reply, decide, and respond, or does it route the message to a human inbox?
- How much configuration does a new scenario take? If every new situation needs a workflow built, you are buying a workflow tool, not an agent.
If answers keep returning to dashboards, alerts, and worklists, the system surfaces work rather than doing it. That can still be useful. Just price it accordingly.
Outcome accountability and SLAs
The clearest signal of a real agentic vendor is willingness to be measured on outcomes. Ask what metric the system is accountable for. Strong answers are concrete: cash recovered, DSO reduction, percentage of the ledger worked each week, straight-through cash application rate. Weak answers describe activity, like emails sent or reports generated, which tells you nothing about results.
Then ask to put it in the agreement. You will not always get an outcome-based contract, and that is fine, but the conversation is revealing. A vendor confident in an autonomous system will discuss SLAs around response time, action coverage, and recovery. A vendor selling software steers you back to per-seat licensing, because their value is access to a tool, not a result. If you want to pressure-test the numbers a vendor quotes, model them yourself first using a defensible framework for AI ROI in accounts receivable.
Be honest with yourself here too. Outcomes depend partly on your data and your customers, so no serious vendor will guarantee a number in a vacuum. What you are testing is whether they think in outcomes at all.
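If you want to check a vendor's outcome claims against your own books, the metrics named above are simple enough to compute yourself. The Python sketch below is a minimal illustration, not any vendor's method. It assumes the common period-end DSO formula (ending receivables times days in the period, divided by credit sales for the period), and the field names and figures are hypothetical, made up for illustration. Your finance team may use a different DSO variant, such as average receivables or the countback method; what matters is using the same one before and after.

```python
from dataclasses import dataclass


@dataclass
class PeriodSnapshot:
    """One measurement period, pulled from your own ERP (hypothetical field names)."""
    ending_ar: float            # open receivables at the end of the period
    credit_sales: float         # credit sales booked during the period
    days: int                   # length of the period in days
    payments_received: int      # number of customer payments received
    payments_auto_applied: int  # payments matched and posted with no human touch
    open_accounts: int          # accounts with an open balance
    accounts_worked: int        # accounts the system took at least one action on


def dso(s: PeriodSnapshot) -> float:
    # Period-end DSO: receivables relative to credit sales, scaled to days.
    return s.ending_ar * s.days / s.credit_sales


def straight_through_rate(s: PeriodSnapshot) -> float:
    # Share of payments applied end to end without a person.
    return s.payments_auto_applied / s.payments_received


def ledger_coverage(s: PeriodSnapshot) -> float:
    # Share of open accounts the system actually worked in the period.
    return s.accounts_worked / s.open_accounts


# Illustrative figures only, not benchmarks.
before = PeriodSnapshot(1_500_000, 1_000_000, 90, 1_000, 400, 500, 120)
after = PeriodSnapshot(1_200_000, 1_000_000, 90, 1_000, 820, 500, 450)

for label, snap in (("before", before), ("after", after)):
    print(label, round(dso(snap), 1), straight_through_rate(snap), ledger_coverage(snap))
```

Every function needs a starting point, a book size, and a time frame as inputs, which is why a DSO figure quoted without them cannot be compared. Note also that none of these metrics counts emails sent or reports generated.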
Integration, security, and compliance checks
An AR agent is only as good as its access to your systems. It has to read and write where the work actually lives.
- ERP and accounting writeback. Can it post actions back to NetSuite, QuickBooks, Sage, or your system of record, not just read from it? Read-only means a human still does the data entry.
- Email, payments, and portals. Does it send and receive on real channels, including the AP portals where many B2B invoices live?
- Permissions and approval gates. Can you set thresholds so high-stakes actions need sign-off? Autonomy without limits is a non-starter in finance.
- Audit trail. Every action the agent takes should be logged with the reasoning behind it, so you can review what it did and why.
- Data security and access control. SOC 2 at minimum, clear data handling, and scoped access to only what the agent needs.
Treat these as pass or fail. A vendor that cannot write back to your ERP or cannot show you an audit trail is not ready to run autonomously, whatever the demo looked like.
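Approval gates are easiest to evaluate if you write down your own policy before the demo, then ask the vendor to configure it. The Python sketch below assumes a simple policy of per-action dollar limits; the action names, limits, and account IDs are hypothetical examples, not any real product's configuration. One design choice worth raising with every vendor: action types the policy does not name should escalate by default (fail closed) rather than run unchecked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str         # hypothetical action names, e.g. "send_reminder", "issue_credit"
    amount: float     # dollar impact on the ledger; 0 for pure communication
    account_id: str


# Hypothetical policy: the agent may act alone up to these dollar limits.
APPROVAL_LIMITS = {
    "send_reminder": float("inf"),  # never needs sign-off
    "apply_cash": 50_000.0,
    "issue_credit": 1_000.0,
    "write_off": 0.0,               # any write-off above $0 needs sign-off
}


def route(action: ProposedAction, limits: dict = APPROVAL_LIMITS) -> str:
    """Return 'execute' if the agent may act alone, else 'escalate'."""
    limit = limits.get(action.kind)
    if limit is None or action.amount > limit:
        return "escalate"  # unknown action types fail closed
    return "execute"


print(route(ProposedAction("issue_credit", 1_500.0, "ACME-042")))  # escalate
```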
Proof: pilots, references, and metrics
Marketing claims are free. Make vendors prove autonomy on your data.
Run a scoped pilot on a real slice of your ledger, not a sandbox. Before it starts, agree on the success metric and the escalation rules in writing. During the pilot, the real test is the audit trail: did the agent take the actions, or did your team quietly do the work behind it? Check who is attributed to each action.
Ask for references who run the system autonomously today, and ask them what the agent does without supervision and what it escalates. The most useful reference question is simple: what does your team no longer do since the agent went live? If the honest answer is "nothing, we still send every email," the autonomy was a demo, not a deployment. Press for metrics with context: a DSO drop is meaningless without the starting point, the book size, and the time frame. Vague case studies with no customer willing to take a call are a red flag.
Be wary of pilots that quietly depend on the vendor's staff. Some vendors run a "pilot" by having their own people work your accounts behind a thin layer of software, so the results look autonomous but would not survive at scale. The audit-trail attribution check catches this: confirm the actions are attributed to the agent, not to a vendor operator logging in on your behalf.
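If the vendor can export the pilot's audit trail, the attribution check takes a few lines. This is a minimal Python sketch that assumes each exported entry records who performed the action in a field called actor_type, with the values agent, internal_user, or vendor_operator. Those names and the sample log are hypothetical, so map the vendor's real export onto them first.

```python
from collections import Counter

# Hypothetical actor labels; rename to match the vendor's actual export.
AGENT, INTERNAL_USER, VENDOR_OPERATOR = "agent", "internal_user", "vendor_operator"


def attribution_report(entries):
    """Summarize who performed the actions in an exported pilot audit log."""
    counts = Counter(e["actor_type"] for e in entries)
    total = sum(counts.values())
    agent_share = counts[AGENT] / total if total else 0.0
    flags = []
    if counts[VENDOR_OPERATOR]:
        flags.append(f"{counts[VENDOR_OPERATOR]} action(s) by vendor staff")
    if counts[INTERNAL_USER]:
        flags.append(f"{counts[INTERNAL_USER]} action(s) by your team; check they were escalations")
    return {"total": total, "agent_share": agent_share, "by_actor": dict(counts), "flags": flags}


# Made-up sample log for illustration.
pilot_log = (
    [{"action": "send_email", "actor_type": AGENT}] * 5
    + [{"action": "apply_cash", "actor_type": AGENT}]
    + [{"action": "send_email", "actor_type": INTERNAL_USER}]
    + [{"action": "apply_cash", "actor_type": VENDOR_OPERATOR}]
)
report = attribution_report(pilot_log)
print(report["agent_share"], report["flags"])
```

Some actions by your own team are expected, since escalations land with them by design. The point is to see the split, and to get an explanation for every action attributed to vendor staff.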
Scoring vendors objectively
Turn all of this into a simple scorecard so you compare on the same axes instead of on demo polish. Weight the categories by what matters to you, then score each vendor one to five.
- Autonomy. Does it act end to end without a human, or surface work?
- Outcome accountability. Is it measured on cash and DSO, and will it stand behind that?
- Edge-case reasoning. Did it handle a novel case live without new rules?
- Integration depth. Real writeback to your ERP and channels, or read-only?
- Control and auditability. Approval gates, logs, and explainability by default.
- Security and compliance. Certifications and data handling that pass your review.
- Proof. A successful pilot on your data and credible references.
The vendor that wins should score well on autonomy and accountability, not just on the longest feature list. A tool can have every feature and still leave the actual work on your team.
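The scorecard is a weighted average, and one way to keep it honest is to apply the pass-or-fail checks from the integration section as gates before any scoring. The Python sketch below does both. The weights, scores, and vendor names are invented for illustration; set your own weights before you see any demos so polish cannot shift them.

```python
from dataclasses import dataclass

CATEGORIES = (
    "autonomy",
    "outcome_accountability",
    "edge_case_reasoning",
    "integration_depth",
    "control_auditability",
    "security_compliance",
    "proof",
)


@dataclass
class Vendor:
    name: str
    scores: dict        # category -> score from 1 to 5
    passed_gates: bool  # ERP writeback, audit trail, and security review all passed


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of 1-5 scores; weights are relative, not percentages."""
    for c in CATEGORIES:
        if not 1 <= scores[c] <= 5:
            raise ValueError(f"{c} score must be 1-5, got {scores[c]}")
    total_weight = sum(weights[c] for c in CATEGORIES)
    return sum(scores[c] * weights[c] for c in CATEGORIES) / total_weight


def rank_vendors(vendors, weights):
    """Drop vendors that failed a pass/fail gate, then sort best-first."""
    eligible = [v for v in vendors if v.passed_gates]
    ranked = [(v.name, weighted_score(v.scores, weights)) for v in eligible]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative weights: autonomy and accountability count 2.5x the rest.
weights = {c: 2 for c in CATEGORIES}
weights.update(autonomy=5, outcome_accountability=5)

agentic = Vendor("Vendor A", {"autonomy": 5, "outcome_accountability": 4,
                              **{c: 3 for c in CATEGORIES[2:]}}, True)
feature_rich = Vendor("Vendor B", {"autonomy": 2, "outcome_accountability": 2,
                                   **{c: 5 for c in CATEGORIES[2:]}}, True)
no_writeback = Vendor("Vendor C", {c: 5 for c in CATEGORIES}, False)

ranking = rank_vendors([agentic, feature_rich, no_writeback], weights)
print(ranking)  # [('Vendor A', 3.75), ('Vendor B', 3.5)]
```

In this example the feature-rich but assistive vendor scores higher on five of seven categories and still ranks second, because the weights favor autonomy and accountability. The vendor that failed a gate is never scored at all.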
How Rex fits this checklist
Rex is an agentic AI system for accounts receivable. It runs collections, cash application, and dispute resolution across your whole ledger on its own, takes the actions in your ERP and over email, and is accountable for the outcomes it is measured on: cash recovered and DSO reduction. It escalates the cases that need a human decision and logs every action with its reasoning, so you can supervise without doing the work. When you run the checklist above, Rex is built to pass the autonomy, accountability, and auditability tests, and we expect you to verify that in a pilot on your own accounts rather than take our word for it.
Use this checklist on every vendor you consider, Rex included, and let the pilot decide.
Frequently asked questions
- What is the most important question to ask an AI AR vendor?
- Ask what the system does without a human, and what outcome it is accountable for. A real agentic vendor will name the actions it takes on its own and the metric it is measured on, like cash recovered or DSO. A weaker vendor will describe what it shows you so you can act.
- How can I tell if AI collections software is really agentic or just rule-based automation?
- Give it a messy case it was not configured for, such as a customer reply that mixes a dispute with a partial payment promise. A rule engine needs a new rule. An agent reads the context, decides, and acts. Ask to watch it handle an edge case live in a pilot, not in a scripted demo.
- Should outcome accountability be in the contract?
- If a vendor claims to own outcomes, ask to tie part of the commercial terms to them. Vendors confident in their system will discuss SLAs around response time, action coverage, or recovery. Vendors selling a dashboard will steer you back to seat licenses and features.
- What should an AI AR pilot prove?
- That the system acts autonomously on real accounts, escalates the right cases, and moves a metric you care about within the pilot window. Define the success metric and the escalation rules before you start, then check the audit trail to confirm the agent, not your team, did the work.