How to Build a Claims Processing Agent for Payments Using LlamaIndex in Python
A claims processing agent for payments takes an incoming claim, pulls the right policy and transaction context, checks it against business rules, and drafts a decision or next action for a human reviewer. It matters because payment claims are high-volume, document-heavy, and sensitive to errors: one bad extraction or hallucinated rule can create compliance issues, chargeback losses, or customer disputes.
Architecture
- **Claim intake layer**
  - Receives claim payloads from an API, queue, or back-office UI.
  - Normalizes fields like claimant ID, payment reference, amount, currency, timestamps, and dispute reason.
- **Document retrieval layer**
  - Uses LlamaIndex to retrieve policy docs, payment terms, chargeback rules, SOPs, and prior claim notes.
  - Keeps the agent grounded in approved internal sources.
- **Decision engine**
  - Combines retrieved context with structured claim data.
  - Produces outcomes like `approve`, `reject`, `escalate`, or `request_more_info`.
- **Audit and trace layer**
  - Stores inputs, retrieved chunks, model output, and the final decision.
  - Needed for compliance review and post-incident analysis.
- **Guardrail layer**
  - Enforces PII redaction, jurisdiction checks, confidence thresholds, and human escalation.
  - Prevents the agent from making unsupported payment decisions.
- **Persistence layer**
  - Stores claim state in a database.
  - Tracks versioned policy documents so decisions are reproducible.
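The intake layer's normalization step can be sketched as a small pure function. The field names and the minor-units convention below are illustrative assumptions, not a fixed schema; map whatever your payments API or queue actually sends.

```python
from dataclasses import dataclass

@dataclass
class NormalizedClaim:
    claimant_id: str
    payment_reference: str
    amount_minor_units: int  # store money as integer minor units, never floats
    currency: str
    dispute_reason: str

def normalize_claim(raw: dict) -> NormalizedClaim:
    """Map a raw intake payload onto canonical field names and formats."""
    return NormalizedClaim(
        claimant_id=str(raw["claimant_id"]).strip(),
        payment_reference=str(raw["payment_reference"]).strip().upper(),
        amount_minor_units=round(float(raw["amount"]) * 100),
        currency=str(raw["currency"]).strip().upper(),
        dispute_reason=str(raw["dispute_reason"]).strip().lower().replace(" ", "_"),
    )

normalized = normalize_claim({
    "claimant_id": " cust-991 ",
    "payment_reference": "pay-88421",
    "amount": 89.50,
    "currency": "usd",
    "dispute_reason": "Duplicate Charge",
})
print(normalized)
```

Normalizing at the boundary means every downstream layer, including the prompts you build later, sees one consistent shape.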
Implementation
- **Index your policy and payment rules.** Start with a small corpus: chargeback policy PDFs, refund SOPs, KYC/AML escalation rules, and payment network guidelines. In LlamaIndex, load documents with `SimpleDirectoryReader`, chunk them with `SentenceSplitter`, and build a `VectorStoreIndex`.
```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# Configure the model used by LlamaIndex
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

documents = SimpleDirectoryReader("./payment_policies").load_data()
splitter = SentenceSplitter(chunk_size=800, chunk_overlap=120)
nodes = splitter.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=4)

response = query_engine.query(
    "What is the approval rule for card-not-present claims under $100?"
)
print(response)
```
- **Wrap retrieval around structured claim data.** Claims are not just text: you need a structured payload from your payments system plus retrieved policy context. The pattern is to parse the claim JSON first, retrieve relevant policy text second, then ask the LLM to produce a constrained decision.
```python
import json
from dataclasses import dataclass
from typing import Literal

@dataclass
class Claim:
    claim_id: str
    payment_id: str
    amount: float
    currency: str
    reason_code: str
    country: str
    customer_note: str

claim_payload = """
{
  "claim_id": "CLM-10021",
  "payment_id": "PAY-88421",
  "amount": 89.50,
  "currency": "USD",
  "reason_code": "duplicate_charge",
  "country": "US",
  "customer_note": "I was charged twice for the same subscription."
}
"""

claim = Claim(**json.loads(claim_payload))

context = query_engine.query(
    f"""
    Find the applicable policy for:
    reason_code={claim.reason_code}
    amount={claim.amount}
    country={claim.country}
    """
)

prompt = f"""
You are a claims processing assistant for payments.
Use only the provided policy context.

Claim:
{claim}

Policy context:
{context}

Return one of: approve, reject, escalate.
Include a short justification citing the policy context.
"""

decision = Settings.llm.complete(prompt)
print(decision.text)
```
- **Add an explicit workflow boundary.** Don't let the model directly mutate payment records. Use it to recommend an action; your application code should validate that recommendation before anything is written to your ledger or case management system. In production I usually keep this as a simple service method that returns a typed result.
```python
from pydantic import BaseModel

class Decision(BaseModel):
    action: Literal["approve", "reject", "escalate"]
    justification: str

def process_claim(claim: Claim) -> Decision:
    ctx = query_engine.query(
        f"{claim.reason_code} {claim.amount} {claim.currency} {claim.country}"
    )
    prompt = f"""
    Decide on this payment claim using only the context below.

    Claim JSON:
    {claim.model_dump_json() if hasattr(claim, "model_dump_json") else claim.__dict__}

    Policy context:
    {ctx}

    Output JSON with keys: action, justification.
    """
    raw = Settings.llm.complete(prompt).text
    # In production, use strict JSON parsing + schema validation here.
    # Kept minimal to stay focused on the LlamaIndex pattern.
    return Decision(action="escalate", justification=raw[:500])

result = process_claim(claim)
print(result.model_dump())
```
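The strict parsing deferred in the comment above can be made concrete. A minimal sketch, assuming Pydantic v2 and a model that returns a JSON object (possibly wrapped in a Markdown code fence), with escalation as the fail-safe; the `Decision` model is redeclared here so the snippet stands alone:

```python
import json
import re
from typing import Literal
from pydantic import BaseModel, ValidationError

class Decision(BaseModel):
    action: Literal["approve", "reject", "escalate"]
    justification: str

def parse_decision(raw: str) -> Decision:
    # Strip a Markdown code fence if the model wrapped its answer in one.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return Decision.model_validate(json.loads(text))
    except (json.JSONDecodeError, ValidationError):
        # Fail closed: anything unparseable is routed to a human.
        return Decision(
            action="escalate",
            justification=f"Unparseable model output: {raw[:200]}",
        )

good = parse_decision('{"action": "approve", "justification": "Matches refund SOP 4.2"}')
bad = parse_decision("approve probably")
print(good.action, bad.action)
```

Failing closed means a malformed model response can never silently approve a payout; the worst case is an extra human review.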
- **Track auditability from day one.** Every claim decision needs traceability: which documents were retrieved, what prompt was used, what the model answered, and who approved the final action if escalation happened. LlamaIndex gives you retrieval primitives; your application should persist those artifacts alongside the claim record in Postgres or your case store.
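A minimal shape for such an audit record, assuming a JSON-serializable row as a stand-in for your Postgres table (column names are illustrative):

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    claim_id: str
    prompt_sha256: str        # hash the prompt if it may contain PII
    retrieved_chunk_ids: list
    model_name: str
    model_output: str
    final_action: str
    decided_at: str

def build_audit_record(claim_id, prompt, chunk_ids, model_name, output, action):
    return AuditRecord(
        claim_id=claim_id,
        prompt_sha256=hashlib.sha256(prompt.encode()).hexdigest(),
        retrieved_chunk_ids=chunk_ids,
        model_name=model_name,
        model_output=output,
        final_action=action,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )

rec = build_audit_record(
    "CLM-10021", "full prompt text", ["node-12", "node-31"],
    "gpt-4o-mini", '{"action": "escalate"}', "escalate",
)
row = json.dumps(asdict(rec))  # persist this alongside the claim record
print(row)
```

Storing the retrieved chunk IDs, not just the answer, is what lets a reviewer later reconstruct exactly which policy text the model saw.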
Production Considerations
- **Data residency**
  - Keep policy indexes and embeddings inside the required region.
  - If claims contain personal data or bank account details, don't ship raw text across borders unless legal has signed off.
- **Compliance controls**
  - Redact PANs, IBANs, account numbers, and addresses before sending text to an LLM.
  - Log every retrieval hit and every final action for audit trails tied to SOC 2/PCI DSS/ISO controls.
- **Human-in-the-loop thresholds**
  - Auto-approve only low-risk cases with clear policy matches.
  - Escalate anything involving fraud indicators, sanctions screening hits, high-value disputes, or ambiguous evidence.
- **Monitoring**
  - Track retrieval quality metrics like top-k hit rate and empty-context queries.
  - Alert on spikes in escalations, on rejections later overturned by humans, and on decisions that cite weak context.
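The redaction control above can start as simple pattern matching before any text leaves your boundary. The patterns below are illustrative only; production redaction should add Luhn validation for PANs and country-specific IBAN formats, ideally via a dedicated PII-detection library.

```python
import re

# Illustrative patterns only: a 13-19 digit card-number shape (with optional
# spaces or dashes) and a generic IBAN shape (country code + check digits).
PATTERNS = {
    "PAN": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Customer card 4111 1111 1111 1111 and account DE89370400440532013000 were charged twice."
print(redact(note))
```

Run redaction on customer notes before they enter prompts, and log the redacted version, so neither the LLM provider nor your audit trail holds raw account data.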
Common Pitfalls
- **Letting the model decide without grounding**
  - If you prompt an LLM from raw claim text alone, it will invent rules when context is missing.
  - Fix it by forcing retrieval from indexed policies first and rejecting responses that lack citations or supporting chunks.
- **Using unbounded free-text outputs in downstream systems**
  - A string like "approve probably" should never hit your payments ledger.
  - Fix it with strict schemas such as Pydantic models plus application-side validation before any state change.
- **Mixing operational data with training-style corpora**
  - Don't dump live customer claims into your static policy index.
  - Fix it by separating immutable reference docs from transactional case data so you can rotate or delete PII without rebuilding everything from scratch.
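One way to sketch that separation, using in-memory stand-ins (both class names are hypothetical) for your vector store and case database:

```python
class PolicyStore:
    """Immutable, versioned reference docs: safe to embed and index once."""
    def __init__(self):
        self._docs = {}  # (doc_id, version) -> text

    def add(self, doc_id: str, version: int, text: str):
        key = (doc_id, version)
        if key in self._docs:
            raise ValueError("policy versions are immutable")
        self._docs[key] = text

    def get(self, doc_id: str, version: int) -> str:
        return self._docs[(doc_id, version)]

class CaseStore:
    """Mutable transactional data: deletable on request, never indexed."""
    def __init__(self):
        self._notes = {}

    def put(self, claim_id: str, note: str):
        self._notes[claim_id] = note

    def erase(self, claim_id: str):
        # PII deletion without touching the policy index at all
        self._notes.pop(claim_id, None)

policies = PolicyStore()
policies.add("chargeback-policy", 3, "CNP claims under $100: auto-review eligible.")
cases = CaseStore()
cases.put("CLM-10021", "I was charged twice for the same subscription.")
cases.erase("CLM-10021")
```

Because decisions reference a (doc_id, version) pair, you can reproduce any past decision even after policies change, while customer data stays erasable.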
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.