How to Build a Claims Processing Agent Using LlamaIndex in Python for Payments

By Cyprian Aarons · Updated 2026-04-21
claims-processing · llamaindex · python · payments

A claims processing agent for payments takes an incoming claim, pulls the right policy and transaction context, checks it against business rules, and drafts a decision or next action for a human reviewer. It matters because payment claims are high-volume, document-heavy, and sensitive to errors: one bad extraction or hallucinated rule can create compliance issues, chargeback losses, or customer disputes.

Architecture

  • Claim intake layer

    • Receives claim payloads from API, queue, or back office UI.
    • Normalizes fields like claimant ID, payment reference, amount, currency, timestamps, and dispute reason.
  • Document retrieval layer

    • Uses LlamaIndex to retrieve policy docs, payment terms, chargeback rules, SOPs, and prior claim notes.
    • Keeps the agent grounded in approved internal sources.
  • Decision engine

    • Combines retrieved context with structured claim data.
    • Produces outcomes like approve, reject, escalate, or request_more_info.
  • Audit and trace layer

    • Stores inputs, retrieved chunks, model output, and final decision.
    • Needed for compliance review and post-incident analysis.
  • Guardrail layer

    • Enforces PII redaction, jurisdiction checks, confidence thresholds, and human escalation.
    • Prevents the agent from making unsupported payment decisions.
  • Persistence layer

    • Stores claim state in a database.
    • Tracks versioned policy documents so decisions are reproducible.
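
These layers map naturally onto typed seams in application code. Here is a minimal sketch, assuming plain dataclasses; the function names (intake, retrieve_context, decide, guard, persist) are illustrative boundaries, not LlamaIndex APIs.

from dataclasses import dataclass, field
from typing import Literal, Optional

Action = Literal["approve", "reject", "escalate", "request_more_info"]

@dataclass
class ClaimRecord:
    claim_id: str
    payload: dict                                         # normalized intake fields
    retrieved_chunks: list = field(default_factory=list)  # grounding text
    action: Optional[Action] = None                       # set by the decision engine
    reviewed_by: Optional[str] = None                     # set on human escalation

def intake(raw: dict) -> ClaimRecord: ...
def retrieve_context(record: ClaimRecord) -> ClaimRecord: ...
def decide(record: ClaimRecord) -> ClaimRecord: ...
def guard(record: ClaimRecord) -> ClaimRecord: ...
def persist(record: ClaimRecord) -> None: ...

def handle_claim(raw: dict) -> ClaimRecord:
    record = intake(raw)
    record = retrieve_context(record)
    record = decide(record)
    record = guard(record)  # may downgrade any action to "escalate"
    persist(record)
    return record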

Implementation

  1. Index your policy and payment rules

    Start with a small corpus: chargeback policy PDFs, refund SOPs, KYC/AML escalation rules, and payment network guidelines. In LlamaIndex, load documents with SimpleDirectoryReader, chunk them with SentenceSplitter, and build a VectorStoreIndex.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Configure the model used by LlamaIndex
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

documents = SimpleDirectoryReader("./payment_policies").load_data()

splitter = SentenceSplitter(chunk_size=800, chunk_overlap=120)
nodes = splitter.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=4)

response = query_engine.query(
    "What is the approval rule for card-not-present claims under $100?"
)
print(response)
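
If decisions must be reproducible against a specific policy snapshot (see the persistence layer above), persist the index to disk instead of re-embedding on every run. A minimal sketch using LlamaIndex's storage context; the versioned directory name is just a convention, not a requirement:

from llama_index.core import StorageContext, load_index_from_storage

# Persist the index so a decision can be replayed against the same policy snapshot.
index.storage_context.persist(persist_dir="./storage/policies_v1")

# Reload later without re-embedding the corpus.
storage_context = StorageContext.from_defaults(persist_dir="./storage/policies_v1")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(similarity_top_k=4)
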
  2. Wrap retrieval around structured claim data

    Claims are not just text. You need a structured payload from your payments system plus retrieved policy context. The pattern is: parse the claim JSON first, retrieve relevant policy text second, then ask the LLM to produce a constrained decision.

import json
from dataclasses import dataclass
from typing import Literal

@dataclass
class Claim:
    claim_id: str
    payment_id: str
    amount: float
    currency: str
    reason_code: str
    country: str
    customer_note: str

claim_payload = """
{
  "claim_id": "CLM-10021",
  "payment_id": "PAY-88421",
  "amount": 89.50,
  "currency": "USD",
  "reason_code": "duplicate_charge",
  "country": "US",
  "customer_note": "I was charged twice for the same subscription."
}
"""

claim = Claim(**json.loads(claim_payload))

context = query_engine.query(
    f"""
    Find the applicable policy for:
    reason_code={claim.reason_code}
    amount={claim.amount}
    country={claim.country}
    """
)

prompt = f"""
You are a claims processing assistant for payments.
Use only the provided policy context.

Claim:
{claim}

Policy context:
{context}

Return one of: approve, reject, escalate.
Include a short justification citing the policy context.
"""

decision = Settings.llm.complete(prompt)
print(decision.text)
  3. Add an explicit workflow boundary

    Don’t let the model directly mutate payment records. Use it to recommend an action; your application code should validate that recommendation before anything is written to your ledger or case management system. In production I usually keep this as a simple service method that returns a typed result.

from pydantic import BaseModel

class Decision(BaseModel):
    action: Literal["approve", "reject", "escalate"]
    justification: str

def process_claim(claim: Claim) -> Decision:
    ctx = query_engine.query(
        f"{claim.reason_code} {claim.amount} {claim.currency} {claim.country}"
    )

    prompt = f"""
    Decide on this payment claim using only the context below.

    Claim JSON:
    {json.dumps(claim.__dict__)}

    Policy Context:
    {ctx}

    Output JSON with keys: action, justification.
    """
    raw = Settings.llm.complete(prompt).text

    # In production use strict JSON parsing + schema validation here
    # (a parsing sketch follows this block); this example stays focused
    # on the LlamaIndex pattern.
    return Decision(action="escalate", justification=raw[:500])

result = process_claim(claim)
print(result.model_dump())
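
The placeholder return above glosses over parsing on purpose. When wiring this up for real, parse strictly and fail closed to escalation. A sketch of what that can look like, assuming the model returns a JSON object (possibly wrapped in markdown fences):

import re
from pydantic import ValidationError

def parse_decision(raw: str) -> Decision:
    """Strictly validate model output against the Decision schema."""
    # Strip markdown code fences in case the model wrapped its JSON.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return Decision.model_validate_json(cleaned)
    except ValidationError:
        # Never guess on malformed output; route it to a human instead.
        return Decision(action="escalate", justification="Unparseable model output")
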
  4. Track auditability from day one

    Every claim decision needs traceability: which documents were retrieved, what prompt was used, what model answered, and who approved the final action if escalation happened. LlamaIndex gives you retrieval primitives; your application should persist those artifacts alongside the claim record in Postgres or your case store.
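
A minimal sketch of the artifact worth persisting per decision; the dict shape is illustrative, and ctx is the query engine response, which exposes the retrieved chunks as source_nodes:

from datetime import datetime, timezone

def audit_record(claim: Claim, ctx, prompt: str, raw: str, decision: Decision) -> dict:
    """Capture everything needed to replay or review a decision later."""
    return {
        "claim_id": claim.claim_id,
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "model_output": raw,
        "decision": decision.model_dump(),
        # LlamaIndex responses expose retrieved chunks as source_nodes.
        "retrieved_chunks": [
            {"id": n.node.node_id, "score": n.score, "text": n.node.get_content()}
            for n in ctx.source_nodes
        ],
    }

Write this record in the same transaction that updates the claim state, so the audit trail can never drift from the decision itself.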

Production Considerations

  • Data residency

    • Keep policy indexes and embeddings inside the required region.
    • If claims contain personal data or bank account details, don’t ship raw text across borders unless legal has signed off.
  • Compliance controls

    • Redact PANs, IBANs, account numbers, and addresses before sending text to an LLM (a redaction sketch follows this section).
    • Log every retrieval hit and every final action for audit trails tied to SOC2/PCI/ISO controls.
  • Human-in-the-loop thresholds

    • Auto-approve only low-risk cases with clear policy matches.
    • Escalate anything involving fraud indicators, sanctions screening hits, high-value disputes, or ambiguous evidence.
  • Monitoring

    • Track retrieval quality metrics like top-k hit rate and empty-context queries.
    • Alert on spikes in escalations, on rejected claims that humans later overturn, and on decisions that cite weak or empty policy context.
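
For the redaction control above, a deliberately simple sketch. The regexes are illustrative only; production redaction should rely on a vetted PII detection library rather than hand-rolled patterns:

import re

REDACTIONS = [
    (re.compile(r"\b\d{13,19}\b"), "[PAN]"),                      # card numbers
    (re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), "[IBAN]"),  # IBANs
    (re.compile(r"\b\d{8,12}\b"), "[ACCOUNT]"),                   # bank account numbers
]

def redact(text: str) -> str:
    """Mask payment identifiers before any text leaves your boundary."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text

safe_note = redact(claim.customer_note)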

Common Pitfalls

  1. Letting the model decide without grounding

    • If you prompt an LLM from raw claim text alone, it will invent rules when context is missing.
    • Fix it by forcing retrieval from indexed policies first and rejecting responses that lack citations or supporting chunks (see the guard sketch after this list).
  2. Using unbounded free-text outputs in downstream systems

    • A string like “approve probably” should never hit your payments ledger.
    • Fix it with strict schemas such as Pydantic models plus application-side validation before any state change.
  3. Mixing operational data with training-style corpora

    • Don’t dump live customer claims into your static policy index.
    • Fix it by separating immutable reference docs from transactional case data so you can rotate or delete PII without rebuilding everything from scratch.
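
For pitfall 1, a small guard that refuses to prompt the model when retrieval comes back empty or weakly matched. The min_score threshold here is an assumption; tune it against your own retriever and embedding model:

def grounded_query(question: str, min_score: float = 0.7):
    """Refuse to answer when retrieval is empty or weakly matched."""
    response = query_engine.query(question)
    strong_hits = [n for n in response.source_nodes if (n.score or 0.0) >= min_score]
    if not strong_hits:
        # No supporting policy chunks: don't let the model improvise rules.
        raise ValueError(f"Insufficient policy grounding for: {question}")
    return response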

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

