How to Build a Claims Processing Agent for Pension Funds Using LangChain in Python
A claims processing agent for pension funds reads incoming claim documents, extracts the relevant facts, checks them against policy rules, flags missing evidence, and routes the case to the right human reviewer when needed. It matters because pension claims are high-trust, high-compliance workflows: a bad decision can create regulatory exposure, delay member payments, or trigger audit findings.
Architecture
- Document ingestion layer
  - Pulls PDFs, scans, emails, and structured forms from secure storage.
  - Normalizes content into text chunks with source metadata.
- Extraction chain
  - Uses an LLM to extract claim fields such as member ID, benefit type, date of exit, dependents, and supporting evidence.
  - Returns structured JSON so downstream logic is deterministic.
- Rules and eligibility engine
  - Applies pension-specific checks outside the model.
  - Examples: vesting status, age thresholds, contribution history, death benefit requirements.
- Retrieval layer
  - Pulls policy documents, scheme rules, and jurisdiction-specific regulations from a vector store or document index.
  - Keeps the agent grounded in current plan rules.
- Decision router
  - Decides whether to auto-approve low-risk claims or escalate to a case manager.
  - Uses confidence thresholds and business rules.
- Audit and observability layer
  - Stores prompts, outputs, citations, and final decisions.
  - Needed for compliance review and dispute handling.
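Before wiring in LangChain, the layers above can be sketched as a thin pipeline. This is a stdlib-only illustration with hypothetical function names, and a simple "Key: value" parser standing in for the LLM extraction chain:

```python
from dataclasses import dataclass, field

@dataclass
class ClaimCase:
    """Carries a claim through the pipeline; field names are illustrative."""
    raw_text: str
    extracted: dict = field(default_factory=dict)
    route: str = "pending"

def extract(case: ClaimCase) -> ClaimCase:
    # Stand-in for the LLM extraction chain: parse "Key: value" lines.
    for line in case.raw_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            case.extracted[key.strip().lower().replace(" ", "_")] = value.strip()
    return case

def check_rules(case: ClaimCase) -> ClaimCase:
    # Deterministic checks live outside the model.
    if not case.extracted.get("member_id"):
        case.route = "human_review"
    return case

def route(case: ClaimCase) -> ClaimCase:
    if case.route == "pending":
        case.route = "auto_process"
    return case

case = route(check_rules(extract(ClaimCase("Member ID: PF-10291\nClaim type: retirement"))))
print(case.extracted, case.route)
```

The point of the sketch is the shape, not the logic: each layer takes the case object, enriches or gates it, and passes it on, which is exactly how the LangChain components below slot together.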
Implementation
1) Load pension scheme documents into retrieval
For pension funds, the agent should not guess policy. It should retrieve scheme rules first and answer only from approved documents.
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load and chunk the scheme rules so each clause stays individually retrievable.
loader = PyPDFLoader("scheme_rules.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```
2) Build a structured extraction chain
Use ChatPromptTemplate, JsonOutputParser, and PydanticOutputParser patterns to force structure. For claims processing, this is better than free-form text because downstream code needs stable fields.
```python
from typing import Optional

from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser

class ClaimExtraction(BaseModel):
    member_id: str = Field(description="Pension member identifier")
    claim_type: str = Field(description="retirement|death|withdrawal|disability")
    date_of_claim: str = Field(description="ISO date string")
    evidence_received: bool
    missing_items: list[str]
    notes: Optional[str] = None

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = JsonOutputParser(pydantic_object=ClaimExtraction)

# Inject the parser's format instructions so the model sees the target schema.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract claim data from the document. Return only valid JSON.\n{format_instructions}"),
    ("human", "{claim_text}"),
]).partial(format_instructions=parser.get_format_instructions())

extraction_chain = prompt | llm | parser

sample_text = """
Member ID: PF-10291
Claim type: retirement
Date of claim: 2026-03-18
Attached: ID copy and bank confirmation
Missing: proof of termination from employer
"""

result = extraction_chain.invoke({"claim_text": sample_text})
print(result)
```
3) Add retrieval-grounded eligibility checking
The next step is to combine extracted facts with scheme rules. Use a retrieval chain to fetch relevant clauses before making any recommendation.
```python
import json

from langchain_core.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

eligibility_prompt = PromptTemplate.from_template("""
You are assessing a pension claim against scheme rules.
Use only the provided context.

Context:
{context}

Claim:
{claim_json}

Return:
1. eligibility_status: approved|needs_review|rejected
2. reason
3. citations: list of rule references used
""")

combine_docs_chain = create_stuff_documents_chain(llm, eligibility_prompt)
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)

eligibility_result = retrieval_chain.invoke({
    "input": "retirement claim for member PF-10291",
    "claim_json": json.dumps(result),  # serialize the extracted fields for the prompt
})
print(eligibility_result["answer"])
```
4) Route low-risk claims and escalate everything else
In production you want deterministic routing. If required evidence is missing or the model confidence is weak, send it to a human reviewer. That keeps the agent useful without letting it make unsupported decisions.
```python
def route_claim(extracted_claim: dict) -> str:
    """Deterministic routing: escalate anything with gaps or an unknown claim type."""
    if extracted_claim.get("missing_items"):
        return "human_review"
    if extracted_claim.get("claim_type") not in {"retirement", "death", "withdrawal", "disability"}:
        return "human_review"
    return "auto_process"

route = route_claim(result)
print({"route": route})
```
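The routing rules above only look at evidence and claim type. If your extraction step also emits a confidence score (a hypothetical `confidence` field here, not produced by the extraction chain above), the router can fold it into the same deterministic checks:

```python
def route_claim_with_confidence(extracted_claim: dict, min_confidence: float = 0.85) -> str:
    """Escalate on missing evidence, unknown claim types, or weak confidence."""
    if extracted_claim.get("missing_items"):
        return "human_review"
    if extracted_claim.get("claim_type") not in {"retirement", "death", "withdrawal", "disability"}:
        return "human_review"
    # Treat a missing confidence score as zero, i.e. always escalate.
    if extracted_claim.get("confidence", 0.0) < min_confidence:
        return "human_review"
    return "auto_process"

print(route_claim_with_confidence({
    "claim_type": "retirement",
    "missing_items": [],
    "confidence": 0.93,
}))
```

Keeping the threshold as a parameter lets you tune escalation rates per claim type without touching the chain itself.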
Production Considerations
- Auditability
  - Persist every input document hash, retrieved clause, prompt version, model version, output JSON, and final human override.
  - Pension administrators need traceable decisions during audits and disputes.
- Data residency
  - Keep member data in-region if your pension fund operates under local residency requirements.
  - Use private networking and approved model endpoints; do not ship sensitive claims data across uncontrolled regions.
- Guardrails
  - Force structured outputs with Pydantic models.
  - Reject any response that omits required fields like member ID or cites no rule references.
  - Add a human approval step for death benefits, disability claims, exceptions, and ambiguous identity matches.
- Monitoring
  - Track extraction accuracy, escalation rate, average handling time, and override frequency by claim type.
  - Alert on sudden spikes in rejected claims or missing-document rates; those often indicate upstream form issues or policy drift.
Common Pitfalls
- Letting the model decide eligibility from memory
  - Avoid this by retrieving scheme rules every time and grounding the answer in documents.
  - Pension policies change; stale reasoning is a compliance problem.
- Using unstructured outputs
  - Free-form responses are hard to validate and impossible to automate safely.
  - Use JsonOutputParser or Pydantic-backed parsers so your downstream workflow can enforce schema checks.
- Skipping human review for edge cases
  - Claims involving beneficiaries, deceased members, cross-border transfers, or missing employer records need escalation.
  - Build explicit routing rules instead of trusting confidence scores alone.
- Ignoring audit trails
  - If you cannot reconstruct why a claim was approved or rejected, you do not have a production system.
  - Store retrieval context and model outputs alongside case metadata for every decision path.
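The schema check behind the "unstructured outputs" pitfall can be as simple as rejecting any model output that lacks a required field. This stdlib sketch mirrors the `ClaimExtraction` fields defined earlier; in practice you would let Pydantic do this validation:

```python
REQUIRED_FIELDS = {"member_id", "claim_type", "date_of_claim",
                   "evidence_received", "missing_items"}
VALID_CLAIM_TYPES = {"retirement", "death", "withdrawal", "disability"}

def validate_claim_output(output: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the output is usable."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - output.keys())]
    if output.get("claim_type") not in VALID_CLAIM_TYPES:
        errors.append("invalid claim_type")
    return errors

# An incomplete extraction should be caught before any routing decision.
print(validate_claim_output({"member_id": "PF-10291", "claim_type": "retirement"}))
```

Run this gate between extraction and routing so a malformed model response is escalated instead of silently processed.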
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.