How to Build a Fraud Detection Agent Using LlamaIndex in Python for Pension Funds

By Cyprian Aarons · Updated 2026-04-21
Tags: fraud-detection · llamaindex · python · pension-funds

A fraud detection agent for pension funds ingests member activity, transaction events, document changes, and case notes, then flags patterns that look inconsistent with normal retirement behavior. It matters because pension fraud is usually low-volume but high-impact: unauthorized withdrawals, identity takeover, beneficiary changes, and advisor abuse can drain member accounts and create regulatory exposure fast.

Architecture

  • Data ingestion layer

    • Pulls structured events from payments, CRM, policy admin, and document systems.
    • Normalizes records into a common schema with member ID, event type, timestamp, source system, and risk signals.
  • LlamaIndex retrieval layer

    • Uses VectorStoreIndex for unstructured evidence like call transcripts, emails, scanned forms, and investigator notes.
    • Uses metadata filters to scope retrieval to a specific member, account, or case.
  • Fraud reasoning agent

    • Uses ReActAgent or FunctionCallingAgentWorker to combine retrieval with rule-based checks.
    • Produces a risk score plus a short explanation tied to evidence.
  • Case management output

    • Writes alerts into a queue or case system with supporting citations.
    • Keeps the output structured so investigators can review it without reading raw model output.
  • Audit and compliance store

    • Stores prompts, retrieved chunks, model version, and final decisions.
    • Required for pension fund auditability and post-incident review.
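The common schema mentioned in the ingestion layer can be pinned down as a small record type. This is a sketch of one plausible shape; the field names and example risk signals are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MemberEvent:
    # Normalized event record shared by all source systems.
    member_id: str
    event_type: str       # e.g. "bank_change", "withdrawal", "beneficiary_change"
    timestamp: datetime
    source_system: str    # e.g. "payments", "crm", "policy_admin", "call_center"
    risk_signals: list[str] = field(default_factory=list)

event = MemberEvent(
    member_id="M1001",
    event_type="bank_change",
    timestamp=datetime(2024, 11, 2, 14, 30),
    source_system="call_center",
    risk_signals=["new_payee", "out_of_hours_contact"],
)
```

Normalizing at ingestion time means every downstream check and retrieval filter can rely on the same keys regardless of which system produced the event.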

Implementation

1. Load pension fund evidence into LlamaIndex

For fraud detection, you do not want a generic chatbot over all documents. You want scoped retrieval by member or account so the agent only reasons over relevant evidence.

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

docs = [
    Document(
        text="Member requested change of bank account on 2024-11-02 via call center.",
        metadata={"member_id": "M1001", "source": "call_center", "event_type": "bank_change"}
    ),
    Document(
        text="Two withdrawal requests submitted within 18 hours using new bank details.",
        metadata={"member_id": "M1001", "source": "payments", "event_type": "withdrawal"}
    ),
    Document(
        text="Beneficiary update completed after email domain changed from corporate to public provider.",
        metadata={"member_id": "M1002", "source": "crm", "event_type": "beneficiary_change"}
    ),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=3)

This gives you a retrieval layer that can pull the strongest evidence for a suspicious pattern. In production, the documents usually come from Kafka topics, S3 exports, or your case management database.

2. Add a fraud-focused tool around retrieval

The agent should not “guess” fraud. It should retrieve evidence and apply explicit checks like rapid bank detail changes, unusual withdrawal timing, or beneficiary edits after contact detail updates.

from llama_index.core.tools import FunctionTool

def get_member_evidence(member_id: str) -> str:
    response = query_engine.query(
        f"Retrieve all suspicious activity for member {member_id}"
    )
    return str(response)

evidence_tool = FunctionTool.from_defaults(
    fn=get_member_evidence,
    name="get_member_evidence",
    description="Fetch relevant fraud-related evidence for a pension fund member"
)

This pattern keeps the agent grounded in retrieved facts. It also makes it easier to audit what data was used in each decision.
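The rule-based checks mentioned above can run as plain deterministic functions alongside retrieval. A minimal sketch of one such check, flagging a withdrawal shortly after a bank detail change; the 48-hour window is an illustrative assumption, not a regulatory value:

```python
from datetime import datetime, timedelta

def rapid_bank_change_withdrawal(events: list[dict], window_hours: int = 48) -> bool:
    """Return True if any withdrawal occurred within `window_hours`
    after a bank detail change for the same member."""
    changes = [e["timestamp"] for e in events if e["event_type"] == "bank_change"]
    withdrawals = [e["timestamp"] for e in events if e["event_type"] == "withdrawal"]
    window = timedelta(hours=window_hours)
    return any(
        0 <= (w - c).total_seconds() and (w - c) <= window
        for c in changes
        for w in withdrawals
    )

events = [
    {"event_type": "bank_change", "timestamp": datetime(2024, 11, 2, 9, 0)},
    {"event_type": "withdrawal", "timestamp": datetime(2024, 11, 3, 1, 0)},
]
print(rapid_bank_change_withdrawal(events))  # True: withdrawal 16 hours after change
```

A check like this can be wrapped in a `FunctionTool` exactly like `get_member_evidence`, giving the agent an explicit, auditable signal rather than a vibe.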

3. Build the agent with explicit instructions

Use an agent that can call tools and produce structured reasoning. For this use case, ReActAgent is practical because it makes tool usage visible in traces.

from llama_index.core.agent import ReActAgent

system_prompt = """
You are a fraud detection analyst for a pension fund.
Only use retrieved evidence from tools.
Flag risks such as identity takeover, bank detail changes before withdrawals,
beneficiary manipulation, duplicate claims, and unusual access patterns.
Return:
1) risk_level: low/medium/high
2) reasons: bullet list
3) recommended_action: review / freeze / escalate
"""

agent = ReActAgent.from_tools(
    tools=[evidence_tool],
    llm=Settings.llm,
    context=system_prompt,  # from_tools injects this context into the ReAct system header
    verbose=True,
)

Now query it with a concrete case:

result = agent.chat("Assess member M1001 for possible fraud.")
print(result)

In practice you would wrap this in an API endpoint or background worker. The output should be converted into JSON before pushing it into your case system.
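When wrapping the agent for a worker, it helps to catch failures so the queue always receives a structured payload. A sketch with the agent stubbed out so the shape is clear; `StubAgent` is a stand-in for the `ReActAgent` built above:

```python
import json

def run_assessment(agent, member_id: str) -> dict:
    # Wrap the agent call so downstream workers always get a structured
    # result, even when the model call fails.
    try:
        response = agent.chat(f"Assess member {member_id} for possible fraud.")
        return {"member_id": member_id, "status": "ok", "raw_output": str(response)}
    except Exception as exc:
        return {"member_id": member_id, "status": "error", "error": str(exc)}

class StubAgent:  # stand-in for the real agent, for illustration only
    def chat(self, prompt: str) -> str:
        return "risk_level: high"

print(json.dumps(run_assessment(StubAgent(), "M1001")))
```

Failed assessments should land in the case system as explicit errors, not silently disappear; a missed fraud check is itself a compliance event.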

4. Return structured alerts for investigators

Investigators need consistent fields they can filter on. Do not send free-form prose only; include severity, rationale, and traceability fields.

import json

def assess_member(member_id: str):
    response = agent.chat(f"Assess member {member_id} for possible fraud.")
    text = str(response)
    alert = {
        "member_id": member_id,
        # Naive keyword match as a placeholder; in production, request
        # structured output from the model and parse risk_level explicitly.
        "risk_level": "high" if "high" in text.lower() else "medium",
        "summary": text,
        "model": getattr(Settings.llm, "model", "unknown"),
    }
    return json.dumps(alert)

print(assess_member("M1001"))

That JSON payload can go straight into Kafka, Service Bus, or your case management platform. Keep the raw model output alongside the normalized alert for audit purposes.

Production Considerations

  • Data residency

    • Pension data often cannot leave a specific jurisdiction.
    • Host embeddings, indexes, and LLM endpoints in-region; do not send member records to external services without legal approval.
  • Auditability

    • Store every prompt, retrieved chunk IDs, model version, and final decision.
    • Regulators will ask why an alert was raised or why one was missed.
  • Guardrails

    • Restrict tools so the agent can only read approved sources.
    • Block direct write access to payment systems; the agent should recommend actions, not execute them.
  • Monitoring

    • Track false positives by alert type: bank change abuse, beneficiary manipulation, identity mismatch.
    • Monitor latency too; pension operations teams will reject an agent that delays claim processing during peak periods.
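The auditability point above can be met with an append-only record per decision. A minimal sketch writing JSON lines; the file path and field names are illustrative, and in production this would target an immutable store rather than a local file:

```python
import json
from datetime import datetime, timezone

def write_audit_record(path: str, member_id: str, prompt: str,
                       chunk_ids: list[str], model: str, decision: str) -> None:
    # Append one JSON line per decision so records are easy to replay
    # during a regulatory review or post-incident analysis.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "member_id": member_id,
        "prompt": prompt,
        "retrieved_chunk_ids": chunk_ids,
        "model_version": model,
        "decision": decision,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

write_audit_record("audit.jsonl", "M1001",
                   "Assess member M1001 for possible fraud.",
                   ["chunk-17", "chunk-42"], "gpt-4o-mini", "escalate")
```

Capturing the retrieved chunk IDs alongside the decision is what lets you answer "why was this alert raised?" months later.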

Common Pitfalls

  • Using broad retrieval across all members

    • This leaks irrelevant context into the prompt and increases false positives.
    • Fix it by filtering on member_id, case ID, date range, and event type before retrieval.
  • Letting the model make unsupported claims

    • If the model says “fraud likely” without citations from source records, investigators will not trust it.
    • Force answers to reference retrieved evidence only and reject outputs without supporting chunks.
  • Skipping compliance controls

    • Pension funds need retention policies, access controls, and explainability.
    • Log everything needed for audit review and keep human approval in the loop for freezes or escalations.
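The second pitfall, unsupported claims, can be enforced mechanically: reject any answer whose retrieval step returned no source nodes. A sketch assuming a response object exposing a `source_nodes` list, as LlamaIndex query responses do; `FakeResponse` is a stand-in for illustration:

```python
def require_evidence(response) -> str:
    # Refuse to surface a conclusion that is not backed by retrieved chunks.
    nodes = getattr(response, "source_nodes", None) or []
    if not nodes:
        raise ValueError("No supporting evidence retrieved; rejecting output.")
    return str(response)

class FakeResponse:  # stand-in for a LlamaIndex query response
    def __init__(self, text, source_nodes):
        self.text = text
        self.source_nodes = source_nodes
    def __str__(self):
        return self.text

print(require_evidence(FakeResponse("Flag bank change before withdrawal.", ["node-1"])))
```

Gating on evidence this way turns "the model must cite sources" from a prompt-level request into a hard guarantee.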

By Cyprian Aarons, AI Consultant at Topiax.