How to Build a Customer Support Agent Using LlamaIndex in Python for Wealth Management

By Cyprian Aarons · Updated 2026-04-21
Tags: customer-support, llamaindex, python, wealth-management

A customer support agent for wealth management answers client questions about portfolios, account documents, fees, tax forms, transfer status, and firm policies without exposing sensitive data or hallucinating advice. It matters because support in this domain is not just about speed; it has to stay compliant, produce auditable answers, and route anything advisory or high-risk to the right human.

Architecture

  • Document ingestion layer

    • Pulls from approved sources like FAQs, policy PDFs, product sheets, fee schedules, and client service manuals.
    • Keep source metadata: document type, version, jurisdiction, and last reviewed date.
  • Indexing layer

    • Uses VectorStoreIndex over curated support content.
    • Stores chunks with metadata filters so the agent can distinguish retail vs. HNW vs. institutional policies.
  • Retriever

    • Uses VectorIndexRetriever or the index’s retriever interface to fetch only relevant passages.
    • Should be constrained by business unit and region to respect data residency and policy boundaries.
  • Response synthesis

    • Uses a QueryEngine or RetrieverQueryEngine to turn retrieved context into a grounded answer.
    • The prompt must force citation of source snippets and refusal when the answer touches regulated advice.
  • Guardrails layer

    • Adds policy checks for PII leakage, investment advice requests, and unsupported actions like trade execution.
    • Routes edge cases to a human queue with conversation context attached.
  • Audit and observability

    • Logs the question, retrieved documents, final answer, confidence signals, and escalation reason (a minimal record sketch follows this list).
    • Required for compliance review and post-incident analysis.
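
To make the audit requirement concrete, here is a minimal sketch of one audit record. The field names are illustrative assumptions, not a LlamaIndex API; adapt them to your compliance schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    # Illustrative schema; field names are assumptions, not a LlamaIndex API.
    question: str
    retrieved_chunk_ids: list[str]
    source_versions: list[str]
    answer: str
    model_name: str
    escalated: bool
    escalation_reason: str | None = None
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )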

Implementation

1) Load approved support content with metadata

Use only controlled sources. For wealth management support agents, I recommend separating documents by region and product line before indexing.

from llama_index.core import Document

docs = [
    Document(
        text="Clients can request duplicate statements from the secure portal. Processing takes 1-2 business days.",
        metadata={
            "source": "client_service_faq",
            "doc_type": "faq",
            "region": "us",
            "business_line": "wealth_management",
            "version": "2025-01"
        },
    ),
    Document(
        text="Fee waivers may apply only under approved relationship tiers and must be reviewed by operations.",
        metadata={
            "source": "fee_policy",
            "doc_type": "policy",
            "region": "us",
            "business_line": "wealth_management",
            "version": "2025-01"
        },
    ),
]

If your content lives in PDFs or SharePoint, parse it upstream and normalize it into Document objects before indexing.
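
For example, a directory of PDFs can be parsed with LlamaIndex's SimpleDirectoryReader and stamped with controlled metadata at load time. This is a minimal sketch; the directory path and metadata values are assumptions.

from llama_index.core import SimpleDirectoryReader

def support_metadata(file_path: str) -> dict:
    # Stamp every parsed file with the metadata the retriever will filter on.
    return {
        "doc_type": "policy",
        "region": "us",
        "business_line": "wealth_management",
        "version": "2025-01",
    }

pdf_docs = SimpleDirectoryReader(
    input_dir="./approved_policies",  # hypothetical folder of reviewed PDFs
    required_exts=[".pdf"],
    file_metadata=support_metadata,
).load_data()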

2) Build the index and retriever

For a support agent, VectorStoreIndex is usually enough if your corpus is small to medium and well curated. Add metadata filters so the agent does not answer from the wrong jurisdiction or product set.

from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

index = VectorStoreIndex.from_documents(docs)

filters = MetadataFilters(filters=[
    ExactMatchFilter(key="region", value="us"),
    ExactMatchFilter(key="business_line", value="wealth_management"),
])

# Pass the filters so retrieval stays inside the right region and business line.
retriever = index.as_retriever(similarity_top_k=3, filters=filters)

The default in-memory store honors these exact-match filters; at production scale, use a vector store that supports metadata filtering natively. The important part is that your retrieval path respects residency and policy boundaries before generation starts.
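
A quick sanity check is to retrieve directly, without generation, and inspect what comes back. This sketch asserts the region boundary holds and prints similarity scores, which also feeds the low-confidence alerting discussed later.

# Retrieve without generating to verify the policy boundary holds.
nodes = retriever.retrieve("How do I get a duplicate statement?")
for node_with_score in nodes:
    meta = node_with_score.node.metadata
    assert meta.get("region") == "us"
    print(node_with_score.score, meta.get("source"))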

3) Create a grounded query engine with citations

This is the core pattern. The model should answer only from retrieved context and say when it cannot help.

from llama_index.core import Settings
from llama_index.core.llms import MockLLM
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.prompts import PromptTemplate

# Replace MockLLM with your production LLM provider.
Settings.llm = MockLLM()

qa_prompt = PromptTemplate(
    """You are a wealth management customer support agent.
Use only the provided context to answer.
If the answer is not in the context, say you do not have enough information.
Do not provide investment advice or tax advice.
If the user asks for trading actions or personalized recommendations, escalate to a human.

Context:
{context_str}

Question:
{query_str}

Answer:
"""
)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    text_qa_template=qa_prompt,
)

response = query_engine.query("How long does it take to get duplicate statements?")
print(response)

In production, swap MockLLM() for your real provider integration. The pattern stays the same: retrieve first, then synthesize from evidence only.
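
For example, with the OpenAI integration (assuming the llama-index-llms-openai package is installed and OPENAI_API_KEY is set; the model name here is an assumption):

from llama_index.llms.openai import OpenAI

# Swap the mock for a real provider; retrieval and prompting stay unchanged.
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.0)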

4) Add a lightweight escalation check

Wealth management support needs clear refusal boundaries. A simple classifier step can catch advisory requests before they hit generation.

def needs_human_escalation(user_text: str) -> bool:
    blocked_phrases = [
        "what should i invest in",
        "best stock",
        "tax loss harvesting for me",
        "should i sell",
        "guaranteed return",
    ]
    text = user_text.lower()
    return any(phrase in text for phrase in blocked_phrases)

user_question = "Should I sell my tech stocks now?"
if needs_human_escalation(user_question):
    print("Escalate to licensed advisor or service specialist.")
else:
    print(query_engine.query(user_question))

This is intentionally simple. In a real deployment you would combine rules with an LLM-based intent classifier and log every escalation reason for audit.
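
One way to layer the two, as a sketch: run the phrase rules first, then fall back to an LLM check. The prompt wording and parsing here are assumptions, and Settings.llm must already be configured.

def classify_intent(user_text: str) -> str:
    # Cheap rules first; LLM check only when the rules do not fire.
    if needs_human_escalation(user_text):
        return "escalate"
    verdict = Settings.llm.complete(
        "Answer only ADVICE or SUPPORT. Is this request asking for personalized "
        f"investment, tax, or legal advice?\n\nRequest: {user_text}"
    )
    return "escalate" if "ADVICE" in verdict.text.upper() else "answer"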

Production Considerations

  • Deployment

    • Keep document storage and vector storage in-region if you serve clients under residency constraints.
    • Separate indexes by jurisdiction or legal entity instead of mixing everything into one global corpus.
  • Monitoring

    • Track retrieval hit rate, escalation rate, unsupported-question rate, and citation coverage.
    • Alert on answers generated with low retrieval confidence or empty context.
  • Guardrails

    • Block personalized investment recommendations, tax advice, legal advice, and trade instructions unless explicitly routed to an authorized advisor workflow.
    • Redact account numbers, SSNs/NINs, emails, and phone numbers before logging prompts or responses (a minimal redaction sketch follows this list).
  • Auditability

    • Store question text, retrieved chunk IDs, source document versions, model name, timestamp, and final response.
    • When compliance reviews an answer six months later, you need to be able to reproduce exactly which sources were used.
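
A minimal redaction pass before persistence might look like the sketch below. The patterns are illustrative, not exhaustive; treat them as a starting point, not a compliance control.

import re

# Illustrative patterns only; production rules need locale-aware, audited coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{8,}\d\b"),
    "account": re.compile(r"\b\d{8,12}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

print(redact("Reach me at jane@example.com about account 123456789."))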

Common Pitfalls

  1. Indexing unapproved content

    • Don’t dump CRM notes or raw tickets into the index without review.
    • Fix it by curating an approved knowledge base with versioned documents and explicit ownership.
  2. Letting retrieval cross policy boundaries

    • A US client should not receive guidance pulled from UK fee policies or EMEA disclosures.
    • Fix it with metadata filters on region, entity, language, and product line at retrieval time.
  3. Treating support like advice

    • If the agent starts recommending funds or timing trades based on user profile data, you have crossed into regulated territory.
    • Fix it by hardcoding escalation rules for advisory intents and forcing concise refusal language in the system prompt.
  4. Logging sensitive data blindly

    • Debug logs often become a compliance problem faster than the model itself.
    • Fix it by redacting PII before persistence and keeping raw conversation access tightly controlled.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

