How to Build a Customer Support Agent Using LlamaIndex in Python for Retail Banking
A retail banking customer support agent answers routine questions, retrieves policy-backed information, and routes sensitive requests without exposing the bank to unnecessary risk. It matters because most support volume is repetitive, but the failure modes are expensive: wrong fee explanations, bad transfer guidance, and weak handling of PII can create compliance and trust problems fast.
Architecture
- **Channel adapter**
  - Receives chat messages from web, mobile, or contact-center tooling.
  - Normalizes user input into a single request format.
- **Knowledge index**
  - Stores product FAQs, fee schedules, dispute policies, branch hours, and support playbooks.
  - Built with `VectorStoreIndex` over approved internal documents.
- **Retriever**
  - Uses `index.as_retriever()` to fetch only relevant policy chunks.
  - Keeps answers grounded in bank-approved content.
- **Response synthesizer / query engine**
  - Uses `index.as_query_engine()` or a custom `RetrieverQueryEngine`.
  - Produces concise answers with citations for auditability.
- **Guardrails layer**
  - Detects sensitive intents like card disputes, account access issues, fraud claims, or PII requests.
  - Routes those cases to secure workflows or human agents.
- **Audit and observability**
  - Logs prompts, retrieved sources, model responses, and escalation decisions.
  - Required for compliance review and incident investigation.
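To make the channel adapter concrete, here is a minimal sketch of a normalized request format. The `SupportRequest` shape and `normalize_web_message` helper are illustrative assumptions, not LlamaIndex types; adapt the fields to your own channel payloads.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SupportRequest:
    """Channel-agnostic request format (illustrative only)."""
    channel: str      # e.g. "web", "mobile", "contact_center"
    session_id: str
    text: str
    received_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def normalize_web_message(payload: dict) -> SupportRequest:
    # Map a raw web-chat payload onto the shared request shape.
    return SupportRequest(
        channel="web",
        session_id=payload["session"],
        text=payload["message"].strip(),
    )

req = normalize_web_message(
    {"session": "abc-123", "message": "  What are your wire fees?  "}
)
print(req.channel, req.text)
```

Downstream layers (retriever, guardrails, audit log) then only ever see `SupportRequest`, regardless of which channel the message arrived on.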
Implementation
1) Install LlamaIndex and load approved banking content
Use only curated documents: product guides, fee schedules, support SOPs, and disclosure pages. Do not index raw customer conversations unless you have a retention policy and redaction pipeline in place.
```shell
pip install llama-index llama-index-embeddings-openai llama-index-llms-openai
```

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Load only approved internal docs
documents = SimpleDirectoryReader("./bank_support_docs").load_data()

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-4o-mini", temperature=0)

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)
```
2) Build a query engine that answers with grounded context
For retail banking support, keep temperature at zero and require source-backed responses. That reduces hallucinations when users ask about fees, transfer limits, or card replacement timelines.
```python
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    response_mode="compact",
)

response = query_engine.query(
    "What is the replacement fee for a lost debit card?"
)
print(response)
```
If you need citations in the output for audit trails, inspect the source nodes:
```python
for node in response.source_nodes:
    print(node.node.metadata)
    print(node.score)
```
3) Add a simple intent router for sensitive banking requests
Support agents should not answer everything. If the user asks to lock a card, dispute a transaction, reset credentials, or share account details, route to a secure workflow or authenticated human channel.
```python
SENSITIVE_KEYWORDS = {
    "card stolen",
    "lost card",
    "chargeback",
    "dispute",
    "password reset",
    "account number",
    "routing number",
    "wire transfer",
}

def classify_intent(message: str) -> str:
    """Escalate if any sensitive keyword appears; otherwise answer."""
    text = message.lower()
    if any(keyword in text for keyword in SENSITIVE_KEYWORDS):
        return "escalate"
    return "answer"

def handle_message(message: str):
    intent = classify_intent(message)
    if intent == "escalate":
        # Never let the model free-form respond to sensitive requests.
        return {
            "action": "handoff",
            "reason": "sensitive_banking_request",
            "message": "I'm connecting you to a secure support flow.",
        }
    result = query_engine.query(message)
    return {
        "action": "answer",
        "response": str(result),
        "sources": [
            node.node.metadata for node in result.source_nodes
        ],
    }

print(handle_message("What is your debit card replacement fee?"))
print(handle_message("My card was stolen yesterday"))
```
4) Wrap it behind an API boundary with logging
The bank should own the request lifecycle outside the model call. Log input hashes, retrieval metadata, answer IDs, and escalation reasons so compliance can reconstruct what happened later.
```python
import hashlib
import json
from datetime import datetime, timezone

def log_event(user_id: str | None, message: str, result: dict):
    # Hash identifiers so raw PII never lands in logs.
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_hash": hashlib.sha256((user_id or "").encode()).hexdigest(),
        "message_hash": hashlib.sha256(message.encode()).hexdigest(),
        "action": result["action"],
        "result_meta": result.get("sources", []),
        "reason": result.get("reason"),
    }
    print(json.dumps(event))

result = handle_message("How do I replace my debit card?")
log_event("user_12345", "How do I replace my debit card?", result)
```
Production Considerations
- **Data residency**
  - Keep embeddings and vector storage in-region if your bank has residency requirements.
  - Verify where the LLM endpoint processes data; some institutions require regional isolation for customer-support workloads.
- **Compliance and audit**
  - Store prompt/response logs with immutable retention controls.
  - Keep source document versions so you can prove which policy text informed an answer on a given date.
- **Guardrails**
  - Block or escalate requests involving authentication changes, fraud claims, disputes, overdrafts tied to hardship cases, and any request containing account identifiers.
  - Add PII redaction before logging. Never write raw PANs, SSNs, or full account numbers into observability tools.
- **Monitoring**
  - Track retrieval hit rate, escalation rate, hallucination reports from agents, and "no answer found" frequency.
  - Alert when the agent starts citing stale policy docs or answering outside its intended scope.
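PII redaction before logging can be sketched with a few regex substitutions. These patterns are illustrative assumptions only; production systems should use a vetted PII-detection library and validate card numbers properly (for example with a Luhn check) rather than matching on digit length alone.

```python
import re

# Illustrative patterns only; not exhaustive.
REDACTIONS = [
    (re.compile(r"\b\d{13,16}\b"), "[PAN]"),          # card numbers (PANs)
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),  # US SSNs
    (re.compile(r"\b\d{9}\b"), "[ROUTING]"),          # 9-digit routing numbers
]

def redact(text: str) -> str:
    """Replace likely PII with placeholder tokens before logging."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text

print(redact("My card 4111111111111111 and SSN 123-45-6789 were stolen"))
# -> My card [PAN] and SSN [SSN] were stolen
```

Run this over every message before it reaches `log_event`, so that hashed identifiers and redacted text are the only things observability tools ever see.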
Common Pitfalls
- **Indexing uncontrolled content**
  - Mistake: ingesting all PDFs from shared drives or chat transcripts.
  - Fix: curate an approved corpus with document ownership and review dates.
- **Letting the model answer sensitive workflows**
  - Mistake: allowing free-form responses for disputes, transfers, identity verification, or fraud reporting.
  - Fix: use explicit routing rules before the query engine runs.
- **Skipping source traceability**
  - Mistake: returning answers without storing retrieved nodes or document versions.
  - Fix: persist source metadata with every response so compliance can replay the decision path later.
- **Using default chatbot settings**
  - Mistake: high-temperature generation and broad retrieval over noisy data.
  - Fix: keep generation deterministic where possible and constrain retrieval to topically relevant bank-approved content.
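One way to make source traceability work in practice is to stamp version and ownership metadata onto each document at index time; `SimpleDirectoryReader` accepts a `file_metadata` callable for this. The version registry below is a hypothetical stand-in for whatever document-management system your bank uses.

```python
from pathlib import Path

# Hypothetical registry; in practice this would come from your
# document-management system.
DOC_VERSIONS = {
    "fee_schedule.pdf": {"version": "2024-06", "owner": "retail-products"},
}

def file_metadata(path: str) -> dict:
    """Attach version and ownership metadata to each indexed document."""
    name = Path(path).name
    info = DOC_VERSIONS.get(name, {"version": "unknown", "owner": "unknown"})
    return {"file_name": name, **info}

print(file_metadata("./bank_support_docs/fee_schedule.pdf"))
```

Passing this as `SimpleDirectoryReader("./bank_support_docs", file_metadata=file_metadata)` means the metadata surfaces later in `response.source_nodes`, so every logged answer carries the exact policy version it was grounded in.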
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit