How to Build a Customer Support Agent Using LlamaIndex in Python for Investment Banking
A customer support agent for investment banking answers client questions against approved internal knowledge: product docs, fee schedules, onboarding steps, trading hours, escalation paths, and policy snippets. The stakes are high: the bank needs fast responses that never leak sensitive data, invent policy, or violate compliance rules.
Architecture
- **Document ingestion layer**
  - Pulls from approved sources only: PDFs, SharePoint exports, policy docs, FAQs, and client-facing playbooks.
  - Normalizes content before indexing so the agent does not answer from stale or duplicated text.
- **Vector index**
  - Stores embeddings for semantic retrieval using `VectorStoreIndex`.
  - Keeps support answers grounded in the bank’s curated knowledge base.
- **Retriever**
  - Uses `index.as_retriever()` to fetch the top-k relevant chunks.
  - Should be tuned for precision over recall, because hallucinated support answers are expensive in regulated environments.
- **Response synthesizer / query engine**
  - Uses `index.as_query_engine()` to generate answers from retrieved context.
  - Must be constrained to cite sources and to refuse when evidence is weak.
- **Guardrails layer**
  - Applies policy checks for PII, MNPI, KYC/AML-related requests, and restricted advice.
  - Routes risky questions to human support or compliance.
- **Audit and observability**
  - Logs prompts, retrieved nodes, response text, and user identity.
  - Required for incident review, model governance, and regulatory traceability.
Implementation
1) Install dependencies and load approved documents
Use LlamaIndex plus a simple file reader. In production, replace local files with your document store connector and enforce source allowlists.
```bash
pip install llama-index llama-index-readers-file
```
```python
from pathlib import Path

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

DOCS_DIR = Path("./bank_support_docs")

# Load only the approved file types from the curated docs directory.
documents = SimpleDirectoryReader(
    input_dir=str(DOCS_DIR),
    required_exts=[".pdf", ".txt", ".md"],
).load_data()

# Chunk documents into overlapping nodes sized for precise retrieval.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

print(f"Loaded {len(documents)} documents into {len(nodes)} nodes")
```
2) Build the index and query engine
For a support agent, keep retrieval tight. Use a small `similarity_top_k` first; widen only if you can tolerate more noise.
```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Deterministic settings: temperature=0 keeps support answers reproducible.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

index = VectorStoreIndex(nodes)

# Tight retrieval: start with a small top-k and widen only if needed.
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)
```
3) Add a guardrail wrapper for banking-specific requests
This is where you stop the agent from answering restricted questions. Investment banking support often sits next to sensitive workflows, so you need explicit refusals for MNPI, trade advice, account access changes, or anything requiring identity verification.
```python
import re

# Topics the bot must never answer; matched case-insensitively.
RESTRICTED_PATTERNS = [
    r"\bMNPI\b",
    r"\binsider\b",
    r"\btrade recommendation\b",
    r"\bchange my beneficiary\b",
    r"\bwire transfer\b",
    r"\baccount number\b",
]

def is_restricted(query: str) -> bool:
    # Cheap pre-LLM screen; run it before any retrieval or generation.
    return any(re.search(pattern, query, re.IGNORECASE) for pattern in RESTRICTED_PATTERNS)

def answer_support_query(query: str) -> str:
    if is_restricted(query):
        return (
            "I can’t help with that request. "
            "Please route it through the appropriate authenticated banking workflow "
            "or escalate to compliance/support."
        )
    response = query_engine.query(
        "Answer using only the provided internal support docs. "
        "If the answer is not in the docs, say you do not have enough information.\n\n"
        f"Question: {query}"
    )
    return str(response)
```
4) Run the agent with citation-friendly responses
The simplest production pattern is a thin service layer around `answer_support_query()`. Keep it deterministic and log every request-response pair with metadata.
```python
def main():
    queries = [
        "What are the cut-off times for same-day USD wires?",
        "How do I change my beneficiary on an institutional account?",
        "Where is the onboarding checklist for new prime brokerage clients?",
    ]
    for q in queries:
        print("=" * 80)
        print("Q:", q)
        print("A:", answer_support_query(q))

if __name__ == "__main__":
    main()
```
If you want stronger grounding, switch to a `RetrieverQueryEngine` pattern with explicit node inspection before generation, as sketched below. In regulated support flows, being able to show which chunks were used is more important than squeezing out a prettier answer.
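A minimal sketch of that pattern, assuming the index built in step 2; `RetrieverQueryEngine` and `get_response_synthesizer` are standard LlamaIndex components, and the score printout is just one way to surface the evidence:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = index.as_retriever(similarity_top_k=3)

# Inspect the candidate evidence before any generation happens.
candidates = retriever.retrieve("What are the cut-off times for same-day USD wires?")
for node in candidates:
    print(node.node_id, node.score, node.metadata.get("file_name"))

# Reuse the same retriever for synthesis so the inspected nodes match the answer.
engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(response_mode="compact"),
)
response = engine.query("What are the cut-off times for same-day USD wires?")
print(response)
print([n.node_id for n in response.source_nodes])  # the chunks the answer used
```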
Production Considerations
- **Deployment**
  - Run the service inside your bank’s approved environment with network egress locked down.
  - Keep embeddings and LLM calls in-region if your data residency policy requires it.
  - Do not send client-specific or confidential content to unmanaged third-party endpoints.
- **Monitoring**
  - Log query text, top-k retrieved node IDs, final answer, latency, and refusal reason (see the logging sketch after this list).
  - Track deflection rate versus escalation rate.
  - Sample conversations for compliance review and regression testing after every prompt or model change.
- **Guardrails**
  - Block requests involving trading advice, account changes, KYC/AML decisions, or MNPI.
  - Require authentication context before answering anything account-specific.
  - Add a “human handoff” path when retrieval confidence is low or the user asks about restricted topics.
- **Auditability**
  - Store immutable traces of what content was retrieved and which model version answered.
  - Make sure support staff can reconstruct why an answer was given during an incident review.
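As a logging sketch covering most of the monitoring and auditability points above: the `answer_with_audit_log` wrapper and its field names are illustrative, not a standard API, and it reuses `query_engine` and `is_restricted()` from the implementation steps.

```python
import json
import time
import uuid
from datetime import datetime, timezone

def answer_with_audit_log(query: str, user_id: str) -> str:
    """Illustrative wrapper: emit one audit record per request."""
    start = time.perf_counter()
    if is_restricted(query):
        answer = "I can't help with that request. Please escalate to compliance/support."
        node_ids, refusal = [], "restricted_topic"
    else:
        response = query_engine.query(query)
        answer = str(response)
        node_ids = [n.node_id for n in response.source_nodes]
        refusal = None
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query": query,
        "retrieved_node_ids": node_ids,
        "refusal_reason": refusal,
        "answer": answer,
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        "model_version": "gpt-4o-mini",  # record the exact model you deploy
    }
    print(json.dumps(record))  # in production, write to an append-only store instead
    return answer
```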
Common Pitfalls
- **Indexing uncontrolled content**
  - Problem: teams dump email threads or working notes into the index.
  - Fix: index only approved client-facing or internal-policy documents with an owner and a review date.
- **Letting the model answer beyond evidence**
  - Problem: the agent sounds confident even when retrieval is weak.
  - Fix: force refusal behavior when no relevant nodes are found (see the sketch after this list) and keep `temperature=0` for support use cases.
- **Ignoring compliance boundaries**
  - Problem: users ask for trade recommendations or account changes and the bot tries to be helpful.
  - Fix: add explicit regex/policy checks plus authenticated workflow routing before calling the LLM.
- **Skipping audit logs**
  - Problem: no record of what was asked or which sources were used.
  - Fix: log prompt metadata, retrieved nodes, response text, user identity claims, and model version in an immutable store.
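For the "answer beyond evidence" pitfall, a minimal confidence gate might look like the following; the 0.35 threshold is purely illustrative and should be calibrated on your own evaluation set.

```python
retriever = index.as_retriever(similarity_top_k=3)
MIN_SCORE = 0.35  # illustrative; calibrate against a labeled eval set

def answer_or_refuse(query: str) -> str:
    """Refuse outright when retrieval returns nothing sufficiently relevant."""
    nodes = retriever.retrieve(query)
    if not nodes or all((n.score or 0.0) < MIN_SCORE for n in nodes):
        return "I don't have enough information in the approved docs to answer that."
    return str(query_engine.query(query))
```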
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.