How to Build a Customer Support Agent Using LlamaIndex in Python for Investment Banking
A customer support agent for investment banking answers client questions against approved internal knowledge: product docs, fee schedules, onboarding steps, trading hours, escalation paths, and policy snippets. The stakes are high: the bank needs fast responses that never leak sensitive data, invent policy, or violate compliance rules.
Architecture
- **Document ingestion layer**
  - Pulls from approved sources only: PDFs, SharePoint exports, policy docs, FAQs, and client-facing playbooks.
  - Normalizes content before indexing so the agent does not answer from stale or duplicated text.
- **Vector index**
  - Stores embeddings for semantic retrieval using `VectorStoreIndex`.
  - Keeps support answers grounded in the bank’s curated knowledge base.
- **Retriever**
  - Uses `index.as_retriever()` to fetch the top-k relevant chunks.
  - Should be tuned for precision over recall, because hallucinated support answers are expensive in regulated environments.
- **Response synthesizer / query engine**
  - Uses `index.as_query_engine()` to generate answers from retrieved context.
  - Must be constrained to cite sources and to refuse when evidence is weak.
- **Guardrails layer**
  - Applies policy checks for PII, MNPI, KYC/AML-related requests, and restricted advice.
  - Routes risky questions to human support or compliance.
- **Audit and observability**
  - Logs prompts, retrieved nodes, response text, and user identity.
  - Required for incident review, model governance, and regulatory traceability.
Implementation
1) Install dependencies and load approved documents
Use LlamaIndex plus a simple file reader. In production, replace local files with your document store connector and enforce source allowlists.
```bash
pip install llama-index llama-index-readers-file
```
```python
from pathlib import Path

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

DOCS_DIR = Path("./bank_support_docs")

# Load only the approved file types from the curated docs directory.
documents = SimpleDirectoryReader(
    input_dir=str(DOCS_DIR),
    required_exts=[".pdf", ".txt", ".md"],
).load_data()

# Chunk documents into overlapping nodes sized for precise retrieval.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

print(f"Loaded {len(documents)} documents into {len(nodes)} nodes")
```
2) Build the index and query engine
For a support agent, keep retrieval tight. Use a small `similarity_top_k` first; widen only if you can tolerate more noise.
```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Deterministic settings: temperature=0 keeps support answers reproducible.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)

index = VectorStoreIndex(nodes)

# Tight retrieval: start with a small top-k and widen only if needed.
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)
```
3) Add a guardrail wrapper for banking-specific requests
This is where you stop the agent from answering restricted questions. Investment banking support often sits next to sensitive workflows, so you need explicit refusals for MNPI, trade advice, account access changes, or anything requiring identity verification.
```python
import re

# Topics the bot must never answer; matched case-insensitively.
RESTRICTED_PATTERNS = [
    r"\bMNPI\b",
    r"\binsider\b",
    r"\btrade recommendation\b",
    r"\bchange my beneficiary\b",
    r"\bwire transfer\b",
    r"\baccount number\b",
]

def is_restricted(query: str) -> bool:
    # Cheap pre-LLM screen; run it before any retrieval or generation.
    return any(re.search(pattern, query, re.IGNORECASE) for pattern in RESTRICTED_PATTERNS)

def answer_support_query(query: str) -> str:
    if is_restricted(query):
        return (
            "I can’t help with that request. "
            "Please route it through the appropriate authenticated banking workflow "
            "or escalate to compliance/support."
        )
    response = query_engine.query(
        "Answer using only the provided internal support docs. "
        "If the answer is not in the docs, say you do not have enough information.\n\n"
        f"Question: {query}"
    )
    return str(response)
```
4) Run the agent with citation-friendly responses
The simplest production pattern is a thin service layer around `answer_support_query()`. Keep it deterministic and log every request-response pair with metadata.
```python
def main():
    queries = [
        "What are the cut-off times for same-day USD wires?",
        "How do I change my beneficiary on an institutional account?",
        "Where is the onboarding checklist for new prime brokerage clients?",
    ]
    for q in queries:
        print("=" * 80)
        print("Q:", q)
        print("A:", answer_support_query(q))

if __name__ == "__main__":
    main()
```
If you want stronger grounding, switch to a `RetrieverQueryEngine` pattern with explicit node inspection before generation, as sketched below. In regulated support flows, being able to show which chunks were used is more important than squeezing out a prettier answer.
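A minimal sketch of that pattern, assuming the index built in step 2; `RetrieverQueryEngine` and `get_response_synthesizer` are standard LlamaIndex components, and the score printout is just one way to surface the evidence:

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = index.as_retriever(similarity_top_k=3)

# Inspect the candidate evidence before any generation happens.
candidates = retriever.retrieve("What are the cut-off times for same-day USD wires?")
for node in candidates:
    print(node.node_id, node.score, node.metadata.get("file_name"))

# Reuse the same retriever for synthesis so the inspected nodes match the answer.
engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(response_mode="compact"),
)
response = engine.query("What are the cut-off times for same-day USD wires?")
print(response)
print([n.node_id for n in response.source_nodes])  # the chunks the answer used
```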
Production Considerations
- **Deployment**
  - Run the service inside your bank’s approved environment with network egress locked down.
  - Keep embeddings and LLM calls in-region if your data residency policy requires it.
  - Do not send client-specific or confidential content to unmanaged third-party endpoints.
- **Monitoring**
  - Log query text, top-k retrieved node IDs, final answer, latency, and refusal reason (see the logging sketch after this list).
  - Track deflection rate versus escalation rate.
  - Sample conversations for compliance review and regression testing after every prompt or model change.
- **Guardrails**
  - Block requests involving trading advice, account changes, KYC/AML decisions, or MNPI.
  - Require authentication context before answering anything account-specific.
  - Add a “human handoff” path when retrieval confidence is low or the user asks about restricted topics.
- **Auditability**
  - Store immutable traces of what content was retrieved and which model version answered.
  - Make sure support staff can reconstruct why an answer was given during an incident review.
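As a logging sketch covering most of the monitoring and auditability points above: the `answer_with_audit_log` wrapper and its field names are illustrative, not a standard API, and it reuses `query_engine` and `is_restricted()` from the implementation steps.

```python
import json
import time
import uuid
from datetime import datetime, timezone

def answer_with_audit_log(query: str, user_id: str) -> str:
    """Illustrative wrapper: emit one audit record per request."""
    start = time.perf_counter()
    if is_restricted(query):
        answer = "I can't help with that request. Please escalate to compliance/support."
        node_ids, refusal = [], "restricted_topic"
    else:
        response = query_engine.query(query)
        answer = str(response)
        node_ids = [n.node_id for n in response.source_nodes]
        refusal = None
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query": query,
        "retrieved_node_ids": node_ids,
        "refusal_reason": refusal,
        "answer": answer,
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        "model_version": "gpt-4o-mini",  # record the exact model you deploy
    }
    print(json.dumps(record))  # in production, write to an append-only store instead
    return answer
```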
Common Pitfalls
- **Indexing uncontrolled content**
  - Problem: teams dump email threads or working notes into the index.
  - Fix: index only approved client-facing or internal-policy documents with an owner and a review date.
- **Letting the model answer beyond evidence**
  - Problem: the agent sounds confident even when retrieval is weak.
  - Fix: force refusal behavior when no relevant nodes are found (see the sketch after this list) and keep `temperature=0` for support use cases.
- **Ignoring compliance boundaries**
  - Problem: users ask for trade recommendations or account changes and the bot tries to be helpful.
  - Fix: add explicit regex/policy checks plus authenticated workflow routing before calling the LLM.
- **Skipping audit logs**
  - Problem: no record of what was asked or which sources were used.
  - Fix: log prompt metadata, retrieved nodes, response text, user identity claims, and model version in an immutable store.
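For the "answer beyond evidence" pitfall, a minimal confidence gate might look like the following; the 0.35 threshold is purely illustrative and should be calibrated on your own evaluation set.

```python
retriever = index.as_retriever(similarity_top_k=3)
MIN_SCORE = 0.35  # illustrative; calibrate against a labeled eval set

def answer_or_refuse(query: str) -> str:
    """Refuse outright when retrieval returns nothing sufficiently relevant."""
    nodes = retriever.retrieve(query)
    if not nodes or all((n.score or 0.0) < MIN_SCORE for n in nodes):
        return "I don't have enough information in the approved docs to answer that."
    return str(query_engine.query(query))
```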
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.