How to Build a Customer Support Agent Using LlamaIndex in TypeScript for Banking
A banking customer support agent answers account and product questions, retrieves policy-backed information, and escalates anything risky or ambiguous to a human. That matters because in banking, bad answers are not just annoying; they create compliance exposure, customer harm, and audit gaps.
Architecture
- Chat API layer: receives the customer message, session ID, and channel metadata from web, mobile, or contact-center systems.
- Retriever-backed knowledge base: indexes approved sources like product docs, fee schedules, branch policies, and support runbooks.
- LLM response engine: uses ContextChatEngine or a query engine from LlamaIndex to generate grounded answers from retrieved context.
- Policy and PII guardrail layer: detects sensitive requests like password resets, card activation, disputes, or account-specific actions and routes them to secure workflows.
- Audit logging: stores prompt inputs, retrieved chunks, model outputs, and escalation decisions for compliance review.
- Human handoff path: escalates unresolved or regulated cases to a live agent with conversation state attached.
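To make the chat API layer concrete, here is a minimal sketch of the request shape it might accept. The field names and the validation helper are illustrative assumptions, not a fixed contract from the article:

```typescript
// Hypothetical request shape for the chat API layer; field names are
// illustrative assumptions.
interface SupportChatRequest {
  sessionId: string; // ties turns to one conversation for audit and handoff
  channel: "web" | "mobile" | "contact-center";
  message: string; // raw customer text, redacted upstream
  metadata?: Record<string, string>;
}

// Minimal structural check before a message enters the pipeline.
function isValidRequest(req: Partial<SupportChatRequest>): req is SupportChatRequest {
  return (
    typeof req.sessionId === "string" &&
    ["web", "mobile", "contact-center"].includes(req.channel ?? "") &&
    typeof req.message === "string" &&
    req.message.trim().length > 0
  );
}

console.log(isValidRequest({ sessionId: "s-1", channel: "web", message: "Hi" })); // → true
```

Rejecting malformed requests at the boundary keeps every downstream component (retrieval, guardrails, logging) working from a known shape.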
Implementation
1) Install dependencies and load approved banking content
Use only sanctioned documents. For banking support, that usually means PDFs from internal policy teams, FAQ pages approved by compliance, and product terms.
```shell
npm install llamaindex zod dotenv
```

```typescript
import "dotenv/config";
import { SimpleDirectoryReader } from "llamaindex";

// Load approved banking documents from a governed directory.
async function loadDocs() {
  const reader = new SimpleDirectoryReader();
  const docs = await reader.loadData({ directoryPath: "./bank-docs" });
  console.log(`Loaded ${docs.length} documents`);
  return docs;
}

loadDocs().catch(console.error);
```
Keep the source set small and governed. If compliance has not signed off on a document, it does not belong in the index.
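One way to enforce that governance rule in code is a simple allowlist gate before ingestion. The manifest format and file names below are assumptions for illustration; in practice the approved set would come from a signed manifest maintained by compliance:

```typescript
// Sketch of a compliance allowlist gate; the approved set is a stand-in for
// a manifest owned by your compliance team.
const approvedDocs = new Set<string>([
  "fee-schedule-2024.pdf",
  "overdraft-policy.pdf",
]);

function filterApproved(filenames: string[]): string[] {
  const rejected = filenames.filter((f) => !approvedDocs.has(f));
  if (rejected.length > 0) {
    console.warn(`Skipping unapproved documents: ${rejected.join(", ")}`);
  }
  return filenames.filter((f) => approvedDocs.has(f));
}
```

Running every ingestion batch through a gate like this turns "compliance signed off" from a convention into an enforced invariant.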
2) Build the index and query engine
For a support agent, start with retrieval over approved content before you add more complex orchestration. In LlamaIndex TypeScript, VectorStoreIndex.fromDocuments() is the core pattern.
```typescript
import "dotenv/config";
import {
  VectorStoreIndex,
  Settings,
  OpenAI,
  SimpleDirectoryReader,
} from "llamaindex";

// Configure the shared LLM once for the process.
Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

async function main() {
  const reader = new SimpleDirectoryReader();
  const docs = await reader.loadData({ directoryPath: "./bank-docs" });

  // Embed and index the approved documents.
  const index = await VectorStoreIndex.fromDocuments(docs);

  // Retrieval-backed query engine over the indexed content.
  const queryEngine = index.asQueryEngine({
    retriever: index.asRetriever({ similarityTopK: 4 }),
  });

  const response = await queryEngine.query({
    query: "What is the fee for an international wire transfer?",
  });
  console.log(response.response);
}

main().catch(console.error);
```
This gives you grounded answers from your bank’s own content. The important part is not the model; it is the retrieval boundary around approved knowledge.
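You can make that retrieval boundary explicit by refusing to answer when nothing relevant was retrieved. The chunk shape and score threshold below are assumptions (retriever scores depend on your embedding model); the point is the pattern, not the numbers:

```typescript
// Sketch of a retrieval-boundary check: if no approved context scores above
// a threshold, refuse and escalate rather than let the model improvise.
// The 0.75 threshold is an assumption to tune per embedding model.
interface ScoredChunk {
  text: string;
  score: number;
}

const MIN_SCORE = 0.75;

function groundedOrEscalate(chunks: ScoredChunk[]): { grounded: boolean; reason: string } {
  const relevant = chunks.filter((c) => c.score >= MIN_SCORE);
  if (relevant.length === 0) {
    return { grounded: false, reason: "no approved context above threshold" };
  }
  return { grounded: true, reason: `${relevant.length} supporting chunks` };
}
```

Wire this between the retriever and the response synthesizer: an ungrounded result becomes an escalation event, not an answer.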
3) Add chat memory for multi-turn support
Support conversations are not single-shot queries. A customer asks about overdraft fees, then follows up with "what about my student account?", so you need conversational state.
```typescript
import "dotenv/config";
import {
  ContextChatEngine,
  VectorStoreIndex,
  Settings,
  OpenAI,
  SimpleDirectoryReader,
} from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

async function buildChatAgent() {
  const docs = await new SimpleDirectoryReader().loadData({
    directoryPath: "./bank-docs",
  });
  const index = await VectorStoreIndex.fromDocuments(docs);

  // ContextChatEngine retrieves context for every turn and keeps its own
  // chat history between calls.
  return new ContextChatEngine({
    retriever: index.asRetriever({ similarityTopK: 4 }),
    systemPrompt:
      "You are a banking support assistant. Answer only from provided context. " +
      "If the user asks for account-specific actions or sensitive operations, " +
      "escalate to a human agent.",
  });
}

async function run() {
  const agent = await buildChatAgent();

  const first = await agent.chat({
    message: "How much is the monthly maintenance fee?",
  });
  console.log(first.response);

  // The engine's internal history carries the first turn, so the follow-up
  // can say "that" without restating the question.
  const second = await agent.chat({
    message: "Does that apply to premium accounts too?",
  });
  console.log(second.response);
}

run().catch(console.error);
```
In production you will persist chat history outside the process. Keep conversation IDs in your app database so you can reconstruct context during audits or handoffs.
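A minimal sketch of that persistence layer, using an in-memory Map as a stand-in for your app database. The message shape loosely mirrors a chat message (role plus content); the store interface and names are assumptions:

```typescript
// In-memory stand-in for a conversation store keyed by conversation ID;
// production would back this with your app database.
type StoredMessage = { role: "user" | "assistant"; content: string };

const conversations = new Map<string, StoredMessage[]>();

function appendTurn(conversationId: string, msg: StoredMessage): void {
  const history = conversations.get(conversationId) ?? [];
  history.push(msg);
  conversations.set(conversationId, history);
}

function loadHistory(conversationId: string): StoredMessage[] {
  return conversations.get(conversationId) ?? [];
}
```

On handoff or audit, `loadHistory(conversationId)` reconstructs exactly what the customer and agent said, independent of any in-process engine state.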
4) Put guardrails around regulated requests
Do not let the agent perform account actions directly unless you have explicit authorization flows. Use deterministic routing for high-risk intents like disputes, fraud claims, address changes, card blocking, or password resets.
A simple pattern is intent classification before retrieval:
```typescript
type RiskLevel = "low" | "medium" | "high";

// Deterministic keyword routing; runs before any model call.
function classifyRisk(message: string): RiskLevel {
  const text = message.toLowerCase();
  if (
    text.includes("password") ||
    text.includes("card block") ||
    text.includes("dispute") ||
    text.includes("fraud") ||
    text.includes("account number")
  ) {
    return "high";
  }
  if (text.includes("fee") || text.includes("interest") || text.includes("limit")) {
    return "medium";
  }
  return "low";
}
```
Route high-risk cases to a human queue and log the decision. That is where compliance teams will look first when something goes wrong.
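A sketch of what that routing step might look like, taking the risk level produced by a classifier like the one above. The destination names and the log fields are assumptions; the audit-relevant part is that the decision and timestamp are recorded deterministically:

```typescript
// Routing sketch: high-risk intents bypass the LLM entirely. Queue names
// and log fields are illustrative assumptions.
type RiskLevel = "low" | "medium" | "high";

interface RoutingDecision {
  destination: "llm-pipeline" | "human-queue";
  risk: RiskLevel;
  loggedAt: string; // ISO timestamp for the audit trail
}

function route(risk: RiskLevel): RoutingDecision {
  return {
    destination: risk === "high" ? "human-queue" : "llm-pipeline",
    risk,
    loggedAt: new Date().toISOString(),
  };
}
```

Because the routing is a pure function of the classified risk, the decision is reproducible during a compliance review: same input, same route.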
Production Considerations
- Deploy in-region: keep embeddings, the vector store, logs, and model traffic inside your approved data residency region.
- Log retrieval traces: store the query text, the top-k nodes returned by VectorStoreIndex, the final answer text, the escalation reason, and the operator handoff ID.
- Add policy-based redaction: strip PANs, account numbers, SSNs, addresses, and authentication tokens before prompts ever reach the model.
- Use explicit approval boundaries: only ingest content signed off by legal/compliance; treat policy updates as controlled releases with versioning.
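To illustrate the redaction point, here is a minimal regex-based sketch. These patterns are illustrative only, not a complete PII ruleset; a real deployment should use a vetted DLP library and treat this as a last-resort backstop:

```typescript
// Regex-based redaction sketch; patterns are illustrative assumptions,
// NOT a complete PII ruleset. Order matters: SSN runs before the broader
// digit-run patterns.
const REDACTIONS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"], // US SSN format
  [/\b(?:\d[ -]?){13,19}\b/g, "[CARD]"], // PAN-like digit runs
  [/\b\d{8,12}\b/g, "[ACCOUNT]"], // bare account-number-like runs
];

function redact(text: string): string {
  return REDACTIONS.reduce((t, [re, label]) => t.replace(re, label), text);
}
```

Apply this to user input before building the prompt and to retrieved text before logging, so sensitive values never reach the model or the trace store.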
Common Pitfalls
- •
Indexing raw operational data
- •Don’t dump CRM exports or transaction tables into the vector store.
- •Avoid this by indexing only curated support content and keeping customer-specific data behind authenticated backend APIs.
- •
Letting the model answer beyond source material
- •Banking agents should not invent fee rules or policy exceptions.
- •Avoid this by using retrieval-first prompts that instruct the model to answer only from context and escalate when context is missing.
- •
Ignoring auditability
- •If you cannot explain why an answer was given, compliance will reject it.
- •Avoid this by storing source chunks, model version, prompt version, and escalation decisions for every interaction.
- •
Treating all intents as equal
- •A balance inquiry is not the same as a fraud report.
- •Avoid this by classifying risk up front and sending regulated workflows to secure systems or humans immediately.
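The auditability pitfall is easiest to avoid if every interaction produces one structured record. A sketch of what that record might contain, with field names and version strings as illustrative assumptions:

```typescript
// One audit record per interaction; field names and version values are
// illustrative assumptions.
interface AuditRecord {
  conversationId: string;
  promptVersion: string; // pin and version your prompts like code releases
  modelVersion: string;
  retrievedChunkIds: string[]; // which source chunks backed the answer
  answer: string;
  escalated: boolean;
  escalationReason?: string;
  loggedAt: string;
}

function buildAuditRecord(
  conversationId: string,
  answer: string,
  retrievedChunkIds: string[],
  escalationReason?: string,
): AuditRecord {
  return {
    conversationId,
    promptVersion: "support-prompt-v3",
    modelVersion: "gpt-4o-mini",
    retrievedChunkIds,
    answer,
    escalated: escalationReason !== undefined,
    escalationReason,
    loggedAt: new Date().toISOString(),
  };
}
```

With a record like this written for every turn, "why did the agent say that?" becomes a database query instead of a forensic exercise.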
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit: architecture templates, compliance checklists, and a 7-email deep-dive course.