How to Build a Customer Support Agent Using LlamaIndex in TypeScript for Healthcare

By Cyprian Aarons · Updated 2026-04-21
customer-support · llamaindex · typescript · healthcare

A healthcare customer support agent answers patient questions, routes requests, and pulls the right policy or care instructions from approved sources. The important part is not just speed; it’s reducing call center load while keeping responses compliant, auditable, and grounded in the organization’s own documentation.

Architecture

  • Ingestion layer

    • Loads approved documents like benefit summaries, appointment policies, billing FAQs, and triage disclaimers.
    • Use SimpleDirectoryReader for local files or your own loaders for CMS/EMR-exported content.
  • Index

    • A VectorStoreIndex over curated healthcare content.
    • Keep the corpus narrow. Support agents should answer from policy docs, not improvise from general medical knowledge.
  • Retriever

    • Retrieves only the most relevant chunks for a patient question.
    • Tune similarityTopK to reduce noisy context and hallucinations.
  • Chat engine / query engine

    • Wraps retrieval plus response generation into a support workflow.
    • Use a system prompt that enforces boundaries: no diagnosis, no treatment advice, escalate urgent symptoms.
  • Guardrails layer

    • Detects PHI exposure, emergency language, and unsupported medical requests.
    • Routes high-risk cases to a human agent or emergency guidance.
  • Audit/logging layer

    • Stores query metadata, document IDs used in the answer, timestamps, and escalation decisions.
    • This matters for compliance review and incident investigation; a minimal record shape is sketched after this list.
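
To make the guardrail and audit layers concrete, here is a minimal TypeScript sketch of the records they might exchange. The interface names and fields are illustrative assumptions, not LlamaIndex APIs.

// Illustrative shapes only -- these interfaces are assumptions, not LlamaIndex APIs.

// What the guardrails layer reports for each incoming question.
interface GuardrailResult {
  risk: "none" | "phi_exposure" | "emergency" | "unsupported_medical";
  escalate: boolean; // route to a human agent or emergency guidance
  reason?: string;   // short explanation kept for the audit trail
}

// One entry in the audit/logging layer. Store a hash, never the raw question.
interface AuditRecord {
  questionHash: string;       // e.g. SHA-256 of the question text
  retrievedNodeIds: string[]; // document chunks cited in the answer
  escalated: boolean;
  timestamp: string;          // ISO 8601
}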

Implementation

1) Install and configure the TypeScript project

Use the LlamaIndex TypeScript SDK and an LLM provider that supports your deployment constraints. In healthcare, you usually want region control, private networking, and contractual assurances around data handling.

npm install llamaindex dotenv

Set environment variables for your model provider and any storage you use:

OPENAI_API_KEY=...

2) Load approved healthcare support content and build an index

Keep your source set small and controlled. For example: billing FAQs, appointment policies, telehealth instructions, privacy notices, and escalation rules.

import "dotenv/config";
import {
  Document,
  Settings,
  SimpleDirectoryReader,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  // Configure global model settings
  // Replace with your provider/model choice as needed
  Settings.chunkSize = 512;
  Settings.chunkOverlap = 50;

  const reader = new SimpleDirectoryReader();
  const docs = await reader.loadData({
    directoryPath: "./healthcare-support-content",
    recursive: true,
  });

  const sanitizedDocs = docs.map(
    (d) =>
      new Document({
        text: d.text,
        metadata: {
          ...d.metadata,
          sourceSystem: "approved-support-content",
          complianceTag: "healthcare-support",
        },
      }),
  );

  const index = await VectorStoreIndex.fromDocuments(sanitizedDocs);
  const queryEngine = index.asQueryEngine({
    similarityTopK: 3,
    responseMode: "compact",
    // Restrict answers to documents carrying the approved compliance tag
    preFilters: {
      filters: [
        {
          key: "complianceTag",
          value: "healthcare-support",
          operator: "==",
        },
      ],
    },
  });

  const response = await queryEngine.query({
    query: "What is your policy for rescheduling a telehealth appointment?",
  });

  console.log(response.toString());
}

main().catch(console.error);

This pattern gives you a controlled retrieval surface. The preFilters step is useful when you store multiple document classes in one index but only want support-approved material in production answers.
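
For instance, if the same index later held internal clinical guidelines, tagging them differently at ingestion keeps them out of support answers. A sketch with a hypothetical second tag:

// Hypothetical second document class sharing the same index.
// The preFilters above exclude it because its complianceTag differs.
const internalDoc = new Document({
  text: "Internal clinical guideline content...",
  metadata: {
    sourceSystem: "clinical-guidelines",
    complianceTag: "internal-clinical",
  },
});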

3) Add a chat-style support loop with escalation rules

A customer support agent should not just answer questions. It should classify intent first: billing issue, appointment change, portal access problem, privacy request, or urgent medical concern.

import { OpenAI, VectorStoreIndex } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
});

async function answerSupportQuestion(question: string) {
  // Minimal emergency screen; a production deployment needs a broader,
  // clinically reviewed keyword list or classifier.
  if (/chest pain|shortness of breath|suicidal|stroke/i.test(question)) {
    return {
      type: "escalation",
      message:
        "This may be urgent. Please contact emergency services or your local emergency number now.",
    };
  }

  const index = await VectorStoreIndex.fromDocuments([]); // placeholder only: load the persisted index (see step 4) in real deployments
  const queryEngine = index.asQueryEngine({
    similarityTopK: 3,
    llm,
    responseMode: "compact",
    systemPrompt:
      "You are a healthcare customer support agent. Answer only from retrieved policy content. Do not diagnose or provide medical advice. If the question requires clinical judgment or mentions urgent symptoms, escalate.",
  });

  const response = await queryEngine.query({ query: question });

  return {
    type: "support_answer",
    message: response.toString(),
    sources:
      response.sourceNodes?.map((node) => ({
        id: node.node.id_,
        score: node.score,
        textPreview: node.node.getText().slice(0, 120),
      })) ?? [],
    auditTag: "healthcare-support-response",
  };
}

The key pattern here is routing before retrieval for obvious emergencies. In healthcare support flows, this prevents the model from trying to “help” when it should escalate immediately.
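
To extend routing beyond the emergency regex, you can classify intent before retrieval, matching the categories listed earlier. A minimal sketch reusing the llm instance defined above; the prompt, category names, and classifyIntent helper are illustrative assumptions:

// Hypothetical intent taxonomy mirroring the categories described earlier.
type SupportIntent =
  | "billing"
  | "appointment"
  | "portal_access"
  | "privacy_request"
  | "urgent_medical"
  | "other";

async function classifyIntent(question: string): Promise<SupportIntent> {
  const result = await llm.complete({
    prompt:
      "Classify this patient support question into exactly one category: " +
      "billing, appointment, portal_access, privacy_request, urgent_medical, other.\n" +
      "Question: " + question + "\nCategory:",
  });
  const label = result.text.trim().toLowerCase();
  const known: SupportIntent[] = [
    "billing",
    "appointment",
    "portal_access",
    "privacy_request",
    "urgent_medical",
  ];
  return known.find((intent) => label.includes(intent)) ?? "other";
}

Route urgent_medical straight to the escalation path before any retrieval; the other intents can select different prompts, indexes, or human queues.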

4) Persist indices and log every answer for auditability

In production you should not rebuild the index on every request. Persist it in secure storage, load it at startup, and log which nodes were used in each answer.

Concern          What to do
Compliance       Restrict content to approved policy documents
Auditability     Store question text hash, retrieved node IDs, timestamps
Data residency   Keep vector store + logs in-region
PHI handling     Redact before logging; never store raw sensitive chat unless required
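
A minimal sketch of the persist-then-load pattern, assuming llamaindex's storageContextFromDefaults helper and Node's built-in crypto module; the ./storage path and the auditEntry helper are illustrative, and the storage API can differ between llamaindex versions:

import { createHash } from "node:crypto";
import { storageContextFromDefaults, VectorStoreIndex } from "llamaindex";

// Build once in an ingestion job, persisting to secure, in-region storage.
// "./storage" is a placeholder path for your regulated storage location.
const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
// const index = await VectorStoreIndex.fromDocuments(sanitizedDocs, { storageContext });

// At service startup, load the persisted index instead of rebuilding it.
const index = await VectorStoreIndex.init({ storageContext });

// Record a hash of the question plus the node IDs used -- never raw PHI.
function auditEntry(question: string, nodeIds: string[], escalated: boolean) {
  return {
    questionHash: createHash("sha256").update(question).digest("hex"),
    retrievedNodeIds: nodeIds,
    escalated,
    timestamp: new Date().toISOString(),
  };
}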

Production Considerations

  • Deploy inside your regulated boundary

    • Run the service in a private VPC with region-locked storage.
    • Keep embeddings and logs in the same jurisdiction as required by policy.
  • Monitor retrieval quality

    • Track top-k hit rate, fallback/escalation rate, and answer citation coverage.
    • If answers frequently cite irrelevant chunks, tighten chunking or reduce similarityTopK.
  • Add PHI guardrails

    • Redact names, MRNs, phone numbers, insurance IDs before writing logs (see the redaction sketch after this list).
    • Block prompts that ask for diagnosis or treatment recommendations unless your clinical governance team explicitly approves that flow.
  • Create human handoff paths

    • Route billing disputes with missing context to a live agent.
    • Route symptom-related questions to clinical triage workflows instead of generic support replies.
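
As referenced above, a minimal redaction sketch. The patterns are illustrative assumptions and deliberately incomplete; production PHI detection typically pairs patterns like these with a dedicated entity-recognition service:

// Illustrative regex-based redaction; patterns are assumptions, not a complete PHI list.
const REDACTIONS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"], // US Social Security numbers
  [/\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/g, "[PHONE]"],
  [/\bMRN[:#]?\s*\d{6,10}\b/gi, "[MRN]"],          // hypothetical MRN format
  [/\b[A-Z]{2,3}\d{6,12}\b/g, "[INSURANCE_ID]"],   // hypothetical member ID format
];

function redactForLogging(text: string): string {
  return REDACTIONS.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}

For example, redactForLogging("Call 555-123-4567 re: MRN 12345678") yields "Call [PHONE] re: [MRN]".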

Common Pitfalls

  1. Using broad public web content as a knowledge base

    • Don’t mix general medical articles with patient support policies.
    • Fix it by indexing only approved internal documents with clear metadata filters.
  2. Logging raw chat transcripts without redaction

    • That creates avoidable PHI exposure in observability tools.
    • Fix it by hashing identifiers and stripping sensitive entities before persistence.
  3. Letting the model answer clinical questions

    • A support agent is not a clinician.
    • Fix it by hard-coding escalation rules for symptom keywords and using prompts that forbid diagnosis or treatment advice.
  4. Ignoring data residency requirements

    • Embeddings and vector stores can still be regulated data.
    • Fix it by choosing regional infrastructure and verifying where every component stores data at rest.
