LlamaIndex Tutorial (TypeScript): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a production-shaped Retrieval-Augmented Generation pipeline in TypeScript using LlamaIndex. You’ll ingest documents, index them, retrieve relevant context, and answer questions from that context instead of relying on the model’s memory.

What You'll Need

  • Node.js 18+
  • A TypeScript project with typescript and tsx or ts-node
  • llamaindex installed
  • An OpenAI API key set as OPENAI_API_KEY
  • A small document set to query, such as .txt, .md, or .pdf files
  • Basic familiarity with async/await and ES modules

Install the package:

npm install llamaindex
npm install -D typescript tsx @types/node
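
The default OpenAI integration reads the key from the environment, so a fail-fast guard at the top of your entry point (a minimal sketch; adapt to however you load config) saves you from opaque auth errors mid-pipeline:

// Stop early if the key is missing rather than failing on the first API call.
if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY is not set");
}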

Step-by-Step

  1. Set up a minimal TypeScript entry point and load your documents from disk. For a real RAG pipeline, keep ingestion separate from querying so you can rebuild indexes without touching application code.
import { SimpleDirectoryReader } from "llamaindex";

async function main() {
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./data");
  console.log(`Loaded ${documents.length} documents`);
}

main().catch(console.error);
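
With tsx installed, you can run this step directly (assuming you saved the file as ingest.ts):

npx tsx ingest.ts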
  2. Build a vector index from those documents. This is the core retrieval layer: it chunks content, embeds it, and stores vectors so later queries can pull back the most relevant passages.
import { SimpleDirectoryReader, VectorStoreIndex } from "llamaindex";

async function main() {
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./data");

  const index = await VectorStoreIndex.fromDocuments(documents);
  console.log("Index built successfully");

  return index;
}

main().catch(console.error);
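
Chunking happens with library defaults here. If your documents need different granularity, LlamaIndex.TS exposes global settings you can assign before building the index; a minimal sketch (the values are illustrative, not recommendations):

import { Settings } from "llamaindex";

// Applies to all indexing that happens after these assignments.
Settings.chunkSize = 512;   // target tokens per chunk
Settings.chunkOverlap = 50; // tokens shared between adjacent chunks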
  3. Create a query engine from the index and ask a question. The query engine handles retrieval plus synthesis, which is what turns raw document chunks into an answer.
import { SimpleDirectoryReader, VectorStoreIndex } from "llamaindex";

async function main() {
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./data");
  const index = await VectorStoreIndex.fromDocuments(documents);

  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What does this document set say about refund timelines?",
  });

  console.log(response.toString());
}

main().catch(console.error);
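
The engine retrieves a small, fixed number of chunks by default. If answers seem to miss context you know is in the corpus, you can widen retrieval by constructing the retriever explicitly; a sketch assuming asRetriever and asQueryEngine accept these options in your llamaindex version:

// Pull the top 5 chunks instead of the default before synthesis.
const retriever = index.asRetriever({ similarityTopK: 5 });
const queryEngine = index.asQueryEngine({ retriever });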
  4. Add source-aware responses so you can inspect what the model used. In banking and insurance workflows, this matters because you need traceability for every answer.
import { SimpleDirectoryReader, VectorStoreIndex } from "llamaindex";

async function main() {
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./data");
  const index = await VectorStoreIndex.fromDocuments(documents);

  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "Summarize the claims escalation policy.",
  });

  console.log("Answer:");
  console.log(response.toString());

  console.log("\nSources:");
  for (const source of response.sourceNodes ?? []) {
    console.log(`- ${source.node.getContent().slice(0, 200)}...`);
  }
}

main().catch(console.error);
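
Each entry in sourceNodes also carries a similarity score and whatever metadata was attached at ingestion, which is usually more useful for an audit trail than raw text; a sketch extending the loop above:

for (const source of response.sourceNodes ?? []) {
  // score is the retriever's similarity measure for this chunk
  const score = source.score?.toFixed(3) ?? "n/a";
  console.log(`- score=${score} metadata=${JSON.stringify(source.node.metadata)}`);
}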
  5. Wrap the pipeline in a reusable script so you can run ingestion and querying as separate commands. This is the shape you want in real projects: deterministic indexing, then stateless querying.
import { SimpleDirectoryReader, VectorStoreIndex } from "llamaindex";

async function buildIndex() {
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./data");
  return VectorStoreIndex.fromDocuments(documents);
}

async function askQuestion(question: string) {
  const index = await buildIndex();
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({ query: question });
  console.log(response.toString());
}

askQuestion("What are the policy exceptions?").catch(console.error);
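
Note that as written, askQuestion rebuilds the index on every call. To genuinely separate the two commands, persist the index to disk during ingestion and reload it at query time; a sketch using the default local storage context (the ./storage path is illustrative):

import {
  SimpleDirectoryReader,
  VectorStoreIndex,
  storageContextFromDefaults,
} from "llamaindex";

// Ingestion command: embed once and write vectors to ./storage.
async function ingest() {
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData("./data");
  await VectorStoreIndex.fromDocuments(documents, { storageContext });
}

// Query command: reload the persisted index without touching ./data.
async function loadIndex() {
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  return VectorStoreIndex.init({ storageContext });
}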

Testing It

Run the script against a small folder of text files first. Use documents with distinct topics so you can see whether retrieval is actually selecting the right chunks instead of hallucinating from generic context.

If the answer is vague, check whether your source documents are too large or too noisy. In practice, RAG quality depends heavily on clean inputs and well-scoped questions.

Try asking for something that appears in only one file, then verify the output against that file directly. If response.sourceNodes points to the right content, your retrieval layer is working.

For debugging, print the retrieved nodes before synthesis and inspect whether chunking is sensible. If retrieval is weak, fix your data layout before tuning prompts.
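
A sketch of that debugging step, assuming retriever.retrieve takes the same query-object shape as the query engine:

// Inspect what retrieval returns before any LLM synthesis happens.
const retriever = index.asRetriever();
const nodes = await retriever.retrieve({ query: "What are the policy exceptions?" });
for (const n of nodes) {
  console.log(`score=${n.score?.toFixed(3) ?? "n/a"}`);
  console.log(n.node.getContent().slice(0, 120));
}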

Next Steps

  • Add persistent storage with a real vector database like Pinecone or Qdrant
  • Tune chunk size and overlap for your document type
  • Add metadata filters for tenant, product line, or policy version

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

