LlamaIndex Tutorial (TypeScript): building a RAG pipeline for advanced developers
This tutorial builds a production-style retrieval-augmented generation pipeline in TypeScript using LlamaIndex. You’ll ingest documents, chunk them, index them, and query with a retriever + response synthesizer pattern that is easier to tune than a single black-box queryEngine.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project initialized with tsconfig.json
- @llamaindex/core
- @llamaindex/openai
- An OpenAI API key in OPENAI_API_KEY
- A small document set to index, stored locally as .txt files
- Basic familiarity with async/await and ES modules
Step-by-Step
- Start with a clean TypeScript project and install the dependencies. Keep this minimal; the goal is to make the pipeline easy to run and easy to extend later.
mkdir llamaindex-rag-ts
cd llamaindex-rag-ts
npm init -y
npm install @llamaindex/core @llamaindex/openai dotenv
npm install -D typescript tsx @types/node
npx tsc --init
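npx tsc --init produces a verbose config; for this tutorial you mainly need modern ES modules that work with tsx and extensionless relative imports. A minimal tsconfig.json along these lines is enough (the exact option set is a suggestion, not a requirement):

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "Bundler",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  },
  "include": ["src"]
}

If you run scripts with tsx, also setting "type": "module" in package.json keeps Node's module semantics consistent with the ESM imports used throughout.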
- Create a small document loader that reads local files into LlamaIndex documents. For advanced RAG work, keep ingestion deterministic so you can debug retrieval quality before adding vector databases or hybrid search.
// src/loadDocs.ts
import { promises as fs } from "node:fs";
import path from "node:path";
import { Document } from "@llamaindex/core/schema";

export async function loadDocuments(dir: string): Promise<Document[]> {
  const files = await fs.readdir(dir);
  // Sort so ingestion order is deterministic across platforms and runs.
  const txtFiles = files.filter((f) => f.endsWith(".txt")).sort();
  const docs: Document[] = [];
  for (const file of txtFiles) {
    const fullPath = path.join(dir, file);
    const text = await fs.readFile(fullPath, "utf8");
    // Tag each document with its source file so retrieval hits stay traceable.
    docs.push(new Document({ text, metadata: { source: file } }));
  }
  return docs;
}
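A quick way to confirm what ingestion produced before any embedding happens. The script below is a hypothetical helper, not part of the pipeline (the file name check-ingest.ts is just a suggestion):

// src/check-ingest.ts: hypothetical sanity check on the loaded documents.
import { loadDocuments } from "./loadDocs";

const docs = await loadDocuments("./data");
console.log(`Loaded ${docs.length} documents`);
for (const doc of docs) {
  // Source metadata plus raw length is usually enough to spot empty or mis-read files.
  console.log(`  ${doc.metadata.source}: ${doc.getText().length} chars`);
}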
- Build the index with explicit chunking and an OpenAI-backed embedding model. This is where most RAG tutorials get either too vague or too complex; here the knobs stay visible so you can tune chunk size and overlap for your corpus.
// src/index.ts
import "dotenv/config";
import { Settings, VectorStoreIndex } from "@llamaindex/core";
import { OpenAIEmbedding, OpenAI } from "@llamaindex/openai";
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { loadDocuments } from "./loadDocs";

async function main() {
  Settings.embedModel = new OpenAIEmbedding({
    model: "text-embedding-3-small",
  });
  Settings.llm = new OpenAI({
    model: "gpt-4o-mini",
    temperature: 0,
  });
  Settings.nodeParser = new SentenceSplitter({
    chunkSize: 512,
    chunkOverlap: 80,
  });

  const docs = await loadDocuments("./data");
  // Builds an in-memory vector index: chunk, embed, store.
  const index = await VectorStoreIndex.fromDocuments(docs);
  console.log(`Indexed ${docs.length} documents`);
}

main().catch(console.error);
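Before trusting chunkSize: 512, look at what the splitter actually produces on your corpus. A quick standalone check, assuming SentenceSplitter exposes a splitText method (verify against your installed version; the file name is a suggestion):

// src/inspect-chunks.ts: hypothetical helper for eyeballing chunk boundaries.
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { loadDocuments } from "./loadDocs";

async function main() {
  const splitter = new SentenceSplitter({ chunkSize: 512, chunkOverlap: 80 });
  for (const doc of await loadDocuments("./data")) {
    const chunks = splitter.splitText(doc.getText());
    console.log(`${doc.metadata.source}: ${chunks.length} chunks`);
    // A chunk that ends mid-sentence or mid-table is a hint to adjust size/overlap.
    console.log("  first chunk tail:", JSON.stringify(chunks[0]?.slice(-80)));
  }
}

main().catch(console.error);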
- Add a query path that retrieves top-k chunks and uses them to answer questions. For advanced developers, this is the point where you start measuring whether retrieval is actually helping the model instead of just adding latency.
// src/query.ts
import "dotenv/config";
import { Settings, VectorStoreIndex } from "@llamaindex/core";
import { OpenAIEmbedding, OpenAI } from "@llamaindex/openai";
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { loadDocuments } from "./loadDocs";

async function main() {
  Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
  Settings.llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });
  Settings.nodeParser = new SentenceSplitter({ chunkSize: 512, chunkOverlap: 80 });

  const docs = await loadDocuments("./data");
  const index = await VectorStoreIndex.fromDocuments(docs);

  // Retrieve the 3 most similar chunks, then let the query engine synthesize over them.
  const retriever = index.asRetriever({ similarityTopK: 3 });
  const queryEngine = index.asQueryEngine({ retriever });

  const response = await queryEngine.query({
    query: "What are the main operational risks described in these documents?",
  });
  console.log(response.toString());
}

main().catch(console.error);
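To measure that, a cheap check is to ask the same question with and without retrieved context and compare the answers. A minimal sketch, assuming the complete({ prompt }) call shape on Settings.llm from LlamaIndex.TS (verify against your installed version; the file name is a suggestion):

// src/ab-check.ts: hypothetical with/without-retrieval comparison.
import "dotenv/config";
import { Settings, VectorStoreIndex } from "@llamaindex/core";
import { OpenAIEmbedding, OpenAI } from "@llamaindex/openai";
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { loadDocuments } from "./loadDocs";

async function main() {
  Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
  Settings.llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });
  Settings.nodeParser = new SentenceSplitter({ chunkSize: 512, chunkOverlap: 80 });

  const index = await VectorStoreIndex.fromDocuments(await loadDocuments("./data"));
  const queryEngine = index.asQueryEngine({
    retriever: index.asRetriever({ similarityTopK: 3 }),
  });

  const question = "What are the main operational risks described in these documents?";

  // Grounded: retrieval injects document context into the prompt.
  const grounded = await queryEngine.query({ query: question });
  // Bare: same model, same question, no retrieved context.
  const bare = await Settings.llm.complete({ prompt: question });

  console.log("WITH RETRIEVAL:\n" + grounded.toString());
  console.log("\nWITHOUT RETRIEVAL:\n" + bare.text);
  // If the two answers are interchangeable, retrieval is adding latency, not grounding.
}

main().catch(console.error);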
- If you want more control over generation, separate retrieval from synthesis and inspect retrieved nodes before answering. This is the pattern you use when debugging bad answers, because it tells you whether the failure came from retrieval or prompting.
// src/debug-query.ts
import "dotenv/config";
import { Settings, VectorStoreIndex } from "@llamaindex/core";
import { OpenAIEmbedding, OpenAI } from "@llamaindex/openai";
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { loadDocuments } from "./loadDocs";

async function main() {
  Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
  Settings.llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });
  Settings.nodeParser = new SentenceSplitter({ chunkSize: 512, chunkOverlap: 80 });

  const docs = await loadDocuments("./data");
  const index = await VectorStoreIndex.fromDocuments(docs);

  // Retrieve a wider top-k than you'd serve, so near-misses are visible too.
  const retriever = index.asRetriever({ similarityTopK: 5 });
  const nodes = await retriever.retrieve("Summarize compliance gaps.");

  for (const node of nodes) {
    console.log("SCORE:", node.score);
    console.log("TEXT:", node.node.getContent().slice(0, 200));
    console.log("SOURCE:", node.node.metadata?.source);
    console.log("---");
  }
}

main().catch(console.error);
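From here you can also hand the retrieved nodes to the LLM yourself, which makes the retrieval/prompting split fully explicit. A minimal sketch that slots into main() in src/debug-query.ts after the loop; the prompt wording is illustrative, not LlamaIndex's internal template, and it assumes the same complete({ prompt }) call shape as above:

// Continues main() in src/debug-query.ts: hand-rolled synthesis over the retrieved nodes.
const question = "Summarize compliance gaps.";
const context = nodes
  .map((n, i) => `[${i + 1}] (${n.node.metadata?.source})\n${n.node.getContent()}`)
  .join("\n\n");

const answer = await Settings.llm.complete({
  prompt:
    "Answer the question using only the context below, " +
    "and cite the bracketed source numbers you rely on.\n\n" +
    `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`,
});
console.log("ANSWER:\n" + answer.text);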
Testing It
Put two or three .txt files in a ./data directory, each covering a distinct topic, then run npx tsx src/query.ts. If the answer quotes or paraphrases content that exists in only one file, retrieval is working.
Next, run npx tsx src/debug-query.ts and check whether the top chunks are semantically relevant to your question. If they are not, adjust chunkSize, chunkOverlap, or similarityTopK before touching prompts.
For a stronger test, ask a question whose answer spans multiple documents. A good RAG pipeline should pull multiple relevant chunks and produce an answer grounded in those sources instead of hallucinating a generic summary.
Next Steps
- Add a persistent vector store like Pinecone, Qdrant, or Postgres so indexing survives process restarts (see the sketch after this list).
- Add metadata filters for source system, document type, or policy version.
- Implement reranking so the top-k chunks are cleaner before they reach synthesis.
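For the first of those steps, you can get persistence without a hosted database by writing LlamaIndex's default stores to disk, so restarts stop forcing a full re-embed. A rough sketch, assuming the storageContextFromDefaults helper and VectorStoreIndex.init exported by the llamaindex meta-package (npm install llamaindex; import paths and option names may differ across versions):

// persist.ts: hypothetical sketch of on-disk persistence with the default stores.
import "dotenv/config";
import { VectorStoreIndex, storageContextFromDefaults } from "llamaindex";
import { loadDocuments } from "./loadDocs";

async function buildOnce() {
  // First run: embed, index, and persist everything under ./storage.
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  const docs = await loadDocuments("./data");
  await VectorStoreIndex.fromDocuments(docs, { storageContext });
}

async function reload() {
  // Later runs: rehydrate the index from disk instead of re-embedding.
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  return VectorStoreIndex.init({ storageContext });
}

buildOnce().then(reload).catch(console.error);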
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.