Haystack Tutorial (TypeScript): building a RAG pipeline for intermediate developers

By Cyprian Aarons
Updated 2026-04-21

This tutorial builds a working retrieval-augmented generation (RAG) pipeline in TypeScript using Haystack. You’ll wire together document loading, embedding, retrieval, and answer generation so you can answer questions from your own content instead of relying on model memory.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • @haystack/core
  • @haystack/openai
  • An OpenAI API key in OPENAI_API_KEY
  • A small local corpus to index, such as markdown, text, or policy docs
  • Basic familiarity with async/await and TypeScript classes

Step-by-Step

  1. Start by installing the packages and setting up environment variables. I’m using OpenAI for both embeddings and generation because it keeps the example compact and production-friendly.
npm install @haystack/core @haystack/openai dotenv
npm install -D typescript ts-node @types/node
import "dotenv/config";

if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY is required");
}
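
The later snippets use top-level await, which requires ES modules. If your project is not already configured for that, a tsconfig.json along these lines is a reasonable starting point (an assumption about your setup, not a requirement of Haystack):

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true
  }
}

You will also need "type": "module" in package.json, and depending on your ts-node version you may have to run with npx ts-node --esm.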
  2. Create a small document set and split it into chunks. RAG works better when you retrieve smaller, semantically focused passages instead of whole files.
import { Document } from "@haystack/core";

const docs = [
  new Document({
    content: "Haystack pipelines let you connect components for retrieval and generation.",
    meta: { source: "intro.md" },
  }),
  new Document({
    content: "For RAG, chunking improves recall because embeddings work better on focused text.",
    meta: { source: "rag.md" },
  }),
  new Document({
    content: "TypeScript users can compose Haystack components with strongly typed inputs and outputs.",
    meta: { source: "ts.md" },
  }),
];

const chunks = docs.flatMap((doc) => {
  const text = doc.content as string;
  // Greedy split into spans of at most 80 characters, ending on whitespace.
  const parts = text.match(/.{1,80}(?:\s|$)/g) ?? [];
  return parts.map(
    (part) => new Document({ content: part.trim(), meta: doc.meta }),
  );
});
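
The regex splitter above is deliberately crude: it cuts on a character budget and nothing else. Overlapping chunks usually retrieve better on longer documents, because a sentence that straddles a boundary still appears whole in at least one chunk. Here is a minimal sketch; chunkWithOverlap is a hypothetical helper and the size values are illustrative, not tuned.

// Hypothetical helper: fixed-size chunks with overlap, splitting on
// whitespace so words are never cut in half. Sizes are in characters.
function chunkWithOverlap(text: string, chunkSize = 200, overlap = 40): string[] {
  const words = text.split(/\s+/);
  const out: string[] = [];
  let current: string[] = [];
  let length = 0;

  for (const word of words) {
    if (length + word.length + 1 > chunkSize && current.length > 0) {
      out.push(current.join(" "));
      // Carry trailing words forward until the overlap budget is covered.
      const carried: string[] = [];
      let carriedLength = 0;
      for (let i = current.length - 1; i >= 0 && carriedLength < overlap; i--) {
        carried.unshift(current[i]);
        carriedLength += current[i].length + 1;
      }
      current = carried;
      length = carriedLength;
    }
    current.push(word);
    length += word.length + 1;
  }
  if (current.length > 0) out.push(current.join(" "));
  return out;
}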
  3. Build an in-memory document store and index embeddings into it. For an intermediate tutorial, this keeps the architecture clear without hiding the retrieval layer behind a vector database.
import {
  InMemoryDocumentStore,
  InMemoryEmbeddingRetriever,
} from "@haystack/core";
import { OpenAITextEmbedder } from "@haystack/openai";

const documentStore = new InMemoryDocumentStore();
const embedder = new OpenAITextEmbedder({
  model: "text-embedding-3-small",
});

// Embed each chunk, then store it together with its vector so the
// retriever can score it against query embeddings later.
for (const chunk of chunks) {
  const result = await embedder.run({ text: chunk.content as string });
  await documentStore.writeDocuments([
    new Document({
      content: chunk.content,
      embedding: result.embedding,
      meta: chunk.meta,
    }),
  ]);
}

const retriever = new InMemoryEmbeddingRetriever(documentStore);
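
Embedding one chunk per request works for a toy corpus, but every call is a network round trip. The same two calls you just used, embedder.run and documentStore.writeDocuments, can be batched: this sketch embeds concurrently and writes once. For a large corpus you would cap the concurrency to stay under rate limits.

// Embed all chunks concurrently, then index them in a single write.
const embedded = await Promise.all(
  chunks.map(async (chunk) => {
    const result = await embedder.run({ text: chunk.content as string });
    return new Document({
      content: chunk.content,
      embedding: result.embedding,
      meta: chunk.meta,
    });
  }),
);
await documentStore.writeDocuments(embedded);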
  4. Add the generator that turns retrieved context into an answer. The important bit is that the prompt explicitly tells the model to use only the provided context; that reduces hallucinations when your corpus is narrow.
import { OpenAIChatGenerator } from "@haystack/openai";

const generator = new OpenAIChatGenerator({
  model: "gpt-4o-mini",
});

async function answerQuestion(question: string) {
  const queryEmbedding = await embedder.run({ text: question });
  const retrieved = await retriever.run({
    queryEmbedding: queryEmbedding.embedding,
    topK: 3,
  });

  const context = retrieved.documents
    .map((doc, i) => `[${i + 1}] ${doc.content}`)
    .join("\n");

  const prompt = `
Answer the question using only the context below.

Context:
${context}

Question:
${question}
`;

  return generator.run({
    messages: [{ role: "user", content: prompt }],
  });
}
  5. Run the pipeline end to end with a real query. This is where you confirm your retrieval quality before you ever think about adding persistence or a web API.
const result = await answerQuestion("Why does chunking matter in RAG?");
console.log(JSON.stringify(result, null, 2));

Testing It

Run the script with npx ts-node your-file.ts (adding --esm if your setup needs it) and make sure it returns an answer grounded in your sample documents. If you get something generic, like a canned summary of RAG that ignores your context, inspect the retrieved chunks first; that usually means bad chunking or weak embeddings.

Try at least three queries:

  • one directly answered by the corpus
  • one partially answered by the corpus
  • one outside the corpus

You want to see short, specific answers for the first two and an honest “not enough context” style response for the third. If you get hallucinations on out-of-scope questions, tighten your system instructions and reduce temperature on the generator.
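
To make those checks repeatable, a small harness helps. This sketch reuses the answerQuestion function from above; the three queries are illustrative stand-ins for your own corpus.

// Hypothetical smoke test covering the three query categories above.
const queries = [
  "Why does chunking matter in RAG?",    // directly answered by the corpus
  "Can I use Haystack from TypeScript?", // partially answered
  "What is the capital of France?",      // outside the corpus
];

for (const q of queries) {
  console.log(`\nQ: ${q}`);
  console.log(JSON.stringify(await answerQuestion(q), null, 2));
}

If the out-of-scope query still gets a confident answer, prepend a system message such as { role: "system", content: "If the context does not contain the answer, say you don't know." } to the messages array in answerQuestion, and lower the temperature if your generator version exposes that option.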

Next Steps

  • Replace InMemoryDocumentStore with a persistent vector store for real workloads.
  • Add metadata filters so you can scope retrieval by tenant, policy type, or document version (a sketch follows this list).
  • Introduce reranking before generation if your corpus gets large or noisy.
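
Metadata filtering usually looks something like the sketch below. The filters parameter and its syntax are assumptions here; check your version of InMemoryEmbeddingRetriever for the actual signature.

// Hypothetical: scope retrieval to a single source document.
// The `filters` option is an assumption; verify it before relying on it.
const q = await embedder.run({ text: "How should I chunk policy docs?" });
const scoped = await retriever.run({
  queryEmbedding: q.embedding,
  topK: 3,
  filters: { source: "rag.md" },
});
console.log(scoped.documents.map((d) => d.meta));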
