Haystack Tutorial (TypeScript): building a RAG pipeline for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a production-style Retrieval-Augmented Generation (RAG) pipeline in TypeScript using Haystack. You’ll wire up document ingestion, embedding-based retrieval, prompt construction, and answer generation so you can ship a chat or knowledge assistant that actually cites your internal content.

What You'll Need

  • Node.js 18+ and npm
  • A TypeScript project with tsconfig.json
  • Haystack TypeScript packages:
    • @haystack-ai/core
    • @haystack-ai/openai
    • @haystack-ai/document-store-memory
  • An OpenAI API key exported as OPENAI_API_KEY
  • A corpus of text documents to index
  • Basic familiarity with async/await and ES modules

Step-by-Step

  1. Set up the project and install dependencies. Keep this clean: one package for orchestration, one for the LLM/embeddings provider, and one in-memory store for local development.
npm init -y
npm install @haystack-ai/core @haystack-ai/openai @haystack-ai/document-store-memory
npm install -D typescript tsx @types/node
  2. Create a minimal TypeScript entrypoint and define your sample documents. In real systems, this is where you would load PDFs, HTML, or database rows before chunking them into retrievable units.
import { OpenAIChatGenerator, OpenAITextEmbedder } from "@haystack-ai/openai";
import { InMemoryDocumentStore } from "@haystack-ai/document-store-memory";

const documents = [
  {
    content: "Haystack pipelines let you compose components for retrieval and generation.",
    meta: { source: "docs-1" },
  },
  {
    content: "A good RAG system chunks documents, embeds them, retrieves top-k matches, then answers with context.",
    meta: { source: "docs-2" },
  },
];
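
Chunking matters as soon as your documents run longer than a paragraph or two. As a minimal sketch (the helper below is illustrative, not part of the packages above), a fixed-size splitter with overlap is enough to get started; the 500/50 character sizes are placeholder defaults you should tune against your own corpus.

// Hypothetical helper: split long text into overlapping chunks before embedding.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    // Step forward by less than the chunk size so neighboring chunks share context.
    start += chunkSize - overlap;
  }
  return chunks;
}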
  3. Index the documents with embeddings. For advanced use cases, keep indexing separate from query-time execution so you can swap stores later without touching the pipeline logic.
const documentStore = new InMemoryDocumentStore();
const embedder = new OpenAITextEmbedder({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "text-embedding-3-small",
});

async function indexDocuments() {
  // Embed each document with the same model you'll use for queries,
  // then persist content, vector, and metadata together.
  const embeddedDocs = [];
  for (const doc of documents) {
    const result = await embedder.run({ text: doc.content });
    embeddedDocs.push({
      content: doc.content,
      embedding: result.embedding,
      meta: doc.meta,
    });
  }

  await documentStore.writeDocuments(embeddedDocs);
}
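
To make that "swap stores later" promise concrete, you can code your indexing logic against a small interface rather than the concrete class. This is an assumption-laden sketch: the method shape mirrors the calls used in this tutorial, not an official Haystack contract.

// Hypothetical abstraction so indexing code never names a concrete store.
interface EmbeddedDocument {
  content: string;
  embedding: number[];
  meta?: Record<string, string>;
}

interface VectorStore {
  writeDocuments(docs: EmbeddedDocument[]): Promise<void>;
}

// Indexing depends only on the interface; a persistent store drops in later.
async function indexInto(store: VectorStore, docs: EmbeddedDocument[]) {
  await store.writeDocuments(docs);
}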
  4. Build the retrieval-and-generation pipeline. The pattern here is simple: embed the user question, retrieve relevant documents from the store, then pass those snippets into the generator as grounded context.
async function answerQuestion(question: string) {
  // Embed the question with the same model used at indexing time.
  const queryEmbedding = await embedder.run({ text: question });

  // Pull the top-k most similar documents back from the store.
  const retrieved = await documentStore.embeddingRetriever.run({
    queryEmbedding: queryEmbedding.embedding,
    topK: 3,
  });

  const context = retrieved.documents
    .map((doc) => `Source: ${doc.meta?.source}\n${doc.content}`)
    .join("\n\n");

  const generator = new OpenAIChatGenerator({
    apiKey: process.env.OPENAI_API_KEY!,
    model: "gpt-4o-mini",
  });

  const prompt = [
    {
      role: "system",
      content:
        "Answer only using the provided context. If the answer is missing, say you don't know.",
    },
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion: ${question}`,
    },
  ];

  const response = await generator.run({ messages: prompt });
  return response.reply;
}
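
One practical detail that snippet glosses over: retrieved context can overflow your prompt budget once chunks get large or top-k grows. A crude character-based cap, sketched below as a stand-in for real token counting with the model's tokenizer, keeps the prompt bounded.

// Hypothetical guard: keep the stitched context under a rough character budget.
// Production systems should count tokens with the model's tokenizer instead.
function capContext(snippets: string[], maxChars = 6000): string {
  const kept: string[] = [];
  let total = 0;
  for (const snippet of snippets) {
    if (total + snippet.length > maxChars) break;
    kept.push(snippet);
    total += snippet.length;
  }
  return kept.join("\n\n");
}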
  5. Wire it together and run a real query. This gives you an end-to-end path you can later wrap behind an HTTP endpoint or agent tool.
async function main() {
  await indexDocuments();

  const question = "What are the main steps in a RAG system?";
  const answer = await answerQuestion(question);

  console.log("Question:", question);
  console.log("Answer:", answer);
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
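
When you're ready to serve this instead of running main() once, Node's built-in node:http module is enough for a sketch; in production you'd more likely reach for Express or Fastify and add auth, rate limiting, and tracing. The /ask route and request shape below are assumptions, not a fixed API.

import { createServer } from "node:http";

// Minimal sketch: POST a JSON body like {"question": "..."} to /ask.
const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/ask") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", async () => {
    try {
      const { question } = JSON.parse(body);
      const answer = await answerQuestion(question);
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ answer }));
    } catch (error) {
      res.writeHead(500).end(JSON.stringify({ error: "generation failed" }));
    }
  });
});

server.listen(3000);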

Testing It

Run the file with npx tsx index.ts after exporting OPENAI_API_KEY. A correct run should print an answer that references the indexed content instead of hallucinating unrelated facts.

Try changing the question to something outside your sample corpus. The model should respond with a grounded fallback like “I don’t know” if your system prompt is doing its job.
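
You can turn that check into a tiny smoke test. The sketch below assumes the fallback phrasing from the system prompt and an out-of-corpus question; adjust both to match your setup.

// Quick grounding check: an off-corpus question should trigger the fallback.
async function testFallback() {
  const answer = String(await answerQuestion("What is the capital of Mongolia?"));
  if (answer.toLowerCase().includes("don't know")) {
    console.log("Grounding check passed.");
  } else {
    console.warn("Grounding check failed; model answered outside the corpus:", answer);
  }
}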

If retrieval looks weak, inspect the returned documents before generation and verify your embeddings are being written to the store correctly. In production, also test chunk size, overlap, and top-k values because those three settings usually decide whether your RAG system feels smart or brittle.
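
A quick way to do that inspection is to log what the retriever returns before any generation happens. The score field below is an assumption; not every store attaches one, so it is read defensively.

// Debug helper: print what retrieval found for a question, before generation.
async function debugRetrieval(question: string) {
  const queryEmbedding = await embedder.run({ text: question });
  const retrieved = await documentStore.embeddingRetriever.run({
    queryEmbedding: queryEmbedding.embedding,
    topK: 3,
  });
  for (const doc of retrieved.documents) {
    console.log({
      source: doc.meta?.source,
      score: (doc as { score?: number }).score ?? "n/a",
      preview: doc.content.slice(0, 80),
    });
  }
}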

Next Steps

  • Replace InMemoryDocumentStore with a persistent vector store for production data.
  • Add chunking and metadata filtering before embedding (see the filtering sketch after this list).
  • Wrap this pipeline in an API route and add tracing so you can inspect retrieval quality per request.
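
For the metadata filtering item, a store-agnostic starting point is to over-fetch and filter on doc.meta client-side, as sketched below. Native filter pushdown inside the store is better at scale, but its API varies by store, so this sketch stays on the TypeScript side.

// Hypothetical post-retrieval filter: over-fetch, then keep only matching sources.
async function retrieveFromSource(question: string, source: string, topK = 3) {
  const queryEmbedding = await embedder.run({ text: question });
  const retrieved = await documentStore.embeddingRetriever.run({
    queryEmbedding: queryEmbedding.embedding,
    topK: topK * 3, // over-fetch so filtering still leaves enough candidates
  });
  return retrieved.documents
    .filter((doc) => doc.meta?.source === source)
    .slice(0, topK);
}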


By Cyprian Aarons, AI Consultant at Topiax.
