Haystack Tutorial (TypeScript): building a RAG pipeline for beginners
This tutorial builds a basic Retrieval-Augmented Generation (RAG) pipeline in TypeScript with Haystack. By the end, you’ll have a working flow that takes a question, retrieves relevant documents from a vector store, and sends that context to an LLM for an answer.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with `ts-node` or a build step
- Haystack TypeScript packages:
  - `@haystack/core`
  - `@haystack/experimental`
- An OpenAI API key for embeddings and generation
- A Qdrant instance running locally or remotely
- Sample text documents to index
Install the packages:
```bash
npm install @haystack/core @haystack/experimental openai @qdrant/js-client-rest
npm install -D typescript ts-node @types/node
```
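If your project doesn't have a TypeScript config yet, a minimal `tsconfig.json` along these lines works with `ts-node` (a sketch — adjust the target and strictness to your setup):

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  }
}
```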
Set your environment variables:
```bash
export OPENAI_API_KEY="your-openai-key"
export QDRANT_URL="http://localhost:6333"
```
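It helps to fail fast when configuration is missing, rather than getting an opaque API error halfway through the pipeline. A small helper like the one below works; `requireEnv` is a name I'm introducing for this tutorial, not a Haystack API:

```typescript
// Fail fast on missing configuration instead of hitting confusing errors later.
// `requireEnv` is a hypothetical helper for this tutorial, not part of Haystack.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (once the variables above are exported):
//   const openaiKey = requireEnv("OPENAI_API_KEY");
//   const qdrantUrl = requireEnv("QDRANT_URL");
```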
Step-by-Step
- Start by creating a small document set and converting it into Haystack `Document` objects. Keep the content simple and domain-specific so retrieval is easy to validate.
```typescript
import { Document } from "@haystack/core";

const docs = [
  new Document({
    content: "Haystack is a framework for building LLM pipelines with retrieval and generation.",
    meta: { source: "intro" },
  }),
  new Document({
    content: "RAG combines retrieval from a knowledge base with answer generation from an LLM.",
    meta: { source: "rag" },
  }),
  new Document({
    content: "Qdrant is a vector database used to store embeddings and search by similarity.",
    meta: { source: "qdrant" },
  }),
];
```
- Next, create an embedding model and write the documents into Qdrant. This gives you persistent semantic search instead of keyword matching.
```typescript
import { OpenAITextEmbedder, QdrantDocumentStore } from "@haystack/experimental";

const documentStore = new QdrantDocumentStore({
  url: process.env.QDRANT_URL!,
  collectionName: "haystack_rag_demo",
});

const embedder = new OpenAITextEmbedder({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "text-embedding-3-small",
});

await documentStore.writeDocuments(docs, { policy: "upsert" });
await documentStore.embedDocuments(embedder);
```
- Build the retrieval step. The retriever takes a query, embeds it, and returns the most relevant documents from Qdrant.
```typescript
import { QdrantEmbeddingRetriever } from "@haystack/experimental";

const retriever = new QdrantEmbeddingRetriever({
  documentStore,
});

const query = "What is RAG?";
const retrievedDocs = await retriever.run({
  query,
  topK: 2,
});

console.log(
  retrievedDocs.documents.map((doc) => ({
    content: doc.content,
    meta: doc.meta,
  }))
);
```
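Retrievers usually return a relevance score alongside each document. If yours does (the `score` field below is an assumption — check the actual return shape), you can drop weak matches before they pollute the prompt:

```typescript
// Minimal sketch: keep only documents above a similarity threshold.
// The `score` field is assumed -- verify what your retriever actually returns.
interface ScoredDoc {
  content: string;
  score: number;
}

function filterByScore(docs: ScoredDoc[], minScore = 0.5): ScoredDoc[] {
  return docs.filter((doc) => doc.score >= minScore);
}
```

A threshold like this is worth tuning per embedding model, since score distributions differ between models.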
- Add the generator step that turns retrieved context into an answer. The key pattern here is to pass only the top documents, not your whole corpus.
```typescript
import { OpenAIChatGenerator } from "@haystack/experimental";

const generator = new OpenAIChatGenerator({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o-mini",
});

const context = retrievedDocs.documents
  .map((doc) => doc.content)
  .join("\n\n");

const response = await generator.run({
  messages: [
    {
      role: "system",
      content:
        "Answer only using the provided context. If the context does not contain the answer, say you don't know.",
    },
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion:\n${query}`,
    },
  ],
});

console.log(response.reply);
```
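Joining every retrieved document works for three sample docs, but a real corpus can blow past the model's context window. A simple character budget keeps the prompt bounded; this is a crude stand-in for real token counting, and `buildContext` is a helper I'm sketching, not a Haystack API:

```typescript
// Assemble context under a rough character budget.
// A crude stand-in for token counting; swap in a real tokenizer for production.
function buildContext(contents: string[], maxChars = 8000): string {
  const parts: string[] = [];
  let used = 0;
  for (const text of contents) {
    if (used + text.length > maxChars) break; // stop before exceeding the budget
    parts.push(text);
    used += text.length + 2; // account for the "\n\n" separator
  }
  return parts.join("\n\n");
}
```

Because retrievers return documents in relevance order, truncating from the tail drops the weakest matches first.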
- Put retrieval and generation together in one script so you can run the full RAG flow end to end. This is the version you’ll actually iterate on when debugging prompts, chunking, or retrieval quality.
```typescript
import "dotenv/config";
import { Document } from "@haystack/core";
import {
  OpenAITextEmbedder,
  OpenAIChatGenerator,
  QdrantDocumentStore,
  QdrantEmbeddingRetriever,
} from "@haystack/experimental";

async function main() {
  const docs = [
    new Document({ content: "Haystack is a framework for building LLM pipelines.", meta: { source: "intro" } }),
    new Document({ content: "RAG combines retrieval with generation.", meta: { source: "rag" } }),
    new Document({ content: "Qdrant stores embeddings for similarity search.", meta: { source: "qdrant" } }),
  ];

  const documentStore = new QdrantDocumentStore({
    url: process.env.QDRANT_URL!,
    collectionName: "haystack_rag_demo",
  });

  const embedder = new OpenAITextEmbedder({
    apiKey: process.env.OPENAI_API_KEY!,
    model: "text-embedding-3-small",
  });

  await documentStore.writeDocuments(docs, { policy: "upsert" });
  await documentStore.embedDocuments(embedder);

  const retriever = new QdrantEmbeddingRetriever({ documentStore });
  const query = "How does RAG work?";
  const retrievedDocs = await retriever.run({ query, topK: 2 });

  const generator = new OpenAIChatGenerator({
    apiKey: process.env.OPENAI_API_KEY!,
    model: "gpt-4o-mini",
  });

  const context = retrievedDocs.documents.map((doc) => doc.content).join("\n\n");
}

main();
```
- Finish by printing the final answer and keeping the script easy to extend. In production, this is where you’d add chunking, metadata filters, caching, and tracing.
```typescript
// add this inside main(), after context is created
const response = await generator.run({
  messages: [
    {
      role: "system",
      content:
        "Answer only using the provided context. If the context does not contain the answer, say you don't know.",
    },
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion:\n${query}`,
    },
  ],
});

console.log("Question:", query);
console.log("Answer:", response.reply);
console.log("Sources:", retrievedDocs.documents.map((doc) => doc.meta?.source));
```
Testing It
Run your script with `npx ts-node your-file.ts`. You should see at least one relevant source returned from Qdrant and an answer that reflects only the retrieved context.
If the model starts hallucinating, check two things first:
- Your prompt says to answer only from context
- Your retriever is returning enough relevant documents
If retrieval looks bad, improve your sample docs or reduce your chunk size before touching the LLM prompt. For beginner RAG systems, bad retrieval is usually the real problem.
Next Steps
- Add chunking before embedding so long documents retrieve more accurately
- Introduce metadata filters for tenant-specific or policy-specific search
- Replace the hardcoded question with an API endpoint and return citations in your response
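Chunking is the first of these worth tackling. A naive fixed-size chunker with overlap, sketched below, is enough to get started; production pipelines usually split on sentence or section boundaries instead:

```typescript
// Naive fixed-size chunker with overlap between consecutive chunks.
// A starting point only; real pipelines usually split on semantic boundaries.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= overlap) {
    throw new Error("chunkSize must be larger than overlap");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
    start += chunkSize - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```

Each chunk then becomes its own Document (carrying the parent document's meta) before embedding, so retrieval can land on the specific passage rather than the whole file.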
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.