Haystack Tutorial (Python): building a RAG pipeline for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a minimal Retrieval-Augmented Generation (RAG) pipeline in Haystack using Python. You’ll index a small document set, retrieve relevant passages for a question, and generate an answer from those passages.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • An OpenAI API key
  • A working internet connection for the embedding and generation API calls
  • A small text corpus to index, stored locally or embedded directly in code

Install the packages first:

pip install haystack-ai openai

Set your API key in the environment:

export OPENAI_API_KEY="your-key-here"

If you’re on Windows PowerShell:

$env:OPENAI_API_KEY="your-key-here"
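
To confirm the key is actually visible before running anything, a quick check from Python:

import os

# Fails fast if the key isn't set in this shell session.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"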

Step-by-Step

  1. Start by creating a few documents. For a beginner tutorial, keep the corpus tiny so you can see the whole pipeline end to end.
from haystack import Document

documents = [
    Document(content="Haystack is an open-source framework for building LLM applications."),
    Document(content="RAG combines retrieval and generation to answer questions from external knowledge."),
    Document(content="A retriever finds relevant documents, and a generator writes the final answer."),
    Document(content="Embedding models convert text into vectors for semantic search."),
]
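The requirements list mentions you can also keep the corpus in a local file instead of hard-coding it. A minimal sketch of that variant follows; the filename corpus.txt and the blank-line delimiter are assumptions for illustration, not anything Haystack requires:

from pathlib import Path

from haystack import Document

# Hypothetical loader: one Document per blank-line-separated passage in corpus.txt.
raw_text = Path("corpus.txt").read_text(encoding="utf-8")
documents = [
    Document(content=chunk.strip())
    for chunk in raw_text.split("\n\n")
    if chunk.strip()
]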
  2. Next, build an in-memory document store. This keeps the example simple and avoids setting up Elasticsearch or PostgreSQL. Don’t write the raw documents into it yet: they’re written once in the next step, after embedding, since writing the same documents twice can raise a duplicate-document error.
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
  3. Now create an embedder and index the documents. The same embedding model must be used later for retrieval; otherwise, similarity search will be inconsistent.
from haystack.components.embedders import OpenAIDocumentEmbedder

document_embedder = OpenAIDocumentEmbedder(model="text-embedding-3-small")
embedded_docs = document_embedder.run(documents=documents)["documents"]

document_store.write_documents(embedded_docs)
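
As a quick sanity check on indexing, count the stored documents and inspect one embedding’s length (text-embedding-3-small produces 1536-dimensional vectors):

# Should print 4 and 1536 for this corpus and embedding model.
print(document_store.count_documents())
print(len(embedded_docs[0].embedding))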
  4. Add a retriever that uses vector similarity over the indexed embeddings. This is the part of RAG that pulls back only the most relevant context for a query. Note that InMemoryEmbeddingRetriever expects a query embedding rather than raw text, so the question is first embedded with a text embedder that uses the same model as the document embedder.
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

text_embedder = OpenAITextEmbedder(model="text-embedding-3-small")
retriever = InMemoryEmbeddingRetriever(document_store=document_store)

question = "What does RAG combine?"
query_embedding = text_embedder.run(text=question)["embedding"]
retrieval_result = retriever.run(query_embedding=query_embedding, top_k=2)
relevant_docs = retrieval_result["documents"]

for doc in relevant_docs:
    print(doc.content)
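
Each retrieved Document also carries a similarity score, which is useful when deciding on top_k or a score threshold:

# Print score alongside content to judge retrieval quality.
for doc in relevant_docs:
    print(f"{doc.score:.3f}  {doc.content}")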
  5. Add a prompt builder and generator to turn retrieved context into an answer. Keep the prompt explicit so the model stays grounded in retrieved text instead of guessing.
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """
Answer the question using only the provided documents.

Question: {{ question }}

Documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""

prompt_builder = PromptBuilder(template=template)
generator = OpenAIGenerator(model="gpt-4o-mini")

prompt_result = prompt_builder.run(question=question, documents=relevant_docs)
answer_result = generator.run(prompt=prompt_result["prompt"])

print(answer_result["replies"][0])
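
Alongside replies, the generator also returns per-reply metadata such as the model name and token usage, which is worth logging during development:

# Model, finish reason, and token usage for the generated reply.
print(answer_result["meta"][0])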
  6. Wire everything together into one query pipeline. This is the version you want once you’ve confirmed each component works independently. The text embedder turns the incoming question into a query embedding for the retriever, and the retrieved documents flow into the prompt builder; indexing (steps 2 and 3) stays a separate, one-time job.
from haystack import Pipeline

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", text_embedder)
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("generator", generator)

rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.prompt")

result = rag_pipeline.run({
    "text_embedder": {"text": question},
    "prompt_builder": {"question": question},
})

print(result["generator"]["replies"][0])
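
Once built, the pipeline can be reused for any question without re-indexing:

follow_up = "What does a retriever do?"
result = rag_pipeline.run({
    "text_embedder": {"text": follow_up},
    "prompt_builder": {"question": follow_up},
})
print(result["generator"]["replies"][0])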

Testing It

Run the script and ask a question that should clearly map to one of your sample documents, like “What does RAG combine?” The retriever should return the document about retrieval and generation, and the final answer should reflect that wording.

If you get empty results, check that embeddings were written into the document store before retrieval. If the generator gives vague answers, tighten the prompt so it says to use only retrieved documents.

A good sanity check is to print relevant_docs before generation. If those are wrong, your issue is retrieval; if those are right but the answer is off, your issue is prompting or model choice.
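
If retrieval looks right but the answer is off, you can also render the prompt standalone and read exactly what the model sees:

# Inspect the fully rendered prompt before it reaches the generator.
rendered = prompt_builder.run(question=question, documents=relevant_docs)["prompt"]
print(rendered)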

Next Steps

  • Replace InMemoryDocumentStore with PostgreSQL or Elasticsearch for persistence.
  • Add chunking so you can index real PDFs and long articles instead of short strings (see the splitter sketch after this list).
  • Evaluate retrieval quality with test questions before shipping this into production.
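
For the chunking item above, Haystack ships a DocumentSplitter that can run before embedding; the split sizes below are illustrative starting points, not recommendations:

from haystack.components.preprocessors import DocumentSplitter

# Split long documents into overlapping ~200-word chunks before embedding.
splitter = DocumentSplitter(split_by="word", split_length=200, split_overlap=20)
chunks = splitter.run(documents=documents)["documents"]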

By Cyprian Aarons, AI Consultant at Topiax.