Haystack Tutorial (Python): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a working Retrieval-Augmented Generation (RAG) pipeline in Python with Haystack. You need this when you want an LLM to answer from your own documents instead of guessing from its pretraining.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • openai or another chat model provider supported by Haystack
  • An embedding model provider supported by Haystack
  • A document source, such as:
    • local .txt files
    • PDFs converted to text
    • internal policy docs
  • API keys set in environment variables:
    • OPENAI_API_KEY
  • Basic familiarity with:
    • Python functions and classes
    • vector search concepts
    • Haystack components like Document, Pipeline, and retrievers

Install the packages:

pip install haystack-ai openai
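
The OpenAI components read OPENAI_API_KEY from the environment. If you want to fail fast when the key is missing, a quick optional check like this can save a confusing error later:

import os

# Fail fast if the key was not exported in this shell
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY before running the pipeline")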

Step-by-Step

  1. Start by loading a small document set and converting it into Haystack Document objects. For a real system, this would come from files, S3, SharePoint, or a document store.
from haystack import Document

documents = [
    Document(content="Haystack is an open-source framework for building search and RAG pipelines."),
    Document(content="RAG combines retrieval with generation so answers are grounded in source documents."),
    Document(content="A good pipeline chunks documents, embeds them, retrieves relevant passages, then generates an answer.")
]

print(f"Loaded {len(documents)} documents")
  2. Next, build an in-memory index: embed the documents and write them to a document store. This is the core of the retrieval side: convert text into vectors and store them so the most relevant chunks can be fetched for each query later.
from haystack import Pipeline
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("doc_embedder", OpenAIDocumentEmbedder(model="text-embedding-3-small"))
indexing_pipeline.add_component("retriever_writer", document_store.write_documents)

indexing_pipeline.connect("doc_embedder.documents", "retriever_writer.documents")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})
print("Documents indexed")
  3. Now wire up the generator side with a chat model and a chat prompt builder. The prompt should force the model to answer only from retrieved context so you don’t get vague or invented responses.
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

template = [ChatMessage.from_user("""
Answer the question using only the provided documents.

Documents:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
""")]

prompt_builder = ChatPromptBuilder(template=template)
generator = OpenAIChatGenerator(model="gpt-4o-mini")
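
Before wiring the builder into a pipeline, it can help to render the prompt on its own and read what the model will actually see; a minimal sketch (the sample question is arbitrary):

# Render the template against the sample documents without calling the LLM
preview = prompt_builder.run(documents=documents, question="What is Haystack?")
print(preview["prompt"][0].text)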
  4. Assemble the full RAG pipeline by connecting query embedding, retrieval, prompt building, and generation. This is the part you will actually call at runtime.
rag_pipeline = Pipeline()
rag_pipeline.add_component("query_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("generator", generator)

rag_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.messages")
  5. Run a query and inspect the answer. In production, you would also log retrieved documents so you can debug why a response was produced.
question = "What does RAG combine?"

result = rag_pipeline.run({
    "query_embedder": {"text": question},
    "prompt_builder": {"question": question}
})

answer = result["generator"]["replies"][0].text
print(answer)
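
To log which passages the retriever pulled for a given answer (the production debugging mentioned above), you can ask the pipeline to include the retriever's output in the result; a minimal sketch using include_outputs_from:

result = rag_pipeline.run(
    {
        "query_embedder": {"text": question},
        "prompt_builder": {"question": question}
    },
    include_outputs_from={"retriever"}
)

# Inspect what the retriever returned alongside the final answer
for doc in result["retriever"]["documents"]:
    print(doc.score, doc.content[:80])
print(result["generator"]["replies"][0].text)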

Testing It

Run the script with your OPENAI_API_KEY exported in the shell before execution. If everything is wired correctly, the retriever should return relevant chunks and the generator should answer using those chunks only.

Test with questions that are directly covered by your sample documents first, then try out-of-scope questions to confirm it doesn’t hallucinate details. If you want more confidence, run the pipeline with include_outputs_from={"retriever"} (as in the sketch above) and verify the top hits in result["retriever"]["documents"] match the query intent; by default the retriever's output is not returned, because it is consumed downstream by the prompt builder.

Next Steps

  • Add a real document ingestion layer with file loaders and chunking (see the sketch after this list).
  • Replace InMemoryDocumentStore with PostgreSQL + pgvector or Elasticsearch for persistence.
  • Add evaluation: exact-match checks for known questions plus retrieval relevance metrics.
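
As a starting point for the first item, Haystack ships converter and preprocessor components. A minimal ingestion sketch using TextFileToDocument and DocumentSplitter, reusing the document_store from earlier (the file path and split sizes are illustrative):

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

ingestion = Pipeline()
ingestion.add_component("converter", TextFileToDocument())
ingestion.add_component("splitter", DocumentSplitter(split_by="word", split_length=200, split_overlap=20))
ingestion.add_component("embedder", OpenAIDocumentEmbedder(model="text-embedding-3-small"))
ingestion.add_component("writer", DocumentWriter(document_store=document_store))

ingestion.connect("converter.documents", "splitter.documents")
ingestion.connect("splitter.documents", "embedder.documents")
ingestion.connect("embedder.documents", "writer.documents")

# "docs/handbook.txt" is a placeholder; point this at your own files
ingestion.run({"converter": {"sources": ["docs/handbook.txt"]}})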

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

