Haystack Tutorial (Python): mocking LLM calls in tests for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to write tests for a Haystack pipeline without calling a real LLM. You need this when your unit tests must be fast, deterministic, and free from API costs or flaky network calls.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • pytest
  • A basic Haystack pipeline with an LLM component
  • No API key required for the mocked test path
  • Optional: an OpenAI or other model key if you want to compare against a real run later

Install the packages:

pip install haystack-ai pytest

Step-by-Step

  1. Start with a small pipeline that uses an LLM component. For testing, keep the pipeline minimal so you can isolate the mock at the component boundary.
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# A chat generator expects a list of ChatMessage, so use ChatPromptBuilder
# (plain PromptBuilder outputs a string and cannot connect to llm.messages).
template = [ChatMessage.from_user("Answer the question in one sentence: {{question}}")]

prompt_builder = ChatPromptBuilder(template=template)
llm = OpenAIChatGenerator(model="gpt-4o-mini")

pipeline = Pipeline()
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", llm)
pipeline.connect("prompt_builder.prompt", "llm.messages")
  2. Define a function that runs the pipeline. This gives your test a clean entry point and keeps your production code separate from test setup.
def answer_question(question: str) -> str:
    result = pipeline.run(
        {
            "prompt_builder": {"question": question}
        }
    )
    return result["llm"]["replies"][0].text  # ChatMessage exposes its text via .text
  3. In your test, mock the LLM component’s run() method. The key idea is that Haystack components are regular Python objects, so unittest.mock.patch.object() works well here.
from unittest.mock import patch
from haystack.dataclasses import ChatMessage

def test_answer_question_without_real_llm():
    fake_reply = ChatMessage.from_assistant("Paris is the capital of France.")

    with patch.object(
        OpenAIChatGenerator,
        "run",
        return_value={"replies": [fake_reply]},
    ) as mocked_run:
        output = answer_question("What is the capital of France?")

    assert output == "Paris is the capital of France."
    mocked_run.assert_called_once()
  4. If you want stronger isolation, patch the specific instance used by the pipeline instead of the class. This is useful when your code builds multiple generators and you only want one of them mocked. Note that assigning to llm.run replaces the method for the rest of the session, so restore it (or rebuild the pipeline) between tests.
from unittest.mock import MagicMock

def test_answer_question_with_instance_mock():
    fake_reply = ChatMessage.from_assistant("Mocked response.")

    llm.run = MagicMock(return_value={"replies": [fake_reply]})

    output = answer_question("Any question here?")

    assert output == "Mocked response."
    llm.run.assert_called_once()
  5. For larger projects, wrap pipeline construction in a factory function and inject dependencies during tests. That makes mocking cleaner and avoids global objects in your test suite.
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

chat_template = [ChatMessage.from_user("Answer the question in one sentence: {{question}}")]

def build_pipeline(llm_component):
    p = Pipeline()
    p.add_component("prompt_builder", ChatPromptBuilder(template=chat_template))
    p.add_component("llm", llm_component)
    p.connect("prompt_builder.prompt", "llm.messages")
    return p

def answer_question_with_pipeline(question: str, p: Pipeline) -> str:
    result = p.run({"prompt_builder": {"question": question}})
    return result["llm"]["replies"][0].text
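With the factory in place, a test can inject a stand-in directly, without patching anything. The sketch below goes one step further and fakes the whole pipeline object with MagicMock; FakeReply is a hypothetical stand-in for ChatMessage carrying only the .text attribute the code reads, and the entry-point function is repeated so the sketch is self-contained:

```python
from unittest.mock import MagicMock

class FakeReply:
    """Hypothetical stand-in for Haystack's ChatMessage; only .text is needed."""
    def __init__(self, text: str):
        self.text = text

def answer_question_with_pipeline(question: str, p) -> str:
    # Same entry point as above, repeated here so the sketch is self-contained.
    result = p.run({"prompt_builder": {"question": question}})
    return result["llm"]["replies"][0].text

def test_answer_question_with_injected_pipeline():
    fake_pipeline = MagicMock()
    fake_pipeline.run.return_value = {"llm": {"replies": [FakeReply("Mocked via injection.")]}}

    output = answer_question_with_pipeline("Any question?", fake_pipeline)

    assert output == "Mocked via injection."
    fake_pipeline.run.assert_called_once_with({"prompt_builder": {"question": "Any question?"}})
```

Because the fake pipeline never touches Haystack internals, this style of test stays valid even if you later swap the generator for a different provider.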

Testing It

Run your tests with pytest -q. If the mock is wired correctly, the test will pass without any network access and without requiring an API key.

To confirm you are not hitting the real model, temporarily disconnect from the internet or unset any provider credentials. The test should still pass because OpenAIChatGenerator.run() never executes its real request path.

If you see authentication errors or rate-limit errors, your mock is not applied at the right level. In that case, check whether you patched the class used by your code or just a different instance.
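You can also make the "no real calls" guarantee part of the test itself by clearing provider credentials for the duration of a test. A minimal standard-library sketch (the variable name OPENAI_API_KEY matches OpenAI's convention; adjust for other providers):

```python
import os
from contextlib import contextmanager

@contextmanager
def no_provider_credentials(var: str = "OPENAI_API_KEY"):
    """Temporarily unset an API-key variable so an accidental real call fails fast."""
    saved = os.environ.pop(var, None)
    try:
        yield
    finally:
        if saved is not None:  # restore the key after the test
            os.environ[var] = saved
```

Wrap the body of a mocked test in `with no_provider_credentials():` — if the mock is wired correctly the test still passes; if not, the generator raises an authentication error immediately instead of silently spending tokens. pytest users can get the same effect with the built-in monkeypatch fixture: monkeypatch.delenv("OPENAI_API_KEY", raising=False).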

Next Steps

  • Mock retrieval components too, not just generators, so full RAG pipelines stay deterministic in unit tests.
  • Move from unit tests to integration tests with a small number of real API calls behind a separate marker like @pytest.mark.integration.
  • Learn how to use dependency injection in Haystack pipelines so every external service can be swapped cleanly in tests.
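The marker split mentioned above can be sketched as follows; test_real_llm_smoke is a hypothetical placeholder for a test that would hit a real provider:

```python
import pytest

# Register the marker once, e.g. in pytest.ini:
#   [pytest]
#   markers =
#       integration: tests that call real provider APIs
#
# Then "pytest -m 'not integration'" runs only the fast, mocked tests.

@pytest.mark.integration
def test_real_llm_smoke():
    # A real OpenAIChatGenerator call would go here; deselecting the
    # marker keeps it out of the default CI run.
    ...
```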


By Cyprian Aarons, AI Consultant at Topiax.
