Haystack Tutorial (Python): mocking LLM calls in tests for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to replace real LLM calls in Haystack pipelines with deterministic mocks, so your tests run fast, offline, and without burning API quota. You need this when you want stable unit tests for prompt logic, routing, parsing, and pipeline wiring without depending on network calls or model drift.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • pytest
  • Optional: openai or another LLM provider package if you also run integration tests
  • A working Haystack pipeline already built with a chat generator or prompt builder
  • Basic familiarity with Pipeline, ChatPromptBuilder, and OpenAIChatGenerator

Step-by-Step

  1. Start by isolating the LLM behind a component boundary. In tests, you want to swap the real generator for a fake component that returns a fixed response object with the same shape Haystack expects.
from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage


@component
class MockChatGenerator:
    @component.output_types(replies=List[ChatMessage])
    def run(self, messages: List[ChatMessage]):
        return {
            "replies": [
                ChatMessage.from_assistant("Mocked answer: approved")
            ]
        }
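Because the mock is a plain Python component, you can sanity-check it in isolation before wiring it into a pipeline. This is a quick illustrative check, not part of the test suite:

result = MockChatGenerator().run([ChatMessage.from_user("Anything")])
assert result["replies"][0].text == "Mocked answer: approved"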
  2. Build your production pipeline in a way that makes dependency injection trivial. The key is to keep the generator as a constructor argument instead of instantiating it deep inside test-unfriendly code.
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator


def build_pipeline(generator=None) -> Pipeline:
    pipe = Pipeline()
    pipe.add_component("prompt_builder", ChatPromptBuilder())
    pipe.add_component(
        "llm",
        generator or OpenAIChatGenerator(model="gpt-4o-mini"),
    )

    pipe.connect("prompt_builder.prompt", "llm.messages")
    return pipe
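With that factory, production and tests share the same wiring. Because the generator argument short-circuits the or expression, the real OpenAIChatGenerator (and its OPENAI_API_KEY lookup) is never constructed when a mock is injected:

# Production wiring: real generator, requires OPENAI_API_KEY.
prod_pipe = build_pipeline()

# Test wiring: deterministic mock, no credentials or network needed.
test_pipe = build_pipeline(generator=MockChatGenerator())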
  3. Write a test that injects the mock and asserts on the returned text. This is the cleanest pattern for unit tests because it validates your prompt flow without calling any external service.
from haystack.dataclasses import ChatMessage


def test_pipeline_uses_mocked_llm():
    pipe = build_pipeline(generator=MockChatGenerator())

    result = pipe.run(
        {
            "prompt_builder": {
                "template_variables": {"name": "Alice"},
                "template": [
                    ChatMessage.from_user("Review customer {{ name }} for approval.")
                ],
            }
        }
    )

    reply = result["llm"]["replies"][0].text
    assert reply == "Mocked answer: approved"
  4. If you need more advanced behavior, make the mock data-driven. That lets you simulate refusals, timeouts, malformed outputs, or different branches in your orchestration logic.
from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage


@component
class RuleBasedMockChatGenerator:
    @component.output_types(replies=List[ChatMessage])
    def run(self, messages: List[ChatMessage]):
        text = "\n".join(m.text or "" for m in messages)

        if "refund" in text.lower():
            content = "Escalate to claims."
        elif "vip" in text.lower():
            content = "Approve immediately."
        else:
            content = "Manual review required."

        return {"replies": [ChatMessage.from_assistant(content)]}
  5. For larger test suites, keep the mock in a fixture and reuse it across pipeline tests. That keeps your assertions focused on business logic instead of setup noise.
import pytest


@pytest.fixture
def mocked_pipeline():
    return build_pipeline(generator=RuleBasedMockChatGenerator())


def test_vip_path(mocked_pipeline):
    result = mocked_pipeline.run(
        {
            "prompt_builder": {
                "template_variables": {"name": "VIP customer"},
                "template": [ChatMessage.from_user("Handle VIP account.")],
            }
        }
    )
    assert result["llm"]["replies"][0].text == "Approve immediately."

Testing It

Run the tests with pytest -q. If everything is wired correctly, they should pass without any API key set and without making outbound requests.

A good sanity check is to temporarily remove your network connection and rerun the suite; unit tests using the mock should still pass. If one fails trying to reach a provider, you still have a real generator hidden somewhere in the code path.
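If pulling the plug is impractical, you can approximate it in-process with a fixture that patches socket connections to fail loudly. This is a minimal sketch; the fixture name is illustrative, and dedicated plugins such as pytest-socket provide a hardened version of the same guard:

import socket

import pytest

from haystack.dataclasses import ChatMessage


@pytest.fixture
def no_network(monkeypatch):
    def guard(*args, **kwargs):
        raise RuntimeError("Network access attempted during a unit test")

    # Patch the low-level connect so any HTTP client fails before leaving the box.
    monkeypatch.setattr(socket.socket, "connect", guard)


def test_vip_path_offline(no_network, mocked_pipeline):
    result = mocked_pipeline.run(
        {
            "prompt_builder": {
                "template": [ChatMessage.from_user("Handle VIP account.")],
            }
        }
    )
    assert result["llm"]["replies"][0].text == "Approve immediately."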

For production code, keep one separate integration test that uses the real LLM behind an environment flag like RUN_LLM_TESTS=1. That gives you coverage for provider auth and schema compatibility without making every CI run dependent on external services.
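A sketch of that gate using pytest's skipif marker; the test name and prompt are illustrative, not part of the tutorial's pipeline contract:

import os

import pytest

from haystack.dataclasses import ChatMessage


@pytest.mark.skipif(
    os.environ.get("RUN_LLM_TESTS") != "1",
    reason="set RUN_LLM_TESTS=1 to run against the real provider",
)
def test_pipeline_against_real_llm():
    # No generator injected, so build_pipeline falls back to
    # OpenAIChatGenerator; OPENAI_API_KEY must be set.
    pipe = build_pipeline()
    result = pipe.run(
        {
            "prompt_builder": {
                "template": [ChatMessage.from_user("Reply with one word: OK")],
            }
        }
    )
    assert result["llm"]["replies"][0].text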

Next Steps

  • Add snapshot-style assertions for full prompt payloads so prompt regressions are caught early; see the recording-mock sketch after this list.
  • Mock tool-calling components next, especially if your Haystack pipeline routes between retrieval and generation.
  • Separate unit tests from contract tests so only a small subset ever touches live model APIs.
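One way to implement the first bullet is a recording mock that captures every prompt the pipeline renders, so a test can diff it against an inline expectation or a stored snapshot. The class and attribute names below are illustrative:

from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage


@component
class RecordingMockChatGenerator:
    def __init__(self):
        # One entry per pipeline run: the full list of rendered messages.
        self.seen_prompts: List[List[ChatMessage]] = []

    @component.output_types(replies=List[ChatMessage])
    def run(self, messages: List[ChatMessage]):
        self.seen_prompts.append(messages)
        return {"replies": [ChatMessage.from_assistant("Mocked answer")]}


def test_prompt_payload_is_rendered_correctly():
    mock = RecordingMockChatGenerator()
    pipe = build_pipeline(generator=mock)
    pipe.run(
        {
            "prompt_builder": {
                "template_variables": {"name": "Alice"},
                "template": [
                    ChatMessage.from_user("Review customer {{ name }} for approval.")
                ],
            }
        }
    )
    # Compare against an inline expectation or a snapshot file.
    assert mock.seen_prompts[0][0].text == "Review customer Alice for approval."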
