Haystack Tutorial (Python): testing agents locally for beginners

By Cyprian Aarons. Updated 2026-04-21


This tutorial shows you how to build a small Haystack agent in Python and test it locally without wiring it into a full app. You need this when you want to validate tool calls, prompts, and failure handling before the agent ever reaches staging or production.

What You'll Need

•Python 3.10+
•A virtual environment tool like venv
•pip
•An OpenAI API key set as OPENAI_API_KEY
•Haystack installed with OpenAI support:
- haystack-ai
- haystack-integrations
•Basic familiarity with Haystack components like Pipeline, Component, and chat generators

Step-by-Step

•Start by creating a clean environment and installing the packages. For local testing, keep dependencies minimal so failures are easy to trace.

python -m venv .venv
source .venv/bin/activate
pip install haystack-ai haystack-integrations openai
export OPENAI_API_KEY="your-api-key-here"

•Define a tiny tool the agent can call. Here I’m using a calculator component because it is deterministic, easy to assert against, and good for testing tool routing.

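from haystack import component

@component
class Calculator:
    """Evaluate a basic arithmetic expression and return the result as a string."""

    @component.output_types(result=str)
    def run(self, expression: str):
        # Allow only digits, arithmetic operators, parentheses, dots, and spaces.
        allowed = set("0123456789+-*/(). ")
        if not set(expression) <= allowed:
            return {"result": "Invalid expression"}
        try:
            # Empty builtins keep eval away from functions like open() or __import__().
            # Note that "**" passes the filter, so very large powers can be slow.
            return {"result": str(eval(expression, {"__builtins__": {}}))}
        except Exception as e:
            # Syntax errors and division by zero come back as text instead of raising.
            return {"result": f"Error: {e}"}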
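If relying on eval makes you uneasy even with the character filter, the same behavior can be approximated by walking the parsed expression tree and allowing only arithmetic nodes. The following is a minimal sketch, not part of the tutorial's code: the names `safe_calculate` and `_evaluate` are hypothetical, it uses only the standard library, and it rejects `**` outright instead of evaluating it.

```python
import ast
import operator

# Map the AST operator nodes we allow to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a node, raising ValueError for anything not whitelisted."""
    # Plain int/float literals only; bool and str constants are rejected.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculate(expression: str) -> str:
    """Hypothetical eval-free variant that returns messages similar to Calculator.run."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError):
        # Unparseable input, names such as "abc", and operators such as "**".
        return "Invalid expression"
    except ZeroDivisionError as e:
        return f"Error: {e}"


if __name__ == "__main__":
    print(safe_calculate("19 * 7"))   # 133
    print(safe_calculate("2 + abc"))  # Invalid expression
```

One difference from the eval version: a syntactically broken input such as `2 +` comes back as "Invalid expression" here rather than as an "Error: ..." message, so adjust your assertions if you swap it in.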

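•Build a simple pipeline around the LLM. In this minimal version only the prompt builder and the LLM are connected. The calculator is not yet registered with the model as a tool, so keep it outside the pipeline and exercise it directly in your tests: an unconnected component with a required input either never runs or is rejected when the pipeline runs. Letting the LLM decide when to call the calculator requires the tool-calling setup listed under Next Steps.

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# ChatPromptBuilder takes a list of ChatMessage objects; variables use Jinja syntax.
prompt = ChatPromptBuilder(template=[
    ChatMessage.from_system("You are a helpful assistant. Use tools when needed."),
    ChatMessage.from_user("{{question}}"),
])

llm = OpenAIChatGenerator(model="gpt-4o-mini")  # reads OPENAI_API_KEY from the environment
calculator = Calculator()  # kept outside the pipeline; call calculator.run(...) in tests

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt)
pipe.add_component("llm", llm)

# The builder's rendered messages feed the generator's "messages" input.
pipe.connect("prompt_builder.prompt", "llm.messages")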

•Run a local test case against the pipeline. Keep the input simple at first so you can tell whether the failure is in prompting, model choice, or tool logic.

result = pipe.run(
    data={
        "prompt_builder": {
            "question": "What is 19 * 7?"
        }
    }
)

print(result["llm"]["replies"][0].text)

•Add an explicit assertion-style check so you can use this in real development. For beginners, this is the difference between “it printed something” and “I know the behavior is stable.”

def test_calculator():
    output = pipe.run(
        data={"prompt_builder": {"question": "What is 8 + 12?"}}
    )
    text = output["llm"]["replies"][0].text.lower()
    assert "20" in text, f"Expected 20 in response, got: {text}"

if __name__ == "__main__":
    test_calculator()
    print("Local test passed")
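A plain substring check can pass for the wrong reason: "20" is also a substring of "120" or "2024". A slightly stricter check pulls the numbers out of the reply and compares them as values. This is a minimal sketch using only the standard library; `extract_numbers` and `reply_contains_number` are hypothetical helper names, and the regex deliberately ignores thousands separators and scientific notation.

```python
import re

# Matches integers and simple decimals, with an optional leading minus sign.
_NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Return every number found in the text, in order of appearance."""
    return [float(match) for match in _NUMBER_PATTERN.findall(text)]


def reply_contains_number(text: str, expected: float, tolerance: float = 1e-9) -> bool:
    """True if any number in the reply equals the expected value within tolerance."""
    return any(abs(value - expected) <= tolerance for value in extract_numbers(text))


if __name__ == "__main__":
    print(reply_contains_number("8 + 12 = 20.", 20))      # True
    print(reply_contains_number("The total is 120", 20))  # False
```

In a test like `test_calculator`, the assertion would then read `assert reply_contains_number(text, 20)` in place of the substring check.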

Testing It

Run the script from your terminal and watch for two things: a valid model response and a passing assertion. If the answer is wrong, inspect whether the issue is in your prompt template, your model selection, or your tool component.

For local agent testing, I usually start with three cases:

•A normal math question that should succeed
•A malformed expression like 2 + abc that should fail cleanly
•A question that does not need tools, to see whether the model avoids unnecessary calls

If you want more confidence, wrap these cases in pytest and run them on every change. That gives you fast feedback before you connect anything to real users or downstream systems.
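The first two cases can be pinned down without any model call. The sketch below assumes a plain function, `calculate`, that mirrors the logic of `Calculator.run`; it is a stand-in so the file runs without Haystack or an API key. pytest collects functions whose names start with `test_`, and plain `assert` statements are enough.

```python
def calculate(expression: str) -> str:
    """Stand-in mirroring the tutorial's Calculator.run logic (an assumption for this sketch)."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "Invalid expression"
    try:
        return str(eval(expression, {"__builtins__": {}}))
    except Exception as e:
        return f"Error: {e}"


# (expression, expected result) pairs: normal math plus one malformed input.
CASES = [
    ("19 * 7", "133"),
    ("8 + 12", "20"),
    ("(8 + 12) / 4", "5.0"),
    ("2 + abc", "Invalid expression"),
]


def test_known_cases():
    for expression, expected in CASES:
        actual = calculate(expression)
        assert actual == expected, f"{expression!r}: expected {expected!r}, got {actual!r}"


def test_runtime_errors_are_reported_not_raised():
    # Division by zero and incomplete input should come back as text, not exceptions.
    assert calculate("1 / 0").startswith("Error:")
    assert calculate("2 +").startswith("Error:")


if __name__ == "__main__":
    test_known_cases()
    test_runtime_errors_are_reported_not_raised()
    print("Calculator cases passed")
```

The third case, a question that needs no tool, depends on the model's reply, so it belongs in a separate test that you can skip when `OPENAI_API_KEY` is not set.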

Next Steps

•Add more tools, such as lookup functions for policy data or customer metadata.
•Replace the simple calculator with a real Haystack tool-calling setup using structured outputs.
•Move your checks into pytest fixtures so you can run regression tests locally and in CI.

By Cyprian Aarons, AI Consultant at Topiax.
