Haystack Tutorial (Python): streaming agent responses for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a Haystack agent that streams partial responses back to your Python app instead of waiting for the full answer. You need this when you’re building chat UIs, tool-heavy assistants, or any workflow where latency matters and users should see progress as the model thinks.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • An OpenAI API key in OPENAI_API_KEY
  • Optional but useful:
    • python-dotenv for local env loading
    • A terminal or notebook to inspect streamed events
  • A basic understanding of:
    • Haystack pipelines
    • Chat messages
    • Tool calling / agent loops

Step-by-Step

  1. Install the dependencies and set your environment variable.
    Keep this minimal; the tutorial uses only Haystack and the OpenAI generator.
pip install haystack-ai python-dotenv
export OPENAI_API_KEY="your-key-here"
  2. Create a small tool the agent can call.
    Streaming is most useful when the agent has work to do, so we’ll give it a deterministic calculator tool.
from haystack.tools import Tool

def multiply(a: int, b: int) -> int:
    return a * b

multiply_tool = Tool(
    name="multiply",
    description="Multiply two integers.",
    parameters={
        "type": "object",
        "properties": {
            "a": {"type": "integer"},
            "b": {"type": "integer"},
        },
        "required": ["a", "b"],
    },
    function=multiply,
)
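Under the hood, the agent loop resolves a tool call by name and invokes the matching function with the model's JSON arguments. A stdlib-only sketch of that dispatch step (the `TOOLS` registry and `dispatch` helper are illustrative, not Haystack API):

```python
import json

def multiply(a: int, b: int) -> int:
    return a * b

# Illustrative registry: the Agent keeps a similar name -> callable mapping
# built from its `tools` list.
TOOLS = {"multiply": multiply}

def dispatch(tool_name: str, arguments_json: str):
    """Parse the model's JSON arguments and invoke the matching tool."""
    args = json.loads(arguments_json)
    return TOOLS[tool_name](**args)

print(dispatch("multiply", '{"a": 17, "b": 23}'))  # 391
```

This is why the `parameters` schema matters: the model emits arguments as JSON, and they must deserialize into keyword arguments your function accepts.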
  3. Build an agent pipeline around a chat generator.
    The key detail is wiring the generator into an agent that can decide whether to call tools. We start without a streaming callback and attach one in the next step.
import os
from haystack import Pipeline
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.agents import Agent
from haystack.dataclasses import ChatMessage

generator = OpenAIChatGenerator(
    model="gpt-4o-mini",
    streaming_callback=None,  # no streaming yet; attached in the next step
)

agent = Agent(
    chat_generator=generator,
    tools=[multiply_tool],
)

pipe = Pipeline()
pipe.add_component("agent", agent)
  4. Run the pipeline and stream output as it arrives.
    In Haystack, streamed tokens are delivered through the generator callback path, so you attach a callback and print deltas as they come in.
from haystack.dataclasses import StreamingChunk

def on_stream(chunk: StreamingChunk) -> None:
    # Each chunk carries a text delta in `content`; print it immediately.
    if chunk.content:
        print(chunk.content, end="", flush=True)

generator = OpenAIChatGenerator(
    model="gpt-4o-mini",
    streaming_callback=on_stream,
)

agent = Agent(
    chat_generator=generator,
    tools=[multiply_tool],
)

pipe = Pipeline()
pipe.add_component("agent", agent)

result = pipe.run({
    "agent": {
        "messages": [
            ChatMessage.from_user("What is 17 * 23? Explain briefly.")
        ]
    }
})
print("\n\nFinal result:", result)
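If you need the full response text as well as live tokens (for logging or UI state), accumulate the deltas in the callback instead of only printing them. A stdlib-only sketch (the `ChunkCollector` class and `FakeChunk` stand-in are illustrative; in Haystack the real chunks are `StreamingChunk` objects whose delta lives in `.content`):

```python
class ChunkCollector:
    """Print deltas as they arrive and keep the full text for later use."""

    def __init__(self):
        self.parts: list[str] = []

    def __call__(self, chunk) -> None:
        # Read the text delta defensively; empty chunks are skipped.
        content = getattr(chunk, "content", "") or ""
        if content:
            self.parts.append(content)
            print(content, end="", flush=True)

    @property
    def text(self) -> str:
        return "".join(self.parts)

class FakeChunk:  # stand-in for haystack.dataclasses.StreamingChunk
    def __init__(self, content):
        self.content = content

collector = ChunkCollector()
for delta in ("17 * 23 ", "= ", "391"):
    collector(FakeChunk(delta))
print("\nFull text:", collector.text)
```

Pass a `ChunkCollector` instance as `streaming_callback` and read `.text` after the run completes.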
  5. Handle multi-turn usage cleanly in your own app.
    For production code, keep conversation state outside the pipeline and append new messages each turn so you can stream every response independently.
messages = [
    ChatMessage.from_system("You are a concise assistant."),
]

def ask(question: str):
    messages.append(ChatMessage.from_user(question))
    result = pipe.run({"agent": {"messages": messages}})
    assistant_message = result["agent"]["messages"][-1]
    messages.append(assistant_message)
    return assistant_message.text

answer = ask("Multiply 12 by 14.")
print(answer)
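The invariant behind this pattern is that each exchange appends exactly two messages: the user question and the assistant reply. A stdlib-only sketch with a stubbed pipeline (the `FakePipe` class is illustrative, standing in for the Haystack pipeline):

```python
class FakePipe:
    """Stand-in for the Haystack pipeline: echoes the last user message."""

    def run(self, inputs):
        messages = inputs["agent"]["messages"]
        reply = f"echo: {messages[-1]}"
        return {"agent": {"messages": messages + [reply]}}

pipe = FakePipe()
history = ["system: You are a concise assistant."]

def ask(question: str) -> str:
    history.append(f"user: {question}")
    result = pipe.run({"agent": {"messages": history}})
    assistant = result["agent"]["messages"][-1]
    history.append(assistant)
    return assistant

ask("Multiply 12 by 14.")
ask("And 3 by 3?")
# system message + 2 turns x (user + assistant) = 5 entries
print(len(history))  # 5
```

Because the history lives outside the pipeline, every `pipe.run` call gets the full context, and each reply can be streamed independently.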

Testing It

Run the script and watch for partial tokens printing before the final result appears. If streaming is wired correctly, you should see text arrive incrementally rather than after a long pause.

Test both direct answers and tool calls. A question like "What is 17 * 23?" should trigger the calculator tool, while a prompt like "Say hello in one sentence" should stream a plain-language response.

If nothing streams, check three things first: your API key is set, your model supports streaming, and your callback function is actually receiving chunks. Also verify that you’re not buffering stdout in your terminal or notebook environment.

Next Steps

  • Add more tools, such as date parsing or policy lookup functions, and observe how streamed tool-use changes the UX.
  • Persist message history in Redis or Postgres so streaming works across sessions.
  • Wrap this pipeline in FastAPI Server-Sent Events if you need browser-friendly token streaming.
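On the last point: Server-Sent Events encode each token as one or more `data:` lines terminated by a blank line. A framework-free sketch of the encoding (the `sse_event` helper is illustrative; in practice you would yield these strings from a FastAPI `StreamingResponse` with media type `text/event-stream`):

```python
def sse_event(token: str) -> str:
    """Encode one token as a Server-Sent Events message."""
    # Per the SSE format, multi-line payloads become multiple "data:" lines,
    # and every message ends with a blank line.
    lines = token.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"

stream = [sse_event(t) for t in ("17", " * ", "23", " = 391")]
print("".join(stream))
```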

By Cyprian Aarons, AI Consultant at Topiax.