Haystack Tutorial (Python): streaming agent responses for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a Haystack agent in Python that streams partial responses back to the caller instead of waiting for the full answer. You need this when you’re building chat UIs, long-running tool-using agents, or any app where users should see progress instead of staring at a spinner.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • haystack-ai
  • openai
  • An OpenAI API key in OPENAI_API_KEY
  • Basic familiarity with Haystack pipelines and components

Install the packages first:

pip install haystack-ai openai
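
To confirm the install picked up the package this tutorial targets (the haystack-ai 2.x line, not the legacy farm-haystack package), a quick check from Python:

from importlib.metadata import version

# Expect a 2.x version string; the legacy package was named farm-haystack.
print(version("haystack-ai"))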

Step-by-Step

  1. Create a minimal Haystack pipeline that can answer questions with an LLM.

We’ll start with a chat prompt builder and a plain generator component, so the streaming example stays focused on the response flow, not retrieval plumbing.

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# ChatPromptBuilder renders a list of ChatMessage objects, which is what the
# chat generator's "messages" input expects. A plain PromptBuilder would emit
# a string, and the connection below would fail.
template = [
    ChatMessage.from_user(
        """You are a helpful assistant.
Answer the question clearly and briefly.

Question: {{question}}"""
    )
]

pipe = Pipeline()
pipe.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipe.add_component(
    "llm",
    # Reads OPENAI_API_KEY from the environment by default.
    OpenAIChatGenerator(model="gpt-4o-mini", streaming_callback=None),
)
pipe.connect("prompt_builder.prompt", "llm.messages")
  2. Add a small helper to print streamed tokens as they arrive.

Haystack’s chat generators can stream output through a callback. For beginners, the simplest useful pattern is to collect chunks in memory while also printing them live.

from typing import List

from haystack.dataclasses import StreamingChunk

class StreamPrinter:
    """Collects streamed chunks in memory while also printing them live."""

    def __init__(self):
        self.chunks: List[str] = []

    def __call__(self, chunk: StreamingChunk):
        # Each StreamingChunk carries a piece of the response in .content.
        text = chunk.content or ""
        if text:
            self.chunks.append(text)
            print(text, end="", flush=True)

stream_printer = StreamPrinter()
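
If you want to sanity-check the callback contract before touching the pipeline, you can hand the printer straight to the generator and call it directly. A minimal sketch, assuming OPENAI_API_KEY is set:

# Standalone check: the generator invokes stream_printer once per chunk.
llm = OpenAIChatGenerator(model="gpt-4o-mini", streaming_callback=stream_printer)
reply = llm.run(messages=[ChatMessage.from_user("Say hello in five words.")])
print("\nFull reply object:", reply)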
  3. Attach the streaming callback to the generator and run the pipeline.

This is the core change: set streaming_callback on the generator before calling run(). The callback fires for each token or chunk, depending on provider behavior.

pipe.get_component("llm").streaming_callback = stream_printer

result = pipe.run(
    {
        "prompt_builder": {
            "question": "Explain streaming responses in one paragraph."
        }
    }
)

print("\n\n--- Final result object ---")
print(result)
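
The run() call still returns the complete result once generation finishes; streaming just delivers the text earlier. The reply messages live under the component name you registered ("llm" here). One caveat: the text accessor on ChatMessage has changed across 2.x releases, so check your version:

# Replies are keyed by the component name used in add_component().
reply = result["llm"]["replies"][0]
# Recent haystack-ai releases expose the text as reply.text;
# some earlier 2.x versions used reply.content instead.
print(reply.text)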
  4. Wrap it in a reusable function for agent-style usage.

In real apps, you want one function that takes user input, streams the answer as it arrives, and returns the collected text once generation finishes. This keeps your UI layer simple and makes testing easier.

def ask_with_stream(question: str) -> str:
    printer = StreamPrinter()
    pipe.get_component("llm").streaming_callback = printer

    pipe.run(
        {
            "prompt_builder": {
                "question": question
            }
        }
    )

    return "".join(printer.chunks)

answer = ask_with_stream("What is Haystack streaming good for?")
print("\n\nCollected answer:")
print(answer)
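
Printing to stdout is fine in a terminal, but a chat UI usually consumes chunks from another thread or an HTTP response. One pattern is to have the callback push chunks onto a queue that your UI layer drains; the function and sentinel below are illustrative, not part of Haystack:

import queue
import threading

def ask_into_queue(question: str) -> queue.Queue:
    """Run the pipeline in a background thread, streaming chunks into a queue."""
    chunk_queue: queue.Queue = queue.Queue()

    def on_chunk(chunk):
        if chunk.content:
            chunk_queue.put(chunk.content)

    def worker():
        pipe.get_component("llm").streaming_callback = on_chunk
        pipe.run({"prompt_builder": {"question": question}})
        chunk_queue.put(None)  # sentinel: generation finished

    threading.Thread(target=worker, daemon=True).start()
    return chunk_queue

# Consumer side, e.g. a server-sent-events handler:
q = ask_into_queue("What is Haystack streaming good for?")
while (piece := q.get()) is not None:
    print(piece, end="", flush=True)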
  5. Put it behind a simple interactive loop.

This gives you something close to an agent chat loop without adding tools yet. It also makes it obvious whether tokens are arriving incrementally or only at the end.

def chat():
    while True:
        question = input("\nYou: ").strip()
        if question.lower() in {"exit", "quit"}:
            break

        print("Assistant: ", end="", flush=True)
        # Printing happens inside the streaming callback; the return value
        # is the collected text if you want to log or store it.
        ask_with_stream(question)
        print("\n")

if __name__ == "__main__":
    chat()

Testing It

Run the script from your terminal and ask a question that produces more than one sentence. If streaming is working, you should see text appear gradually after Assistant: instead of all at once after a pause.

If nothing streams, check three things first: your OPENAI_API_KEY is set, streaming_callback is attached before run(), and your model supports streaming. Also confirm you installed haystack-ai, not an older Haystack package name from past versions.
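
You can rule out the first of those checks without leaving Python:

import os

# If this prints False, the generator will fail before anything streams.
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))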

A good smoke test is to ask for a longer explanation like “Explain retries in API clients with an example.” That usually produces enough output to make streaming behavior obvious.
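
To make the check measurable rather than eyeballed, time the gap between calling run() and the first chunk arriving. A small sketch built on the StreamPrinter from earlier; the timing fields are illustrative additions, not part of Haystack:

import time

class TimingPrinter(StreamPrinter):
    """A StreamPrinter that also records when the first chunk arrived."""

    def __init__(self):
        super().__init__()
        self.started = time.monotonic()
        self.first_chunk_at = None

    def __call__(self, chunk):
        if self.first_chunk_at is None:
            self.first_chunk_at = time.monotonic()
        super().__call__(chunk)

printer = TimingPrinter()
pipe.get_component("llm").streaming_callback = printer
pipe.run({"prompt_builder": {"question": "Explain retries in API clients with an example."}})

if printer.first_chunk_at is not None:
    print(f"\nTime to first chunk: {printer.first_chunk_at - printer.started:.2f}s")
else:
    print("\nNo chunks arrived, so streaming is not active.")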

Next Steps

  • Add tools with ToolInvoker so your agent can stream while calling external services.
  • Replace the prompt-only pipeline with retrieval using DocumentStore and Retrievers.
  • Add structured logging around streamed chunks so you can debug latency in production (see the sketch after this list).
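
A minimal version of that last point, using only the standard library's logging module; the logger name is illustrative:

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.stream")

class LoggingPrinter(StreamPrinter):
    """Logs chunk size and arrival time alongside normal printing."""

    def __call__(self, chunk):
        logger.info("chunk at %.3f: %d chars", time.monotonic(), len(chunk.content or ""))
        super().__call__(chunk)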

By Cyprian Aarons, AI Consultant at Topiax.