AutoGen Tutorial (Python): streaming agent responses for advanced developers
This tutorial shows how to stream AutoGen agent responses in Python so you can surface partial output token-by-token instead of waiting for a full completion. You need this when building chat UIs, long-running workflows, or observability pipelines where latency and intermediate progress matter.
What You'll Need
- Python 3.10+
- `autogen-agentchat` and `autogen-ext`
- An OpenAI API key set as `OPENAI_API_KEY`
- A terminal that can run async Python scripts
- Basic familiarity with AutoGen agents and model clients
Install the packages:
```shell
pip install autogen-agentchat autogen-ext openai
```
Step-by-Step
- Start by creating a model client that supports streaming. In AutoGen, the model client is separate from the agent, so you can swap providers later without changing your orchestration code.

```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)
```
- Create an assistant agent and give it a tight system message. Keep the instruction specific because streaming makes it easier to inspect whether the model is following your constraints in real time.

```python
agent = AssistantAgent(
    name="support_agent",
    model_client=model_client,
    system_message=(
        "You are a concise banking support assistant. "
        "Answer in short paragraphs and avoid unnecessary detail."
    ),
)
```
- Run the agent with streaming enabled and print events as they arrive. The key detail is calling `run_stream()` inside an async loop, which gives you incremental updates instead of a single final response.

```python
from autogen_agentchat.messages import TextMessage

async def main():
    task = TextMessage(
        content="Explain how overdraft protection works in 3 bullet points.",
        source="user",
    )
    async for event in agent.run_stream(task=task):
        print(type(event).__name__, event)

if __name__ == "__main__":
    asyncio.run(main())
```
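If you want to prototype this consumption loop without an API key, you can drive it with a stubbed async generator first. `ChunkEvent`, `FinalMessage`, and `fake_stream()` below are illustrative stand-ins for the event objects `run_stream()` yields, not AutoGen types:

```python
import asyncio
from dataclasses import dataclass

# Illustrative stand-ins for AutoGen's streamed event objects (not library types).
@dataclass
class ChunkEvent:
    content: str

@dataclass
class FinalMessage:
    content: str
    source: str

async def fake_stream():
    # Simulate token-by-token deltas followed by a final message,
    # roughly the shape a streaming run produces.
    for token in ["Over", "draft ", "protection ", "covers shortfalls."]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield ChunkEvent(content=token)
    yield FinalMessage(content="Overdraft protection covers shortfalls.", source="assistant")

collected = []

async def main():
    async for event in fake_stream():
        collected.append(type(event).__name__)
        print(type(event).__name__, event.content)

asyncio.run(main())
```

Once the loop behaves as expected against the stub, swapping in the real `agent.run_stream(task=task)` is a one-line change.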
- If you want cleaner output, filter for text deltas and final messages only. This is what you would do in a production console app or backend service where raw event objects are too noisy.

```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
        api_key=os.environ["OPENAI_API_KEY"],
    )
    agent = AssistantAgent(
        name="support_agent",
        model_client=model_client,
        system_message="You are a concise banking support assistant.",
    )
    task = TextMessage(content="Summarize KYC in one paragraph.", source="user")
    async for event in agent.run_stream(task=task):
        if hasattr(event, "content") and isinstance(event.content, str):
            print(event.content)

if __name__ == "__main__":
    asyncio.run(main())
```
- Wrap the stream in your own handler if you need to forward tokens to WebSockets, SSE, or logs. The pattern is the same: consume events as they arrive, then push them to your transport layer.

```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def stream_to_console(agent: AssistantAgent, prompt: str):
    task = TextMessage(content=prompt, source="user")
    async for event in agent.run_stream(task=task):
        if hasattr(event, "content") and isinstance(event.content, str):
            print(event.content, end="", flush=True)

async def main():
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
        api_key=os.environ["OPENAI_API_KEY"],
    )
    agent = AssistantAgent("assistant", model_client=model_client)
    await stream_to_console(agent, "Write a short claims update email.")

if __name__ == "__main__":
    asyncio.run(main())
```
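For an SSE transport specifically, each text delta has to be wrapped in the `data: ...` wire format before it is written to the response. Here is a minimal, framework-independent sketch of that formatting step; `to_sse` and `forward_chunks` are hypothetical helpers, not AutoGen or FastAPI APIs:

```python
from typing import List

def to_sse(chunk: str) -> str:
    """Wrap a single text delta in the Server-Sent Events wire format."""
    # An SSE frame is a "data: <payload>" line terminated by a blank line.
    # (Multi-line payloads would need one "data:" line per line of text.)
    return f"data: {chunk}\n\n"

def forward_chunks(chunks: List[str]) -> str:
    """Concatenate frames in the order they would be written to the response."""
    return "".join(to_sse(c) for c in chunks)

frames = forward_chunks(["Dear ", "customer,", " your claim is approved."])
print(frames)
```

In a real endpoint you would call `to_sse()` on each streamed delta and write the frame immediately instead of concatenating, so the client sees tokens as they arrive.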
Testing It
Run the script from your terminal and confirm that output appears progressively rather than all at once. If you only see one final block after several seconds, check that you are calling `run_stream()` and not `run()`. Also verify that `OPENAI_API_KEY` is set in the environment before starting the script.
A good test prompt is something with enough length to produce visible streaming, like a summary or an email draft. If you are integrating into an API server, confirm that each chunk reaches your client before the full response completes.
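To make "appears progressively" measurable rather than eyeballed, record the arrival time of each chunk and check that the timestamps are spread out instead of clustered in one burst. This harness uses a simulated delayed stream; `slow_stream()` is a stand-in for `agent.run_stream()` so the check runs without an API key:

```python
import asyncio
import time

async def slow_stream():
    # Stand-in for agent.run_stream(): yields chunks with a small delay each.
    for token in ["one ", "two ", "three"]:
        await asyncio.sleep(0.05)
        yield token

async def collect_arrivals():
    arrivals = []
    start = time.monotonic()
    async for chunk in slow_stream():
        # Record how long after the start each chunk arrived.
        arrivals.append(time.monotonic() - start)
    return arrivals

arrivals = asyncio.run(collect_arrivals())
spread = arrivals[-1] - arrivals[0]
print(f"{len(arrivals)} chunks over {spread:.2f}s")
```

Against a real agent, a genuinely streaming run shows a clear gap between the first and last arrival; a buffered run dumps every chunk at nearly the same timestamp.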
Next Steps
- Add tool calling to streamed agents so they can fetch policy data or account context mid-response.
- Build a FastAPI SSE endpoint that forwards AutoGen stream events to a browser.
- Explore multi-agent streaming when one agent delegates work to another and you want live progress across the chain.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.