AutoGen Tutorial (Python): streaming agent responses for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to stream an AutoGen agent’s responses in Python instead of waiting for the full reply to finish. You need this when you want live token-by-token output in a CLI, when you’re wiring agents into a web UI, or when long-running model calls need visible progress.

What You'll Need

  • Python 3.10+
  • autogen-agentchat
  • autogen-ext
  • An OpenAI API key set as OPENAI_API_KEY
  • Basic familiarity with AutoGen agents and model clients
  • A terminal where you can run Python scripts

Install the packages first:

pip install -U "autogen-agentchat" "autogen-ext[openai]"

Step-by-Step

  1. Start by creating a model client that supports streaming. In AutoGen, the cleanest path is to use the OpenAI chat completion client from autogen_ext.
import asyncio
import os

from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)
  2. Create a simple assistant agent and send it a message with streaming enabled. Two details matter: pass model_client_stream=True so the agent emits token chunks, and iterate the async generator returned by run_stream instead of calling a blocking one-shot method.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage

agent = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="You are a concise assistant.",
    model_client_stream=True,  # emit token chunks instead of only full messages
)

async def main():
    # run_stream returns an async generator; iterate it directly, no await.
    stream = agent.run_stream(
        task=TextMessage(content="Explain streaming in one paragraph.", source="user")
    )

    async for event in stream:
        print(event)

asyncio.run(main())
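Running this prints every object the stream yields: the original task, each streaming chunk, the completed assistant message, and a final TaskResult summarizing the run. It is noisy, which is exactly why the next step filters.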
  3. Print only the text deltas if you want token-like output. The stream contains multiple event types, so in production you usually filter for message content rather than dumping every event object.
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent, TextMessage

from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="You are a concise assistant.",
    model_client_stream=True,
)

async def main():
    stream = agent.run_stream(
        task=TextMessage(content="Write a 3-bullet summary of streaming.", source="user")
    )

    async for event in stream:
        if isinstance(event, ModelClientStreamingChunkEvent):
            print(event.content, end="", flush=True)

asyncio.run(main())
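The end="" and flush=True arguments are doing real work here: print normally appends a newline and lets stdout buffer, which would turn a smooth token stream into chunky bursts.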
  4. Capture the final answer after streaming finishes. This matters because streamed chunks are useful for UX, but your application usually still needs the completed response for storage or downstream processing.
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent, TextMessage

from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = AssistantAgent(
    name="assistant",
    model_client=client,
    model_client_stream=True,
)

async def main():
    stream = agent.run_stream(
        task=TextMessage(content="Give me one sentence about AutoGen.", source="user")
    )

    final_text = []
    async for event in stream:
        if isinstance(event, ModelClientStreamingChunkEvent):
            print(event.content, end="", flush=True)
            final_text.append(event.content)

    # Reassemble the streamed chunks into the completed reply.
    full_reply = "".join(final_text)
    print(f"\n\nDone. Captured {len(full_reply)} characters.")

asyncio.run(main())
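If you would rather not reassemble chunks yourself, the last item the stream yields is a TaskResult (importable from autogen_agentchat.base), whose messages list ends with the completed assistant message; either approach gives you the full reply.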
  5. Wrap it into a reusable helper so you can plug it into a CLI or web backend. This is the pattern you want when multiple endpoints need the same streaming behavior.
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent, TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def stream_reply(prompt: str) -> str:
    client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
        api_key=os.environ["OPENAI_API_KEY"],
    )

    agent = AssistantAgent(
        name="assistant",
        model_client=client,
        model_client_stream=True,
    )
    stream = agent.run_stream(task=TextMessage(content=prompt, source="user"))

    chunks = []
    async for event in stream:
        if isinstance(event, ModelClientStreamingChunkEvent):
            print(event.content, end="", flush=True)
            chunks.append(event.content)

    return "".join(chunks)


if __name__ == "__main__":
    asyncio.run(stream_reply("Explain what streaming buys us in one paragraph."))
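A design note on the helper: it creates a fresh client per call, which keeps the example self-contained but pays connection-setup cost every time. In a long-running backend you would normally construct the client once at startup and share it across requests.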

Testing It

Run the script from your terminal and confirm that text appears incrementally instead of all at once. If nothing prints until the end, the usual culprits are forgetting model_client_stream=True on the agent, awaiting run_stream instead of iterating it, or not filtering for ModelClientStreamingChunkEvent. Also verify that OPENAI_API_KEY is present in your environment before starting the script, as shown in the guard below.
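
A minimal fail-fast guard you can place at the top of any of the scripts; this is plain Python, nothing AutoGen-specific:

import os
import sys

# Exit with a readable message instead of a KeyError mid-run.
if not os.environ.get("OPENAI_API_KEY"):
    sys.exit("OPENAI_API_KEY is not set; export it before running this script.")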

A good test prompt is something slightly longer than a one-liner, like asking for a short explanation plus three bullets. That gives the model enough room to emit multiple chunks so you can see the streaming behavior clearly.

Next Steps

  • Add conversation memory and stream multi-turn chats instead of single prompts.
  • Route streamed chunks into FastAPI Server-Sent Events or WebSockets (see the sketch after this list).
  • Add tool use to your agent and learn how streamed responses behave during function calls.
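
To make the SSE idea concrete, here is a minimal sketch that adapts the Step 5 pattern into a FastAPI endpoint. It assumes fastapi and uvicorn are installed; the /chat route and the [DONE] sentinel are illustrative choices on my part, not an AutoGen convention:

import os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)
agent = AssistantAgent(name="assistant", model_client=client, model_client_stream=True)


@app.get("/chat")
async def chat(prompt: str):
    async def event_source():
        # Forward each model chunk as one SSE "data:" frame.
        # (Chunks containing newlines would need escaping for strict SSE framing.)
        async for event in agent.run_stream(task=prompt):
            if isinstance(event, ModelClientStreamingChunkEvent):
                yield f"data: {event.content}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")

Run it with uvicorn module_name:app --reload and open /chat?prompt=hello to watch the chunks arrive.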

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
