AutoGen Tutorial (Python): streaming agent responses for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to stream AutoGen agent responses in Python instead of waiting for the full reply. You need this when you want live token-by-token output in a CLI, a log stream, a WebSocket, or any UI that should feel responsive.

What You'll Need

  • Python 3.10+
  • autogen-agentchat
  • autogen-ext
  • An OpenAI API key set as an environment variable
  • Basic familiarity with AssistantAgent and UserProxyAgent
  • A terminal where you can run Python scripts

Install the packages first:

pip install autogen-agentchat "autogen-ext[openai]"

Set your API key:

export OPENAI_API_KEY="your-key-here"
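If you'd rather fail fast with a clear error than hit a cryptic auth failure later, a small guard helps. `require_api_key` below is a convenience helper invented for this tutorial, not part of AutoGen:

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named key, or raise before any agent code runs."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it in this shell first.")
    return key
```

Call it once at startup, before constructing the model client.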

Step-by-Step

  1. Start with a minimal AutoGen agent setup.
    We’ll use a single assistant agent and ask it to stream its response back instead of returning one final string.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
    )

    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        model_client_stream=True,  # emit token-by-token chunk events
    )

    print("Ready.")


if __name__ == "__main__":
    asyncio.run(main())
  2. Send a prompt using the streaming API.
    The important part is calling run_stream() and iterating over the events it yields. That gives you partial progress as the model generates output.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(name="assistant", model_client=model_client, model_client_stream=True)

    stream = agent.run_stream(task="Write a 3-bullet summary of why streaming is useful.")

    async for event in stream:
        print(type(event).__name__, event)


if __name__ == "__main__":
    asyncio.run(main())
  3. Print only the text as it arrives.
    The event stream includes more than just text, so filter for the streaming chunk events and print their content. This is the pattern you want for a terminal app or server logs.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        model_client_stream=True,  # required for token-by-token chunks
    )

    async for event in agent.run_stream(task="Explain streaming in one short paragraph."):
        if isinstance(event, ModelClientStreamingChunkEvent):
            print(event.content, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
  4. Handle streamed chunks and add a newline when the response finishes.
    In many apps, you want partial output during generation but also clean formatting after completion. The final result is available after the stream ends.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        model_client_stream=True,  # required for token-by-token chunks
    )

    async for event in agent.run_stream(task="List 3 benefits of streaming responses."):
        if isinstance(event, ModelClientStreamingChunkEvent):
            print(event.content, end="", flush=True)

    print()


if __name__ == "__main__":
    asyncio.run(main())
  5. Wrap it in a reusable function for production code.
    Once this works, move the streaming logic into a helper so your app can call it from a CLI command, FastAPI endpoint, or background worker.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import ModelClientStreamingChunkEvent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def stream_answer(agent: AssistantAgent, prompt: str) -> None:
    async for event in agent.run_stream(task=prompt):
        if isinstance(event, ModelClientStreamingChunkEvent):
            print(event.content, end="", flush=True)
    print()


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        model_client_stream=True,  # required for token-by-token chunks
    )

    await stream_answer(agent, "Give me a short explanation of AutoGen streaming.")


if __name__ == "__main__":
    asyncio.run(main())

Testing It

Run the script from your terminal and watch for output appearing before the full answer is complete. If nothing prints until the end, check that the agent was created with model_client_stream=True, that you are iterating over the run_stream() generator, and that you are matching the right event type.

Use a prompt that produces several sentences so the difference is obvious. You should also confirm your OPENAI_API_KEY is set in the same shell session where you run Python.

If you want to be sure the code path is correct, temporarily print type(event).__name__ inside the loop. That tells you exactly which streamed events AutoGen is emitting in your version.

Next Steps

  • Add a second agent and stream a multi-agent conversation with a team such as RoundRobinGroupChat
  • Return streamed chunks through FastAPI Server-Sent Events or WebSockets
  • Persist streamed transcripts so you can replay conversations later

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
