AutoGen Tutorial (TypeScript): streaming agent responses for advanced developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows you how to stream agent output from AutoGen in TypeScript so you can start rendering tokens, tool progress, and partial responses before the model finishes. You need this when you’re building chat UIs, long-running agent workflows, or any system where waiting for the full response makes the product feel slow.

What You'll Need

  • Node.js 18+
  • TypeScript 5+
  • An OpenAI API key
  • autogen-ext and @autogen/core
  • A project configured for ES modules
  • Basic familiarity with AutoGen agents and model clients

Install the packages first:

npm install @autogen/core autogen-ext openai
npm install -D typescript tsx @types/node

Set your API key:

export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Create a streaming-capable model client.

AutoGen’s TypeScript stack uses a model client to talk to OpenAI. For streaming, you want a client that can emit incremental chunks instead of waiting for a full completion.

import { OpenAIChatCompletionClient } from "autogen-ext/models/openai";

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

  2. Create an assistant agent that uses that client.

The agent does the orchestration; the model client handles generation. Keep the system message tight so streamed output stays focused and predictable.

import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient,
  systemMessage:
    "You are a concise support assistant. Answer directly and use short paragraphs.",
});

  3. Subscribe to streamed events and print partial output.

This is the core pattern. Instead of calling a plain run(), iterate over streamed events and handle text deltas as they arrive.

import { TextDeltaEvent } from "@autogen/core";

async function main() {
  const stream = await agent.runStream([
    { role: "user", content: "Explain how streaming helps in a customer support bot." },
  ]);

  // Print each text delta as soon as it arrives.
  for await (const event of stream) {
    if (event instanceof TextDeltaEvent) {
      process.stdout.write(event.content);
    }
  }

  process.stdout.write("\n");
}

main().catch(console.error);

  4. Add structured handling for completion, errors, and non-text events.

In production, don’t assume every event is text. You want explicit branches for tool calls, final messages, and failures so your UI or worker doesn’t break on unexpected event types.

import {
  AssistantAgent,
  TextDeltaEvent,
  MessageEvent,
} from "@autogen/core";
import { OpenAIChatCompletionClient } from "autogen-ext/models/openai";

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = new AssistantAgent({
  name: "streaming_agent",
  modelClient,
});

async function run() {
  const stream = await agent.runStream([
    { role: "user", content: "Give me three reasons to stream agent responses." },
  ]);

  for await (const event of stream) {
    if (event instanceof TextDeltaEvent) {
      process.stdout.write(event.content);
    } else if (event instanceof MessageEvent) {
      console.log("\n\nFinal message received.");
    }
  }
}

run().catch((err) => {
  console.error("Agent failed:", err);
});
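A related production concern is a stream that stalls mid-generation. One defensive pattern is to race each chunk against a timer. The sketch below is independent of AutoGen's event types and works over a plain `AsyncIterable<string>`; `withStallTimeout` is a hypothetical helper name, not part of any AutoGen package:

```typescript
// Yield deltas from a source, throwing if no chunk arrives within
// `stallMs` milliseconds. Useful for detecting hung upstream connections.
async function* withStallTimeout(
  deltas: AsyncIterable<string>,
  stallMs: number,
): AsyncIterable<string> {
  const it = deltas[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout>;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`no delta within ${stallMs}ms`)),
        stallMs,
      );
    });
    try {
      // Whichever settles first wins: the next chunk or the stall timer.
      const result = await Promise.race([it.next(), timeout]);
      if (result.done) return;
      yield result.value;
    } finally {
      clearTimeout(timer!);
    }
  }
}

// Simulated delta source standing in for a streamed agent response.
async function* demoDeltas(): AsyncIterable<string> {
  yield "partial ";
  yield "output";
}

async function main() {
  let text = "";
  for await (const delta of withStallTimeout(demoDeltas(), 1000)) {
    text += delta;
  }
  console.log(text); // partial output
}

main();
```

In an application you would feed the text deltas extracted from the event loop above through this wrapper before writing them to your UI or socket.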

  5. Wrap it in a reusable helper for your app layer.

This is the version you actually keep in an application service. It gives you one function that returns streamed text while still letting you swap in WebSockets, SSE, or terminal output later.

import { AssistantAgent, TextDeltaEvent } from "@autogen/core";
import { OpenAIChatCompletionClient } from "autogen-ext/models/openai";

const client = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = new AssistantAgent({
  name: "agent",
  modelClient: client,
});

export async function streamAnswer(prompt: string): Promise<void> {
  const stream = await agent.runStream([{ role: "user", content: prompt }]);

  for await (const event of stream) {
    if (event instanceof TextDeltaEvent) {
      process.stdout.write(event.content);
    }
  }

  process.stdout.write("\n");
}
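To make the transport actually swappable, one option is to separate iteration from output behind a small sink interface. The sketch below is written against a plain `AsyncIterable<string>` of text deltas so it does not depend on AutoGen's event classes; `StreamSink`, `streamTo`, and `memorySink` are hypothetical names introduced here for illustration:

```typescript
// A sink receives deltas as they arrive and is notified once at the end.
export interface StreamSink {
  write(delta: string): void;
  end(): void;
}

// Drive any async iterable of text deltas into a sink, returning the
// accumulated full text (useful for logging or persistence).
export async function streamTo(
  deltas: AsyncIterable<string>,
  sink: StreamSink,
): Promise<string> {
  let full = "";
  for await (const delta of deltas) {
    full += delta;
    sink.write(delta);
  }
  sink.end();
  return full;
}

// Example sink that collects chunks in memory (handy for tests).
export function memorySink(into: string[]): StreamSink {
  return {
    write: (delta) => into.push(delta),
    end: () => {},
  };
}

// Simulated delta source standing in for agent.runStream() output.
async function* fakeDeltas(): AsyncIterable<string> {
  yield "Hello, ";
  yield "world!";
}

async function demo() {
  const chunks: string[] = [];
  const full = await streamTo(fakeDeltas(), memorySink(chunks));
  console.log(chunks.length); // 2
  console.log(full); // Hello, world!
}

demo();
```

A terminal sink would call `process.stdout.write`, a WebSocket sink would call `socket.send`, and the streaming loop never changes.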

Testing It

Run the script with tsx so TypeScript executes directly without a build step:

npx tsx src/index.ts

You should see output appear incrementally instead of as one block at the end. If nothing streams, check that your environment variable is set and that your prompt is actually producing enough text to observe multiple chunks.

For a stronger test, add timestamps around each emitted delta and confirm they arrive over time rather than all at once. If you later wire this into an HTTP endpoint, verify that your client receives partial data before the request completes.
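One way to add those timestamps is a small wrapper around the delta iterable that logs the elapsed milliseconds for each chunk. The sketch works over a plain `AsyncIterable<string>`, so it applies regardless of the exact event types; `withTimestamps` is a hypothetical helper name:

```typescript
// Wrap an async iterable of text deltas, logging (to stderr, so stdout
// stays clean) how long after the start of iteration each chunk arrived.
async function* withTimestamps(
  deltas: AsyncIterable<string>,
): AsyncIterable<string> {
  const start = Date.now();
  for await (const delta of deltas) {
    console.error(`+${Date.now() - start}ms ${JSON.stringify(delta)}`);
    yield delta;
  }
}

// Simulated source: chunks spaced 50 ms apart, standing in for runStream().
async function* slowDeltas(): AsyncIterable<string> {
  for (const chunk of ["one ", "two ", "three"]) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    yield chunk;
  }
}

async function main() {
  let text = "";
  for await (const delta of withTimestamps(slowDeltas())) {
    text += delta;
  }
  console.log(text); // one two three
}

main();
```

If streaming works, the logged offsets should climb steadily; if everything arrives in one burst at the end, the offsets will cluster at a single value.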

Next Steps

  • Wire runStream() into Server-Sent Events so browsers can render token-by-token output.
  • Add tool calling and stream both assistant text and tool progress in one UI.
  • Persist streamed traces so you can debug latency spikes and incomplete generations later.
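
For the SSE direction, the wire format is simple enough to sketch now: each delta becomes a `data:` frame terminated by a blank line, with multi-line payloads needing one `data:` line per line. The helper below is independent of AutoGen's types; `toSSEFrame` is a hypothetical name:

```typescript
// Encode one text delta as a Server-Sent Events frame.
// Each line of the payload gets its own "data:" prefix, and the
// frame is terminated by a blank line, per the SSE wire format.
export function toSSEFrame(delta: string): string {
  return (
    delta
      .split("\n")
      .map((line) => `data: ${line}`)
      .join("\n") + "\n\n"
  );
}

console.log(JSON.stringify(toSSEFrame("hello"))); // "data: hello\n\n"
console.log(JSON.stringify(toSSEFrame("a\nb"))); // "data: a\ndata: b\n\n"
```

In an HTTP handler you would set `Content-Type: text/event-stream`, then write one frame per delta as it arrives from the streaming loop.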

By Cyprian Aarons, AI Consultant at Topiax.
