AutoGen Tutorial (TypeScript): optimizing token usage for advanced developers
This tutorial shows you how to reduce token spend in an AutoGen TypeScript agent workflow without breaking multi-agent behavior. You’ll build a small setup that trims prompts, caps context growth, and keeps expensive model calls only where they matter.
What You'll Need
- Node.js 18+
- A TypeScript project with ts-node or tsx
- npm or pnpm
- An OpenAI API key
- Packages (install commands follow this list):
  - @autogenai/autogen
  - openai
  - dotenv
  - typescript
  - tsx or ts-node
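Assuming npm and the package names as listed above, a typical install looks like this (substitute the pnpm equivalents if that's your tool):

npm install @autogenai/autogen openai dotenv
npm install --save-dev typescript tsx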
Step-by-Step
- Start with a minimal AutoGen setup and a model that supports the task. The main idea is to keep the default agent cheap, then reserve stronger models for the few places that actually need them.
import "dotenv/config";
import { AssistantAgent, UserProxyAgent } from "@autogenai/autogen";
import { OpenAIChatCompletionClient } from "@autogenai/autogen/openai";
const modelClient = new OpenAIChatCompletionClient({
model: "gpt-4o-mini",
apiKey: process.env.OPENAI_API_KEY!,
});
const assistant = new AssistantAgent({
name: "assistant",
modelClient,
systemMessage: "You are a concise assistant. Keep responses short and structured.",
});
const user = new UserProxyAgent({
name: "user",
});
async function main() {
const result = await user.initiateChat(assistant, {
message: "Summarize three token-saving techniques for AutoGen TypeScript.",
maxTurns: 2,
});
console.log(result.chatHistory.at(-1));
}
main().catch(console.error);
- Stop sending bloated system prompts. In production, the fastest way to burn tokens is a giant instruction blob that repeats policy, style, and business rules every turn.
const compactAssistant = new AssistantAgent({
  name: "compact_assistant",
  modelClient,
  // One short sentence per rule, joined into a single compact string.
  systemMessage: [
    "You are a production assistant for internal tooling.",
    "Answer in bullets.",
    "Do not repeat the question.",
    "If uncertain, ask one clarifying question.",
  ].join(" "),
});
- Trim conversation growth by summarizing older turns before they get expensive. Instead of carrying the full chat history forever, keep only the last few turns and replace older content with a compact summary.
type ChatTurn = { role: string; content: string };

function summarizeHistory(history: ChatTurn[], keepLast = 4): ChatTurn[] {
  if (history.length <= keepLast) return history;
  const older = history.slice(0, -keepLast);
  const recent = history.slice(-keepLast);
  // Crude "summary": flatten older turns and hard-cap at 1200 characters.
  // Swap in an LLM-generated summary here if you need higher fidelity.
  const summary = older
    .map((turn) => `${turn.role}: ${turn.content}`)
    .join("\n")
    .slice(0, 1200);
  return [
    {
      role: "system",
      content: `Conversation summary:\n${summary}`,
    },
    ...recent,
  ];
}
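Here is a minimal sketch of wiring the helper into a chat loop. It keeps an app-side record of turns and folds the compacted transcript into the outbound message, since the initiateChat calls in this tutorial only show a message and maxTurns option; localHistory and ask are illustrative names, not AutoGen APIs.

const localHistory: ChatTurn[] = [];

async function ask(message: string) {
  // Compact everything older than the last few turns before sending.
  const compacted = summarizeHistory(localHistory);
  const context = compacted
    .map((turn) => `${turn.role}: ${turn.content}`)
    .join("\n");
  const result = await user.initiateChat(compactAssistant, {
    // Fold the compacted transcript into the outbound message.
    message: context ? `${context}\n\nuser: ${message}` : message,
    maxTurns: 1,
  });
  const reply = String(result.chatHistory.at(-1)?.content ?? "");
  // Record both sides so the next call can summarize them.
  localHistory.push({ role: "user", content: message });
  localHistory.push({ role: "assistant", content: reply });
  return reply;
}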
- Use a cheap router step before you call the main assistant. If the request is simple, answer with a lightweight model; if it needs deeper reasoning or tool use, escalate only then.
const router = new AssistantAgent({
  name: "router",
  modelClient,
  systemMessage:
    "Classify requests as simple or complex. Reply with only 'simple' or 'complex'.",
});

async function routeRequest(message: string): Promise<"simple" | "complex"> {
  // A single turn is enough for a one-word classification.
  const res = await user.initiateChat(router, {
    message,
    maxTurns: 1,
  });
  const last = String(res.chatHistory.at(-1)?.content ?? "").toLowerCase();
  // Default to the cheap path unless the router explicitly says "complex".
  return last.includes("complex") ? "complex" : "simple";
}
- Put it together with explicit turn limits and short outputs. This keeps your agent from wandering into long back-and-forth loops that do nothing except increase context size.
async function run(message: string) {
  const route = await routeRequest(message);
  // Simple requests stay on the compact agent; complex ones escalate.
  const chosenAgent =
    route === "simple"
      ? compactAssistant
      : new AssistantAgent({
          name: "expert_assistant",
          modelClient,
          systemMessage:
            "You handle complex requests. Be precise and avoid unnecessary detail.",
        });
  const result = await user.initiateChat(chosenAgent, {
    message,
    // Bound the turn count on both paths so loops can't inflate context.
    maxTurns: route === "simple" ? 1 : 3,
  });
  console.log(JSON.stringify(result.chatHistory, null, 2));
}

run("Explain how to reduce token usage in AutoGen.").catch(console.error);
Testing It
Run the script with a short prompt and confirm you get a one-turn response for simple requests. Then try a more complex prompt and verify the router sends it to the higher-capability path with a larger but still bounded turn count.
Check your API usage in the provider dashboard before and after these changes. You should see fewer prompt tokens per request once you shorten system messages and stop carrying full chat history forward.
If you want a hard verification loop, log each outbound prompt length and compare it across runs. That gives you an immediate signal when someone adds prompt bloat back into the codebase.
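A minimal sketch of that logging, using character count as a cheap stand-in for real token counts (a tokenizer such as tiktoken would be more precise; logPromptSize is an illustrative helper, not part of AutoGen):

function logPromptSize(label: string, prompt: string) {
  // Rough heuristic: ~4 characters per token for English text.
  const approxTokens = Math.ceil(prompt.length / 4);
  console.log(`[prompt] ${label}: ${prompt.length} chars (~${approxTokens} tokens)`);
}

Call it wherever a prompt leaves the process, for example logPromptSize("router", message) at the top of routeRequest, and compare the numbers across runs.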
Next Steps
- Add token accounting around each agent call so you can enforce budgets per workflow (a sketch follows this list).
- Replace full-history memory with retrieval over summaries plus recent turns.
- Split tool-heavy tasks into planner/executor agents so only one agent pays for deep context at a time.
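For the first item, a rough sketch of per-workflow accounting under the same characters-per-token heuristic; TokenBudget is a hypothetical helper, and real billed usage from the provider's response metadata would be more accurate:

class TokenBudget {
  private spent = 0;
  constructor(private limit: number) {}

  // Charge an approximate cost for one call (~4 characters per token).
  charge(prompt: string, response: string) {
    this.spent += Math.ceil((prompt.length + response.length) / 4);
  }

  get exhausted(): boolean {
    return this.spent >= this.limit;
  }
}

Check budget.exhausted before each initiateChat call and short-circuit, or downgrade to the cheaper path, once the workflow is over budget.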
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.