AutoGen Tutorial (TypeScript): optimizing token usage for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to reduce token spend in a TypeScript AutoGen agent setup without breaking the conversation flow. You’ll learn how to keep prompts tight, cap context growth, and stop sending useless history back to the model.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or tsx
  • AutoGen for TypeScript installed:
    • npm install @autogenai/autogen-core @autogenai/autogen-ext openai
  • An OpenAI API key set in your environment:
    • export OPENAI_API_KEY="your-key"
  • Basic familiarity with AutoGen agents, model clients, and async/await
  • A terminal and a text editor

Step-by-Step

  1. Start by using a small, explicit system message and a cheaper model. Token waste usually starts with vague instructions and oversized model choices, so keep both under control from the first line.
import { AssistantAgent } from "@autogenai/autogen-core";
import { OpenAIChatCompletionClient } from "@autogenai/autogen-ext/models/openai";

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient,
  systemMessage: [
    "You are a concise insurance support assistant.",
    "Answer in under 120 words.",
    "Ask one clarifying question only if required.",
    "Do not restate the user's full message.",
  ].join(" "),
});
  2. Trim conversation history before every call. If you keep sending the full chat log, each request's token count grows with every turn and quickly becomes your biggest cost driver.
type ChatTurn = {
  role: "user" | "assistant";
  content: string;
};

function compactHistory(history: ChatTurn[], maxTurns = 4): ChatTurn[] {
  return history.slice(-maxTurns);
}

const history: ChatTurn[] = [
  { role: "user", content: "My policy renewal failed." },
  { role: "assistant", content: "Can you share the error code?" },
  { role: "user", content: "It says PAYMENT_DECLINED." },
];

const recentHistory = compactHistory(history, 2);
console.log(recentHistory);
  3. Add a summarization pass for long threads instead of replaying everything. This keeps the agent aware of prior context while replacing dozens of turns with a short memory object.
// A dedicated low-cost summarizer agent; it reuses the same model client.
const summarizer = new AssistantAgent({
  name: "summarizer",
  modelClient,
  systemMessage:
    "Summarize the thread into facts, decisions, open questions, and constraints in under 80 words.",
});

async function summarizeThread(transcript: string): Promise<string> {
  const result = await summarizer.run([
    { role: "user", content: transcript },
  ]);
  return result.messages.at(-1)?.content ?? "";
}
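To put the summary to work, one option is to send it as a single synthetic turn ahead of the last few verbatim messages. The helper below is a sketch, not a documented AutoGen pattern; it reuses the compactHistory helper from step 2 and the message shapes assumed throughout this tutorial.
async function buildCompactInput(
  transcript: string,
  history: ChatTurn[]
): Promise<ChatTurn[]> {
  // One short summary turn stands in for the long tail of the thread;
  // only the most recent turns are replayed verbatim.
  const summary = await summarizeThread(transcript);
  return [
    { role: "user", content: `Thread summary: ${summary}` },
    ...compactHistory(history, 2),
  ];
}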
  4. Send structured inputs instead of dumping raw text blobs into the prompt. When you turn messy notes into fields, the model spends fewer tokens parsing and more tokens reasoning.
type ClaimContext = {
  policyId: string;
  claimId: string;
  issueType: string;
  customerStatus: string;
};

async function handleClaim(context: ClaimContext) {
  const prompt = [
    `policyId=${context.policyId}`,
    `claimId=${context.claimId}`,
    `issueType=${context.issueType}`,
    `customerStatus=${context.customerStatus}`,
    "Respond with the next best action only.",
  ].join("\n");

  const result = await agent.run([{ role: "user", content: prompt }]);
  console.log(result.messages.at(-1)?.content);
}

handleClaim({
  policyId: "POL-10293",
  claimId: "CLM-88421",
  issueType: "payment_declined",
  customerStatus: "verified",
});
  5. Put hard limits on output length and stop asking for unnecessary formatting. In production, verbose markdown answers are token-expensive and usually not what downstream systems need anyway.
const terseAgent = new AssistantAgent({
  name: "terse_agent",
  modelClient,
  systemMessage:
    "Return plain text only. Max 3 bullet points. No preamble.",
});

async function getTerseAnswer(question: string) {
  const result = await terseAgent.run([
    {
      role: "user",
      content:
        `Question: ${question}\n` +
        `Constraints: answer in <=60 words.`,
    },
  ]);

  console.log(result.messages.at(-1)?.content);
}

getTerseAnswer("What should I do if a claim is missing documents?");
  6. Measure token usage so you can prove your changes helped. If you don’t inspect usage per request, you’re guessing. Note that the usage field read in the code below is an assumption; check what your model client actually returns on the result object.
async function runWithUsage(promptText: string) {
  const result = await agent.run([{ role: "user", content: promptText }]);

  console.log("Assistant:", result.messages.at(-1)?.content);

  // Assumed: the client surfaces per-request token counts on the result.
  // Verify the actual result shape of your client before relying on this.
  console.log("Usage:", result.usage);
}

runWithUsage("Summarize this policy issue in one sentence.");
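If your model client does not expose usage counts, a rough character-based estimate still lets you compare before and after. The 4-characters-per-token ratio below is a common rule of thumb for English text, not an exact count.
// Heuristic only: ~4 characters per token for English prose.
// Useful for relative comparisons, not for billing.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Summarize this policy issue in one sentence.")); // logs 11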

Testing It

Run a few representative conversations before and after these changes. Compare response length, number of turns sent to the model, and whether long threads still preserve the important facts after summarization.

For a real test, use one short request and one long multi-turn request. The short request should stay cheap because of the tighter prompt, while the long request should stop growing once your history compaction kicks in.
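
A minimal way to check that second point is to compare the size of the full transcript against the compacted window. This sketch reuses the compactHistory helper from step 2 and the estimateTokens heuristic above.
function comparePromptSizes(history: ChatTurn[], maxTurns = 4) {
  const full = history.map((t) => t.content).join("\n");
  const compact = compactHistory(history, maxTurns)
    .map((t) => t.content)
    .join("\n");

  // The gap between these two numbers is the growth your compaction removes.
  console.log("Full thread (est tokens):", estimateTokens(full));
  console.log("Compacted (est tokens):", estimateTokens(compact));
}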

If your responses start losing critical context, increase the summary quality before increasing the window size. In practice, that gives you better cost control than just sending more tokens.

Next Steps

  • Add retrieval so you send only relevant policy or claims snippets instead of full documents
  • Build a middleware layer that logs prompt size and response size per agent call (a starting sketch follows this list)
  • Learn AutoGen group chat patterns so you can control which agent sees which context
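
For the middleware bullet above, here is one hedged starting point. The wrapper shape is an assumption about how you would call run(), not a documented AutoGen API.
type RunResult = { messages: { content?: string }[] };
type RunFn = (messages: ChatTurn[]) => Promise<RunResult>;

function withSizeLogging(name: string, run: RunFn): RunFn {
  return async (messages) => {
    const promptChars = messages.reduce((n, m) => n + m.content.length, 0);
    const result = await run(messages);
    const replyChars = result.messages.at(-1)?.content?.length ?? 0;
    // Characters as a proxy; prefer real usage counts if your client exposes them.
    console.log(`[${name}] prompt=${promptChars} chars, reply=${replyChars} chars`);
    return result;
  };
}

const loggedRun = withSizeLogging("support_agent", (msgs) => agent.run(msgs));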


By Cyprian Aarons, AI Consultant at Topiax.
