AutoGen Tutorial (TypeScript): optimizing token usage for beginners
This tutorial shows you how to reduce token usage in a TypeScript AutoGen setup without breaking the agent workflow. You’ll build a small, production-friendly pattern that trims prompt bloat, limits conversation growth, and keeps model calls focused.
What You'll Need
- Node.js 18+ installed
- A TypeScript project with `ts-node` or a build step
- `npm` or `pnpm`
- An OpenAI API key set as an environment variable
- AutoGen packages:
  - `@autogenai/autogen`
  - `openai`
  - `dotenv`
- Basic familiarity with AutoGen agents and message passing
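The API key can live in a `.env` file at the project root, which `dotenv` loads automatically (the key value below is a placeholder, not a real credential):

```
OPENAI_API_KEY=your-key-here
```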
Step-by-Step
- Start with a minimal project setup and keep your config in one place. Token waste usually starts when teams scatter prompts, model names, and agent settings across files.

```bash
npm init -y
npm install @autogenai/autogen openai dotenv
npm install -D typescript ts-node @types/node
```

Then add a `tsconfig.json`:

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
```
- Load your API key and create a small shared model config. For beginners, the biggest win is using one cheaper model for routine work instead of letting every agent default to a larger one.

```typescript
import 'dotenv/config';
import { OpenAIChatCompletionClient } from '@autogenai/autogen';

// Fail fast before creating the client if the key is missing.
if (!process.env.OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY is missing');
}

const client = new OpenAIChatCompletionClient({
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
});
```
- Create agents with short system messages and narrow responsibilities. Long system prompts are expensive because they get sent repeatedly, so keep them specific and reusable.

```typescript
import { AssistantAgent, UserProxyAgent } from '@autogenai/autogen';

const analyst = new AssistantAgent({
  name: 'analyst',
  modelClient: client,
  systemMessage: 'Summarize customer support tickets in 3 bullets. Keep output concise.',
});

const user = new UserProxyAgent({
  name: 'user',
});
await user.init();
```
- Limit conversation growth by truncating history before each run. Beginners often let every turn accumulate forever, which inflates token usage fast even when the task is simple.

```typescript
type ChatMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

// Keep any system messages plus the most recent turns, so
// truncation never silently drops the agent's instructions.
function keepLastMessages(messages: ChatMessage[], maxMessages = 6): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-maxMessages)];
}

const history: ChatMessage[] = [
  { role: 'system', content: 'Summarize support tickets.' },
  { role: 'user', content: 'Ticket 1: login fails on Safari.' },
  { role: 'assistant', content: 'Login issue on Safari.' },
];
```
- Use compact input formatting and ask for compact output. If you send raw paragraphs, logs, or JSON blobs without trimming them first, you pay for every extra character.

```typescript
async function summarizeTicket(ticketText: string) {
  // Collapse whitespace and cap length before sending to the model.
  const compactTicket = ticketText
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, 800);
  const result = await analyst.generateReply([
    { role: 'user', content: `Summarize this ticket in bullets:\n${compactTicket}` },
  ]);
  return result.content;
}

const summary = await summarizeTicket(
  'Customer says login fails on Safari after password reset. Error shown is invalid session token.'
);
console.log(summary);
```
- Put the pieces together in a single flow that reuses the same client and keeps the message count low. This pattern is easy to extend later with caching, retrieval, or routing.

```typescript
async function run() {
  const messages = keepLastMessages(history, 4);
  const reply = await analyst.generateReply([
    ...messages,
    {
      role: 'user',
      content:
        'Rewrite the last ticket summary into exactly two bullets and no extra commentary.',
    },
  ]);
  console.log(reply.content);
}

run().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
Testing It
Run the script with a real ticket or a few sample inputs and check that the response stays short and relevant. If output gets verbose, tighten the system message first, then reduce the allowed conversation history. You should also compare token usage before and after by inspecting your provider dashboard or request logs. The goal is not just lower cost; it’s fewer irrelevant tokens sent on every turn.
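If you want a quick local sanity check before looking at the provider dashboard, a rough character-based heuristic works for comparing before/after trimming. Note this is an assumption (roughly four characters per token for English text), not the model's real tokenizer, so use it only for relative comparisons:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// A heuristic for comparing trimmed vs. untrimmed inputs, not billing.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const raw = 'Customer says     login fails on Safari     after password reset.      ';
const compact = raw.replace(/\s+/g, ' ').trim();

console.log(`raw: ~${estimateTokens(raw)} tokens, compact: ~${estimateTokens(compact)} tokens`);
```

Running this on your own ticket text shows how much whitespace and filler you were paying for on every turn.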
Next Steps
- Add response caching for repeated summaries and repeated classifications
- Introduce retrieval so agents only see the top relevant chunks instead of full documents
- Build a router agent that sends simple tasks to cheaper models and complex tasks to stronger ones
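The caching idea in the first bullet can be sketched as a small wrapper. This is a minimal in-memory sketch; `withCache` is a hypothetical helper name, and a real project would likely want an LRU bound or a TTL instead of an unbounded `Map`:

```typescript
// Wrap any async text-in/result-out function so identical inputs
// hit the model only once. Whitespace is normalized before lookup
// so trivially different copies of the same ticket share one entry.
function withCache<T>(fn: (input: string) => Promise<T>): (input: string) => Promise<T> {
  const cache = new Map<string, T>();
  return async (input: string) => {
    const key = input.replace(/\s+/g, ' ').trim();
    const hit = cache.get(key);
    if (hit !== undefined) return hit; // cache hit: zero tokens spent
    const result = await fn(key);
    cache.set(key, result);
    return result;
  };
}

// Usage sketch with the tutorial's summarizer:
// const cachedSummarize = withCache(summarizeTicket);
```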
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.