AutoGen Tutorial (TypeScript): rate limiting API calls for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add rate limiting around AutoGen API calls in TypeScript so your agent stops hammering an upstream model or tool endpoint. You need this when you want to avoid 429s, control spend, and keep one noisy conversation from starving the rest of your system.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project with ts-node or a build step
  • AutoGen packages:
    • @autogen/core
    • @autogen/openai
  • An OpenAI API key set as OPENAI_API_KEY
  • Optional, but useful for real systems:
    • Redis for distributed rate limiting
    • A logging library like pino

Step-by-Step

  1. Install the packages and set up a basic TypeScript project.
    We’ll use AutoGen’s OpenAI model client, then wrap calls with a simple token bucket limiter before the request is sent.
npm init -y
npm install @autogen/core @autogen/openai
npm install -D typescript ts-node @types/node
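    If your project does not already have one, a minimal tsconfig.json along these lines is enough for ts-node; the exact compiler options here are only a reasonable default, so keep your own settings if you have them.
// tsconfig.json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "moduleResolution": "node",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "outDir": "dist"
  },
  "include": ["*.ts"]
}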
  2. Create a small rate limiter you can reuse anywhere in your app.
    This version is in-memory, which is fine for a single process or local development.
// rateLimiter.ts
export class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Waits until a token is available, then consumes one.
  async acquire(): Promise<void> {
    while (true) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Bucket is empty: poll again after a short sleep.
      await new Promise((r) => setTimeout(r, 100));
    }
  }

  // Adds tokens based on elapsed time, capped at the bucket capacity.
  private refill() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    const refillAmount = elapsedSeconds * this.refillPerSecond;
    // Only refill once at least one whole token has accrued, so the short
    // polling interval in acquire() doesn't reset the refill clock early.
    if (refillAmount >= 1) {
      this.tokens = Math.min(this.capacity, this.tokens + refillAmount);
      this.lastRefill = now;
    }
  }
}
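    Before wiring the limiter into any model calls, you can sanity-check its timing on its own. This throwaway snippet (the file name and log format are just examples) makes no API calls.
// limiterCheck.ts -- quick local check, no API calls involved
import { TokenBucket } from "./rateLimiter";

async function check() {
  const limiter = new TokenBucket(2, 1); // burst of 2, then roughly 1 permit/sec

  const start = Date.now();
  for (let i = 1; i <= 5; i++) {
    await limiter.acquire();
    console.log(`permit ${i} granted at +${Date.now() - start}ms`);
  }
  // Expected shape: permits 1 and 2 almost immediately, the rest about a second apart.
}

check();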
  3. Build an AutoGen model client and wrap each call with the limiter.
    The important part is that the limiter sits at the boundary where requests leave your code, not inside your prompt logic.
// main.ts
import { TokenBucket } from "./rateLimiter";
import { OpenAIChatCompletionClient } from "@autogen/openai";

const limiter = new TokenBucket(5, 1); // burst of 5, then 1 request/sec

// Relies on the OPENAI_API_KEY environment variable from the prerequisites.
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
});

// Every model call waits for a permit before the request leaves your code.
async function askModel(prompt: string): Promise<string> {
  await limiter.acquire();

  const result = await modelClient.create([
    { role: "user", content: prompt },
  ]);

  return result.content ?? "";
}
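    If you call the model from more than one place, a tiny helper keeps the boundary rule honest: wrap any outgoing async call once instead of remembering to call acquire() everywhere. withRateLimit is a name made up for this tutorial, not an AutoGen API.
// withRateLimit.ts -- generic helper, not part of AutoGen
import { TokenBucket } from "./rateLimiter";

export function withRateLimit<Args extends unknown[], R>(
  limiter: TokenBucket,
  fn: (...args: Args) => Promise<R>
): (...args: Args) => Promise<R> {
  return async (...args: Args) => {
    // Wait for a permit before the request leaves the process.
    await limiter.acquire();
    return fn(...args);
  };
}

// Usage: wrap the raw call once, then use the returned function everywhere.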
  4. Add retry handling for upstream rate-limit errors.
    Rate limiting on your side reduces pressure, but you still need backoff because providers can throttle based on shared account usage or token volume.
async function askModelWithRetry(prompt: string): Promise<string> {
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      return await askModel(prompt);
    } catch (error: any) {
      const status = error?.status ?? error?.response?.status;
      // Only retry upstream throttling, and give up after the last attempt.
      if (status !== 429 || attempt === 3) throw error;

      // Quadratic backoff: 1s after the first failure, 4s after the second.
      const delayMs = attempt * attempt * 1000;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }

  throw new Error("unreachable");
}
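    Fixed quadratic delays work, but if many workers hit a 429 at the same moment they will also retry at the same moment. A common refinement is full jitter; here is one way to compute the delay, written as a standalone helper you could swap into askModelWithRetry.
// Backoff with full jitter: pick a random delay up to an exponentially growing cap.
function backoffWithJitterMs(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  const cap = Math.min(maxMs, baseMs * 2 ** (attempt - 1)); // 1s, 2s, 4s, ...
  return Math.floor(Math.random() * cap);
}

// In askModelWithRetry, replace the fixed delay with:
// const delayMs = backoffWithJitterMs(attempt);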
  5. Run several calls in parallel and watch the limiter serialize them.
    This is the simplest proof that your wrapper works: even if you fire ten requests at once, only the allowed rate gets through.
async function main() {
  const prompts = Array.from({ length: 10 }, (_, i) => `Say hello ${i + 1}`);

  const results = await Promise.all(
    prompts.map(async (prompt) => {
      const startedAt = new Date().toISOString();
      const text = await askModelWithRetry(prompt);
      return { startedAt, prompt, text };
    })
  );

  console.log(JSON.stringify(results, null, 2));
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

Testing It

Run the script and confirm that the ten requests do not all hit the model at once. If you add a few console.log statements inside acquire(), you should see requests waiting when the bucket is empty and resuming after refills.
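If you use ts-node, npx ts-node main.ts is enough to run it. For reference, a lightly instrumented acquire() could look like the version below; the log wording is arbitrary and the logic is unchanged.
// A logged variant of acquire() inside TokenBucket, for local debugging only.
async acquire(): Promise<void> {
  while (true) {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      console.log(`permit granted, about ${Math.floor(this.tokens)} left`);
      return;
    }
    console.log("bucket empty, waiting 100ms");
    await new Promise((r) => setTimeout(r, 100));
  }
}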

To test failure handling, temporarily lower your provider quota or increase concurrency by sending more prompts. You want to see retries on 429 without crashing the whole batch.

For production-style validation, measure two things:

  • average wait time before a request is sent
  • number of upstream 429 responses over time

If those numbers stay low while throughput remains acceptable, your limit settings are sane.
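A very small in-process metrics layer is enough to capture both; the names below are made up for this example, and you would swap in your logging library of choice.
// metrics.ts -- minimal counters for average wait time and upstream 429s
export const metrics = {
  totalWaitMs: 0,
  requestsSent: 0,
  upstream429s: 0,
};

// Call this instead of limiter.acquire() directly to record the wait.
export async function timedAcquire(limiter: { acquire(): Promise<void> }): Promise<void> {
  const start = Date.now();
  await limiter.acquire();
  metrics.totalWaitMs += Date.now() - start;
  metrics.requestsSent += 1;
}

export function averageWaitMs(): number {
  return metrics.requestsSent === 0 ? 0 : metrics.totalWaitMs / metrics.requestsSent;
}

// Increment metrics.upstream429s in the retry catch block whenever status === 429.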

Next Steps

  • Move the token bucket to Redis so multiple Node processes share one global limit (a fixed-window sketch follows this list).
  • Add separate limits per model name, tenant ID, or agent workflow.
  • Track request cost as well as request count so long prompts don’t blow up token budgets.
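As a starting point for the Redis idea above, a fixed-window counter is the simplest shared limit. The sketch below assumes the ioredis client and a limit of 60 requests per minute per key; both are placeholders to tune for your account.
// redisLimiter.ts -- fixed-window sketch, assumes `npm install ioredis` and a reachable Redis
import Redis from "ioredis";

const redis = new Redis(); // defaults to localhost:6379

export async function allowRequest(
  key: string,
  limit = 60,
  windowMs = 60_000
): Promise<boolean> {
  // One counter per key per time window, shared by every process talking to this Redis.
  const windowKey = `ratelimit:${key}:${Math.floor(Date.now() / windowMs)}`;
  const count = await redis.incr(windowKey);
  if (count === 1) {
    await redis.pexpire(windowKey, windowMs); // let old windows clean themselves up
  }
  return count <= limit;
}

// Callers that get `false` back should wait and retry, mirroring acquire() above.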

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

