AutoGen Tutorial (TypeScript): rate limiting API calls for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to put a hard rate limit around AutoGen API calls in TypeScript so your agent doesn’t blow through provider quotas or trigger 429s. You’ll build a small wrapper that queues LLM requests, spaces them out, and keeps your AutoGen workflow stable under load.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project with ts-node or a build step
  • autogen-agentchat installed
  • @microsoft/autogen-core installed
  • An OpenAI API key in OPENAI_API_KEY
  • Basic familiarity with AutoGen agents and model clients

Install the packages:

npm install autogen-agentchat @microsoft/autogen-core
npm install -D typescript ts-node @types/node

Step-by-Step

  1. Start by creating a simple rate limiter. This version uses a minimum delay between calls, which is enough for many provider quotas and easy to reason about in production.
export class RateLimiter {
  private lastRun = 0;
  // Chain of scheduled work; each new call waits for the previous one to be dispatched.
  private queue: Promise<void> = Promise.resolve();

  constructor(private readonly minIntervalMs: number) {}

  async schedule<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.queue.then(async () => {
      // Wait out whatever is left of the minimum interval since the last dispatch.
      const now = Date.now();
      const waitMs = Math.max(0, this.minIntervalMs - (now - this.lastRun));
      if (waitMs > 0) await new Promise((r) => setTimeout(r, waitMs));
      this.lastRun = Date.now();
      return fn();
    });

    // Keep the chain alive even if this call rejects, so later calls still run.
    this.queue = run.then(() => undefined, () => undefined);
    return run;
  }
}
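Before wiring this into anything, you can sanity-check the limiter on its own. The snippet below is just a demo: fakeLlmCall and demoLimiter are illustrative names, and the fake call stands in for a real request so you can watch the spacing in your console.

import { RateLimiter } from "./RateLimiter";

// Stand-in for a real API call; only used here to demonstrate spacing.
async function fakeLlmCall(id: number): Promise<number> {
  console.log(`call ${id} started at ${Date.now()}`);
  return id;
}

const demoLimiter = new RateLimiter(500);

// Fire three calls at once; the limiter should space their start times
// by roughly 500ms instead of letting them run together.
Promise.all([1, 2, 3].map((id) => demoLimiter.schedule(() => fakeLlmCall(id)))).then(
  (results) => console.log("done:", results)
);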
  2. Next, wire the limiter into an AutoGen model client. The important part is that every LLM call goes through schedule, so your agent traffic gets serialized and delayed consistently.
import { OpenAIChatCompletionClient } from "@microsoft/autogen-core";
import { RateLimiter } from "./RateLimiter";

// Exported so other call sites (like safeCreate in step 5) share the same pacing queue.
export const limiter = new RateLimiter(1200);

export const client = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

export async function limitedCreate(messages: Array<{ role: string; content: string }>) {
  return limiter.schedule(() =>
    client.create({
      messages,
      temperature: 0,
    })
  );
}
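A quick way to confirm the wrapper still behaves like the raw client is to call it twice in a row. The exact response shape depends on your model client version, so treat the logging line as illustrative:

import { limitedCreate } from "./limitedCreate";

async function demo() {
  // Both requests go through the limiter, so the second one starts
  // at least 1200ms after the first is dispatched.
  const first = await limitedCreate([{ role: "user", content: "Say hello." }]);
  const second = await limitedCreate([{ role: "user", content: "Say goodbye." }]);
  console.log(first, second);
}

demo().catch(console.error);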
  3. Build an agent that uses the wrapped client instead of calling the model directly. This keeps the rest of your AutoGen code unchanged while enforcing the limit at the boundary.
import { AssistantAgent } from "autogen-agentchat";
import { limitedCreate } from "./limitedCreate";

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient: {
    create: limitedCreate,
  },
});

async function main() {
  const result = await agent.run({
    task: "Summarize why rate limiting matters for API integrations.",
  });

  console.log(result.messages.at(-1)?.content);
}

main().catch(console.error);
  4. Add retry handling for transient rate-limit errors. In practice, you want both pacing and retries because providers can still reject bursts caused by other services or shared keys.
function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

export async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;

  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Back off exponentially, but don't sleep after the final attempt.
      if (i < attempts - 1) await sleep(500 * Math.pow(2, i));
    }
  }

  throw lastError;
}
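If you only want to retry rate-limit-style failures instead of every error, you can add a predicate. This is a sketch: the numeric status field and the withSelectiveRetry name are assumptions, so adjust the check to whatever your client actually throws. It reuses the sleep helper from the snippet above.

// Assumption: the thrown error carries a numeric `status` property.
// Adjust this check to match the error shape your client really throws.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: number })?.status;
  return status === 429 || (status !== undefined && status >= 500);
}

export async function withSelectiveRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;

  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Give up immediately on non-transient errors or after the last attempt.
      if (!isRetryable(error) || i === attempts - 1) throw error;
      await sleep(500 * Math.pow(2, i));
    }
  }

  throw lastError;
}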
  5. Combine both pieces so all requests are paced and retried safely. This is the version you actually want in a service handling real traffic.
import { withRetry } from "./withRetry";
// Reuse the limiter exported in step 2 so both call paths share one pacing queue.
import { client, limiter } from "./limitedCreate";

export async function safeCreate(messages: Array<{ role: string; content: string }>) {
  return withRetry(() =>
    limiter.schedule(() =>
      client.create({
        messages,
        temperature: 0,
      })
    )
  );
}

Testing It

Run three or four requests back-to-back and watch the timestamps in your logs. You should see each call spaced by roughly your configured interval instead of firing all at once.

If you want a quick check, log before and after safeCreate and confirm that concurrent tasks are queued rather than executed in parallel. Also test with a low interval like 200ms first so you can see the behavior without waiting too long.
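A small script like this makes the spacing easy to see. It assumes you saved the step-5 code as safeCreate.ts; adjust the import path to wherever it actually lives.

import { safeCreate } from "./safeCreate";

async function testPacing() {
  const started = Date.now();

  // Launch four requests at once; the limiter should serialize them.
  await Promise.all(
    [1, 2, 3, 4].map(async (i) => {
      const result = await safeCreate([{ role: "user", content: `Ping ${i}` }]);
      const elapsed = Date.now() - started;
      console.log(`request ${i} finished at ${elapsed}ms`);
      return result;
    })
  );
}

testPacing().catch(console.error);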

Finally, force a failure by temporarily using an invalid API key or an aggressive request burst. Your retry path should kick in, but your limiter should still keep calls serialized.
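You can also exercise the retry path in isolation, without touching the real API or burning quota, by feeding withRetry a function that fails a couple of times before succeeding:

import { withRetry } from "./withRetry";

async function testRetry() {
  let calls = 0;

  const result = await withRetry(async () => {
    calls++;
    // Simulate two transient failures before a success.
    if (calls < 3) throw new Error(`simulated 429 (attempt ${calls})`);
    return "ok";
  });

  console.log(`succeeded after ${calls} attempts:`, result);
}

testRetry().catch(console.error);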

Next Steps

  • Replace the fixed delay with a token bucket if you need per-minute throughput control instead of simple spacing (see the sketch after this list).
  • Add per-model or per-tenant limits if multiple agents share the same API key.
  • Persist request metrics so you can alert on queue depth, retry counts, and provider-side throttling.
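
If you go the token-bucket route from the first bullet, here's a minimal sketch. The capacity and refill numbers are illustrative, so map them to your provider's actual per-minute limits.

// Minimal token bucket: `capacity` tokens refill at `refillPerSecond`,
// and each request consumes one token or waits until one is available.
export class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private readonly capacity: number, private readonly refillPerSecond: number) {
    this.tokens = capacity;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
  }

  async take(): Promise<void> {
    for (;;) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Wait roughly long enough for one token to accumulate, then re-check.
      const waitMs = ((1 - this.tokens) / this.refillPerSecond) * 1000;
      await new Promise((r) => setTimeout(r, waitMs));
    }
  }
}

// Example: 60 requests per minute -> capacity 60, refill 1 token/second.
// const bucket = new TokenBucket(60, 1);
// await bucket.take();
// await client.create({ messages, temperature: 0 });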

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

