AutoGen Tutorial (TypeScript): rate limiting API calls for advanced developers
This tutorial shows how to add a real rate limiter around AutoGen tool and model calls in TypeScript, so your agent stays inside provider quotas and doesn’t hammer downstream APIs. You need this when you’re running multi-agent workflows, calling expensive models, or integrating with bank/insurance APIs that enforce strict per-second and per-minute limits.
What You'll Need
- Node.js 18+
- TypeScript 5+
- An AutoGen TypeScript project
- @autogenai/autogen
- openai
- dotenv
- An OpenAI API key in .env
- A target API or tool function you want to protect from burst traffic
Install the packages:
npm install @autogenai/autogen openai dotenv
npm install -D typescript tsx @types/node
Step-by-Step
- Start by defining a small rate limiter that controls how many calls can run inside a rolling time window. For production, a token bucket or Redis-backed limiter is better, but this in-memory version is enough to wire into an AutoGen agent cleanly.
export class RateLimiter {
  // Timestamps of the calls made inside the current rolling window.
  private timestamps: number[] = [];

  constructor(
    private readonly maxCalls: number,
    private readonly windowMs: number
  ) {}

  async acquire(): Promise<void> {
    while (true) {
      const now = Date.now();
      // Drop timestamps that have aged out of the rolling window.
      this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
      if (this.timestamps.length < this.maxCalls) {
        this.timestamps.push(now);
        return;
      }
      // Window is full: sleep until the oldest call expires, then re-check.
      const waitMs = this.windowMs - (now - this.timestamps[0]);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
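For comparison, here is a minimal sketch of the token-bucket variant the step above alludes to. It refills continuously instead of tracking individual timestamps, which smooths sustained traffic rather than allowing a full burst at the start of each window. TokenBucket and refillPerSecond are illustrative names, not part of any SDK.

export class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number
  ) {
    this.tokens = capacity;
  }

  async acquire(): Promise<void> {
    while (true) {
      const now = Date.now();
      // Top up tokens based on elapsed time, capped at capacity.
      const elapsed = (now - this.lastRefill) / 1000;
      this.tokens = Math.min(
        this.capacity,
        this.tokens + elapsed * this.refillPerSecond
      );
      this.lastRefill = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Not enough tokens: wait roughly long enough for one to refill.
      const waitMs = ((1 - this.tokens) / this.refillPerSecond) * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}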
- Next, wrap the actual API work in a function that waits for the limiter before making the request. This pattern keeps your business logic unchanged and makes the limiter reusable across tools, agents, and direct SDK calls.
import "dotenv/config";
import OpenAI from "openai";
import { RateLimiter } from "./rate-limiter";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Allow at most 3 calls per rolling 10-second window.
const limiter = new RateLimiter(3, 10_000);

export async function limitedChat(prompt: string): Promise<string> {
  // Block until the limiter grants a slot, then make the call as usual.
  await limiter.acquire();
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0]?.message?.content ?? "";
}
- Now expose that function as an AutoGen tool so the agent can call it through the same throttle. In TypeScript AutoGen, tools are just typed functions with metadata, which makes them easy to guard at the boundary.
import { tool } from "@autogenai/autogen";
import { limitedChat } from "./limited-chat";
export const summarizeTool = tool({
name: "summarize_text",
description: "Summarize long text using a rate-limited OpenAI call.",
parameters: {
type: "object",
properties: {
text: { type: "string" },
},
required: ["text"],
additionalProperties: false,
},
execute: async ({ text }: { text: string }) => {
return await limitedChat(`Summarize this text:\n\n${text}`);
},
});
- Build an agent that uses the tool, and keep the model interaction itself under control as well. If your workflow fans out across multiple turns or multiple agents, protecting both tool calls and model calls prevents quota spikes.
import "dotenv/config";
import { AssistantAgent } from "@autogenai/autogen";
import { summarizeTool } from "./summarize-tool";
const agent = new AssistantAgent({
name: "rate_limited_assistant",
modelClientOptions: {
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-4o-mini",
},
tools: [summarizeTool],
});
async function main() {
const result = await agent.run({
task:
"Use summarize_text on this paragraph and give me one sentence back: " +
"AutoGen agents often need guardrails when they call external services repeatedly.",
});
console.log(result.output);
}
main().catch(console.error);
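How you hand a pre-configured client to AssistantAgent depends on the AutoGen version you're running, so treat this as a sketch of the underlying pattern rather than exact wiring. The openai SDK accepts a custom fetch implementation in its constructor, so throttling at that layer catches every HTTP request the client sends, including model calls an agent makes through it; limitedFetch and limitedClient are illustrative names.

import OpenAI from "openai";
import { RateLimiter } from "./rate-limiter";

// One shared limiter for every request this client sends.
const modelLimiter = new RateLimiter(3, 10_000);

// Every request from the SDK passes through the limiter before hitting
// the network, not just the calls you wrap yourself.
const limitedFetch: typeof fetch = async (input, init) => {
  await modelLimiter.acquire();
  return fetch(input, init);
};

export const limitedClient = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  fetch: limitedFetch,
});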
- If you want stronger protection for parallel jobs, add a concurrency cap on top of the rate limiter. This stops bursts of simultaneous requests from exhausting your window even if each individual request is valid.
export class ConcurrencyGate {
  private active = 0;
  private queue: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number) {}

  async enter(): Promise<() => void> {
    if (this.active >= this.maxConcurrent) {
      // All slots taken: wait until a running task hands its slot over.
      // The releaser transfers its slot to us, so active is already counted.
      await new Promise<void>((resolve) => this.queue.push(resolve));
    } else {
      this.active += 1;
    }
    let released = false;
    return () => {
      if (released) return; // guard against double release
      released = true;
      const next = this.queue.shift();
      if (next) {
        // Hand the slot directly to the next waiter so a newcomer can't
        // grab it between a decrement here and the waiter's re-increment.
        next();
      } else {
        this.active -= 1;
      }
    };
  }
}
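Composing the two is just nesting: take a concurrency slot first, then wait for the rate window, and always release in a finally block so a thrown request can't leak a slot. This assumes the limitedChat module from earlier; gatedChat is an illustrative name.

import { ConcurrencyGate } from "./concurrency-gate";
import { limitedChat } from "./limited-chat";

// At most 2 requests in flight at once, on top of the rate window.
const gate = new ConcurrencyGate(2);

export async function gatedChat(prompt: string): Promise<string> {
  const release = await gate.enter();
  try {
    // limitedChat still applies the rolling-window limit internally.
    return await limitedChat(prompt);
  } finally {
    // Always free the slot, even if the request throws.
    release();
  }
}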
Testing It
Run the agent several times in quick succession and watch the timestamps between outgoing requests. With maxCalls at 3 and windowMs at 10_000, the fourth request should pause until the oldest call ages out of the window.
If you want to verify it under load, trigger multiple limitedChat() calls with Promise.all() and confirm they serialize according to your policy instead of all hitting OpenAI at once. Also check your provider dashboard or logs for fewer 429 responses.
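A minimal load test along those lines, assuming the limitedChat module from earlier, with elapsed-time logs so you can watch the limiter stagger the calls:

import { limitedChat } from "./limited-chat";

async function loadTest() {
  const started = Date.now();
  // Fire 6 prompts at once; with a 3-per-10s limit, calls 4-6 must wait.
  const prompts = Array.from({ length: 6 }, (_, i) => `Say the number ${i + 1}.`);
  await Promise.all(
    prompts.map(async (prompt, i) => {
      const reply = await limitedChat(prompt);
      console.log(`call ${i + 1} finished at +${Date.now() - started}ms: ${reply}`);
    })
  );
}

loadTest().catch(console.error);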
For a bank or insurance workflow, test against a mock downstream service first, then point it at the real API with conservative limits. That gives you confidence your retry logic and throttling behave correctly before production traffic hits it.
Next Steps
- Replace the in-memory limiter with Redis so limits survive restarts and work across multiple instances.
- Add exponential backoff for 429 responses from OpenAI or downstream systems (a sketch follows this list).
- Instrument limiter wait time with metrics so you can see when agents are approaching quota pressure.
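For the backoff item, here is one possible shape: retry only on HTTP 429, double the delay each attempt, and add jitter so parallel agents don't retry in lockstep. retryOn429, maxRetries, and baseDelayMs are illustrative names; the openai SDK surfaces the status code on its error objects, but check the error shape of whatever client you use.

// Retry a call on HTTP 429 with exponential backoff plus jitter.
export async function retryOn429<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1_000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // openai v4 errors carry .status; other clients may nest it differently.
      const status = err?.status ?? err?.response?.status;
      if (status !== 429 || attempt >= maxRetries) throw err;
      // Double the delay each attempt and add jitter to avoid thundering herds.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

Inside limitedChat, that would look like return retryOn429(() => client.chat.completions.create({ ... })), so throttling and retries compose without touching callers.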
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.