# AutoGen Tutorial (TypeScript): implementing retry logic for beginners
This tutorial shows you how to add retry logic around an AutoGen TypeScript agent call, so transient failures like rate limits, timeouts, and flaky model responses don’t break your workflow. You need this any time your agent is doing real work against an LLM API and you want predictable behavior instead of one-off failures.
## What You'll Need

- Node.js 18+ installed
- A TypeScript project with `ts-node` or a build step
- `@autogenai/autogen` installed
- `openai` installed
- An OpenAI API key set as `OPENAI_API_KEY`
- Basic familiarity with AutoGen agents and messages
Install the packages:
```bash
npm install @autogenai/autogen openai
npm install -D typescript ts-node @types/node
```
## Step-by-Step

- First, set up a basic AutoGen agent that can talk to OpenAI. The retry logic will wrap this call, so keep the agent creation clean and isolated.
```typescript
import { AssistantAgent } from "@autogenai/autogen";
import { OpenAIChatCompletionClient } from "@autogenai/autogen-ext/models/openai";

const client = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = new AssistantAgent({
  name: "support_agent",
  modelClient: client,
  systemMessage: "You are a concise support assistant.",
});
```
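The client above reads `process.env.OPENAI_API_KEY` directly, so a missing key only surfaces as an auth error on the first model call. A small guard can fail fast at startup instead; `requireEnv` here is a hypothetical helper for illustration, not part of AutoGen:

```typescript
// Hypothetical helper: throw at startup if a required environment variable
// is missing, instead of getting an opaque auth error on the first call.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

You would then pass `apiKey: requireEnv("OPENAI_API_KEY")` to the client so misconfiguration is caught before any retries even start.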
- Add a small retry helper with exponential backoff. This is the core pattern: try the request, catch transient errors, wait, then try again with a capped delay.
```typescript
function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  maxDelayMs = 10_000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      // Exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
      await sleep(delay);
    }
  }
  throw lastError;
}
```
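To make the backoff concrete, here is a tiny standalone helper (illustration only, not part of AutoGen) that computes the delays `withRetry` sleeps between attempts, ignoring any delay cap:

```typescript
// Computes the exponential backoff delays produced by the retry loop:
// one delay after every failed attempt except the last.
function backoffSchedule(maxAttempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(baseDelayMs * Math.pow(2, attempt - 1));
  }
  return delays;
}

// With 4 attempts and a 500 ms base, a worst-case run waits 500, 1000,
// then 2000 ms between attempts: roughly 3.5 s of backoff in total.
console.log(backoffSchedule(4, 500)); // [ 500, 1000, 2000 ]
```

This is worth sketching once because the total wait grows quickly: doubling the base or adding attempts can push worst-case latency past most request timeouts.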
- Wrap the AutoGen `run` call inside the retry helper. Keep the user message construction outside so each retry uses the same input and you avoid accidental drift between attempts.
```typescript
async function main() {
  const userMessage = "Summarize the benefits of adding retry logic.";
  const result = await withRetry(async () => {
    return await agent.run(userMessage);
  }, 4, 750);
  console.log(result.messages.at(-1)?.content);
}

main().catch((error) => {
  console.error("Agent failed after retries:", error);
});
```
- Make retries smarter by only retrying transient failures. In production, you should not retry every error, because bad prompts and invalid credentials will never succeed on a second attempt.
```typescript
function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return (
    message.includes("rate limit") ||
    message.includes("timeout") ||
    message.includes("ECONNRESET") ||
    message.includes("503") ||
    message.includes("502")
  );
}

async function withSelectiveRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts || !isRetryableError(error)) break;
      // Linear backoff here; swap in the exponential version if you prefer.
      await sleep(500 * attempt);
    }
  }
  throw lastError;
}
```
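The linear `500 * attempt` delay works, but when many clients retry in lockstep they all hit the provider at the same moments. A common remedy, often called "full jitter", randomizes each delay. The helper below is a sketch of that idea, not an AutoGen API:

```typescript
// Full-jitter backoff: pick a uniformly random delay between 0 and the
// exponential ceiling, capped at maxDelayMs. Randomizing spreads retries
// from many clients so they don't all retry simultaneously.
function fullJitterDelayMs(
  attempt: number, // 1-based attempt number
  baseDelayMs = 500,
  maxDelayMs = 10_000
): number {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt - 1));
  return Math.random() * ceiling;
}
```

To use it, replace `sleep(500 * attempt)` inside `withSelectiveRetry` with `sleep(fullJitterDelayMs(attempt))`.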
- Use the selective version in your app entrypoint. This gives you a safer default for real systems where some failures are permanent and others are temporary.
```typescript
async function runWithRetries() {
  const result = await withSelectiveRetry(async () => {
    return await agent.run("Give me three examples of transient API failures.");
  }, 3);
  const lastMessage = result.messages.at(-1);
  if (lastMessage?.content) {
    console.log(lastMessage.content);
    return;
  }
  throw new Error("No assistant response returned.");
}

runWithRetries().catch((error) => {
  console.error("Final failure:", error);
});
```
## Testing It
Run the script normally first and confirm you get a valid assistant response. Then temporarily break your API key or point to an invalid model name to see how non-retryable errors fail fast.
To test actual retries, simulate a transient failure by throwing an error inside the wrapped function once before succeeding on a later attempt. You should see the delay between attempts increase, and the final output should still print when a later retry succeeds.
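Here is a standalone sketch of that simulation: one wrapped function throws a transient-looking error once before succeeding, and a permanent error exercises the fail-fast path. `isRetryableError` and `withSelectiveRetry` are copied from the steps above, with a `baseDelayMs` parameter added so the demo runs quickly:

```typescript
function sleep(ms: number) {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return (
    message.includes("rate limit") ||
    message.includes("timeout") ||
    message.includes("ECONNRESET") ||
    message.includes("503") ||
    message.includes("502")
  );
}

// Same shape as the tutorial's withSelectiveRetry, but with baseDelayMs
// exposed so this demo doesn't sleep for real half-second intervals.
async function withSelectiveRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts || !isRetryableError(error)) break;
      await sleep(baseDelayMs * attempt);
    }
  }
  throw lastError;
}

// Simulated agent call: fails once with a transient 503, then succeeds.
function makeTransientCall() {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls === 1) throw new Error("503 Service Unavailable");
    return "recovered";
  };
}
```

Calling `withSelectiveRetry(makeTransientCall(), 3, 10)` should resolve to `"recovered"` on the second attempt, while an error like `"invalid api key"` propagates after a single attempt because `isRetryableError` rejects it.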
If you're wiring this into a service, log the attempt count and final exception type. That makes it easy to tell whether you're dealing with temporary provider instability or a bug in your own code.
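One way to capture the attempt count is to thread an `onRetry` hook through the retry helper. The event shape below is an assumption chosen for illustration, not an AutoGen API:

```typescript
// What the retry loop reports before each backoff sleep.
type RetryEvent = { attempt: number; error: unknown; delayMs: number };

// withRetry variant that invokes an onRetry hook so callers can emit
// structured logs for every failed attempt and its backoff duration.
async function withLoggedRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  onRetry: (event: RetryEvent) => void = () => {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
      onRetry({ attempt, error, delayMs });
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

In a service you would pass a hook that writes to your logger of choice, e.g. `(e) => log.warn({ attempt: e.attempt, delayMs: e.delayMs })`, so dashboards can distinguish provider instability from bugs in your own code.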
## Next Steps

- Add structured logging for each retry attempt and backoff duration
- Replace string matching with provider-specific error codes from your LLM SDK
- Extend this pattern to tool calls and multi-agent workflows
## Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.