CrewAI Tutorial (TypeScript): implementing retry logic for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add retry logic around CrewAI task execution in TypeScript so transient failures don’t break your workflow. You need this when an agent call fails due to rate limits, flaky tools, or temporary API errors and you want controlled retries instead of a hard stop.

What You'll Need

  • Node.js 18+
  • A TypeScript project with tsconfig.json
  • crewai installed
  • An LLM provider API key set in your environment, for example:
    • OPENAI_API_KEY
  • A .env file loader such as dotenv
  • Basic familiarity with CrewAI concepts:
    • agents
    • tasks
    • crews

Step-by-Step

  1. Start with a minimal CrewAI setup that can fail in a realistic way.
    The retry wrapper will sit outside CrewAI, so keep the crew definition clean and deterministic.
import "dotenv/config";
import { Agent, Task, Crew, Process } from "crewai";

const analyst = new Agent({
  role: "Claims Analyst",
  goal: "Summarize claim risk clearly",
  backstory: "You review insurance claims and flag missing details.",
});

const task = new Task({
  description: "Summarize the top risks in this claim file.",
  expectedOutput: "A concise risk summary with actionable points.",
  agent: analyst,
});

const crew = new Crew({
  agents: [analyst],
  tasks: [task],
  process: Process.sequential,
});

async function runCrew() {
  const result = await crew.kickoff();
  console.log(result);
}

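// Surface failures instead of leaving an unhandled promise rejection.
runCrew().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});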
  2. Add a reusable retry helper with exponential backoff and jitter.
    This is the part that makes the workflow production-friendly: retry only on transient errors, and stop after a bounded number of attempts.
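type RetryOptions = {
  retries: number; // retries allowed after the first attempt
  baseDelayMs?: number; // delay before the first retry; defaults to 500 ms
};

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Heuristic: treat an error as transient when its message looks like a rate limit,
// timeout, or dropped connection. Adjust the pattern to your provider's errors.
function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /rate limit|timeout|temporar|429|ECONNRESET|ETIMEDOUT/i.test(message);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const baseDelayMs = options.baseDelayMs ?? 500;

  // One initial attempt plus up to options.retries retries.
  for (let attempt = 1; attempt <= options.retries + 1; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up once retries are exhausted or the error is not transient.
      if (attempt > options.retries || !isRetryableError(error)) throw error;

      // Exponential backoff (base, 2x base, 4x base, ...) plus 0 to 99 ms of jitter.
      const delay = baseDelayMs * Math.pow(2, attempt - 1);
      const jitter = Math.floor(Math.random() * 100);
      await sleep(delay + jitter);
    }
  }

  // Unreachable; present so TypeScript sees every path return or throw.
  throw new Error("Retry loop exited unexpectedly");
}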
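To make the backoff schedule concrete, here is a small sketch that computes the delay before each retry using the same formula as withRetry. The backoffDelay helper and its jitterMs parameter are illustrative names, not part of CrewAI; jitter is passed in rather than randomized so the output is deterministic.

```typescript
// Hypothetical helper: mirrors the delay formula used in withRetry.
// jitterMs is injected so the schedule can be inspected deterministically.
function backoffDelay(attempt: number, baseDelayMs = 500, jitterMs = 0): number {
  return baseDelayMs * Math.pow(2, attempt - 1) + jitterMs;
}

// Delays before retries 1, 2, and 3 with the default 500 ms base and no jitter.
const schedule = [1, 2, 3].map((attempt) => backoffDelay(attempt));
console.log(schedule); // [ 500, 1000, 2000 ]
```

With retries set to 3 and the default base delay, the waits add up to roughly 3.5 seconds plus a little jitter before the final error is thrown, which is worth knowing when you pick timeouts for the calling job.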
  3. Wrap crew.kickoff() with the retry helper.
    Keep the retry policy at the orchestration layer so your agents remain reusable across jobs with different reliability requirements.
import "dotenv/config";
import { Agent, Task, Crew, Process } from "crewai";

const analyst = new Agent({
  role: "Claims Analyst",
  goal: "Summarize claim risk clearly",
  backstory: "You review insurance claims and flag missing details.",
});

const task = new Task({
  description: "Summarize the top risks in this claim file.",
  expectedOutput: "A concise risk summary with actionable points.",
  agent: analyst,
});

const crew = new Crew({
  agents: [analyst],
  tasks: [task],
  process: Process.sequential,
});

type RetryOptions = { retries: number; baseDelayMs?: number };

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /rate limit|timeout|temporar|429|ECONNRESET|ETIMEDOUT/i.test(message);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const baseDelayMs = options.baseDelayMs ?? 500;

  for (let attempt = 1; attempt <= options.retries + 1; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt > options.retries || !isRetryableError(error)) throw error;
      const delay = baseDelayMs * Math.pow(2, attempt - 1) + Math.floor(Math.random() * 100);
      await sleep(delay);
    }
  }
  throw new Error("Retry loop exited unexpectedly");
}

async function runCrew() {
  // Up to 3 retries after the first attempt, so at most 4 kickoff() calls.
  const result = await withRetry(() => crew.kickoff(), { retries: 3 });
  console.log(result);
}

// Surface the final failure instead of leaving an unhandled promise rejection.
runCrew().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
  4. Make the retry behavior observable so you can debug failures in production.
    Logging each attempt matters because without it you’ll have no idea whether you’re seeing one bad request or a repeated upstream outage.
async function withRetry<T>(
  fn: () => Promise<T>,
  options: { retries: number; baseDelayMs?: number }
): Promise<T> {
  const baseDelayMs = options.baseDelayMs ?? 500;

  for (let attempt = 1; attempt <= options.retries + 1; attempt++) {
    try {
      console.log(`Attempt ${attempt} of ${options.retries + 1}`);
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      console.error(`Attempt ${attempt} failed: ${message}`);

      if (attempt > options.retries || !isRetryableError(error)) throw error;

      const delay = baseDelayMs * Math.pow(2, attempt - 1) + Math.floor(Math.random() * 100);
      console.log(`Retrying in ${delay}ms`);
      await sleep(delay);
    }
  }
  throw new Error("Retry loop exited unexpectedly");
}
  5. If your workflow has multiple tasks, wrap each step independently instead of retrying the whole crew blindly.
    That gives you finer control when one task is flaky but earlier tasks already succeeded.
import "dotenv/config";
import { Agent, Task, Crew, Process } from "crewai";

const writer = new Agent({ role: "Writer", goal: "Draft concise outputs" });
const reviewer = new Agent({ role: "Reviewer", goal: "Check for missing details" });

const draftTask = new Task({
  description: "Draft a short claim summary.",
  expectedOutput: "A draft summary.",
  agent: writer,
});

const reviewTask = new Task({
  description: "Review the draft and point out gaps.",
  expectedOutput: "A review note with corrections.",
  agent: reviewer,
});

const crew = new Crew({
  agents: [writer, reviewer],
  tasks: [draftTask, reviewTask],
  process: Process.sequential,
});
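
The listing above only defines the two-task crew. One way to retry each step on its own is sketched below, assuming you can run each step as a separate async function (for example, one single-task crew per step). The names retrying, runDraft, runReview, and runPipeline are hypothetical, and the step bodies are stand-ins that simulate one transient failure rather than real CrewAI calls.

```typescript
// Minimal stand-in for the tutorial's withRetry (no delays, no error filter)
// so this sketch runs on its own.
async function retrying<T>(fn: () => Promise<T>, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Hypothetical step functions. In a real project each would run one unit of
// work; here they only count calls and simulate a single transient failure.
let draftCalls = 0;
let reviewCalls = 0;

async function runDraft(): Promise<string> {
  draftCalls++;
  return "draft summary";
}

async function runReview(draft: string): Promise<string> {
  reviewCalls++;
  if (reviewCalls === 1) throw new Error("429 rate limit"); // simulated transient failure
  return `review of: ${draft}`;
}

async function runPipeline(): Promise<string> {
  // Each step has its own retry budget; a review failure does not re-run the draft.
  const draft = await retrying(() => runDraft(), 1);
  return retrying(() => runReview(draft), 3);
}
```

Calling runPipeline() here runs the draft once and the review twice, so the work already done by the first step is not repeated when the second step fails.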

Testing It

Run the script normally first and confirm you get a successful completion from crew.kickoff(). Then force a transient failure, for example by temporarily cutting network access or by mocking an error whose message matches your retry filter, such as "429 rate limit".

Watch the logs for multiple attempts and increasing delays between them. If a non-retryable error occurs, it should fail immediately without exhausting all retries.

A good test is to compare behavior under two cases:

  • transient errors are retried
  • validation or coding errors fail fast

If you want stronger coverage, write unit tests around isRetryableError() and withRetry() separately from CrewAI itself.
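
A framework-free sketch of such a test follows. It assumes a variant of the helper, here called withRetryTestable, that accepts an injected sleep function so the checks run instantly; that parameter, the flaky helper, and runChecks are illustrative names that do not appear elsewhere in this tutorial.

```typescript
// Copy of the tutorial's filter so this sketch is self-contained.
function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /rate limit|timeout|temporar|429|ECONNRESET|ETIMEDOUT/i.test(message);
}

// Variant of withRetry with an injectable sleep (an assumption made for testability).
// Jitter is omitted so the recorded delays are deterministic.
async function withRetryTestable<T>(
  fn: () => Promise<T>,
  retries: number,
  sleepFn: (ms: number) => Promise<void>,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt > retries || !isRetryableError(error)) throw error;
      await sleepFn(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
  throw new Error("Retry loop exited unexpectedly");
}

// Fake operation that fails a fixed number of times before succeeding.
function flaky(failures: number, message: string) {
  let calls = 0;
  const fn = async (): Promise<string> => {
    calls++;
    if (calls <= failures) throw new Error(message);
    return "ok";
  };
  return { fn, calls: () => calls };
}

async function runChecks(): Promise<string[]> {
  const results: string[] = [];
  const delays: number[] = [];
  // Records the requested delay instead of actually waiting.
  const noSleep = async (ms: number): Promise<void> => {
    delays.push(ms);
  };

  // Case 1: transient errors are retried until the call succeeds.
  const transient = flaky(2, "429 rate limit");
  const value = await withRetryTestable(transient.fn, 3, noSleep);
  results.push(`transient: ${value} after ${transient.calls()} calls, delays ${delays.join(",")}`);

  // Case 2: a non-retryable error fails on the first attempt.
  const fatal = flaky(1, "Invalid task configuration");
  try {
    await withRetryTestable(fatal.fn, 3, noSleep);
    results.push("fatal: unexpectedly succeeded");
  } catch {
    results.push(`fatal: threw after ${fatal.calls()} call`);
  }
  return results;
}
```

The same two cases translate directly into whatever test runner your project already uses; the injected sleep is the design choice that keeps them fast.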

Next Steps

  • Add per-error-class policies, such as different retry counts for rate limits vs timeouts
  • Persist failed runs to a queue so they can be replayed later
  • Combine retries with circuit breakers and idempotency keys for bank-grade workflows
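
As a starting point for the first item, per-error-class policies can be expressed as a small lookup table. This is only a sketch: the RetryPolicy type, the policies table, the policyFor function, and the numbers in it are hypothetical placeholders, not recommendations or CrewAI features.

```typescript
type RetryPolicy = { retries: number; baseDelayMs: number };

// Hypothetical policy table; tune the patterns and numbers to your provider.
const policies: Array<{ pattern: RegExp; policy: RetryPolicy }> = [
  { pattern: /rate limit|429/i, policy: { retries: 5, baseDelayMs: 1000 } },
  { pattern: /timeout|ETIMEDOUT|ECONNRESET/i, policy: { retries: 2, baseDelayMs: 250 } },
];

// Returns the first matching policy, or null when the error should fail fast.
function policyFor(error: unknown): RetryPolicy | null {
  const message = error instanceof Error ? error.message : String(error);
  const match = policies.find((entry) => entry.pattern.test(message));
  return match ? match.policy : null;
}

console.log(policyFor(new Error("429 rate limit"))); // { retries: 5, baseDelayMs: 1000 }
console.log(policyFor(new Error("Invalid input"))); // null
```

A retry helper could call policyFor inside its catch block and use the returned values in place of a single global retries and baseDelayMs setting.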


By Cyprian Aarons, AI Consultant at Topiax.
