CrewAI Tutorial (TypeScript): implementing retry logic for advanced developers

By Cyprian AaronsUpdated 2026-04-21
crewaiimplementing-retry-logic-for-advanced-developerstypescript

This tutorial shows you how to add retry logic to a CrewAI TypeScript workflow without turning your agents into brittle one-shot scripts. You’ll build a small pattern that retries failed agent calls with backoff, logs the failure reason, and stops cleanly when the error is permanent.

What You'll Need

  • Node.js 18+ and npm
  • A TypeScript project with ts-node or tsx
  • CrewAI TypeScript SDK installed
  • OpenAI API key set in your environment
  • Basic familiarity with Agent, Task, and Crew
  • A terminal where you can run TypeScript directly

Install the packages:

npm install @crewai/crewai openai
npm install -D typescript tsx @types/node

Set your environment variable:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by creating a small retry wrapper instead of scattering retry logic across every task. This keeps the CrewAI code clean and lets you classify failures in one place.
export type RetryOptions = {
  maxAttempts: number;
  baseDelayMs: number;
};

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === options.maxAttempts) break;

      const delay = options.baseDelayMs * Math.pow(2, attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }

  throw lastError;
}
  1. Define your agent and task normally. The important part is that the task execution will be wrapped by the retry helper, not the CrewAI object itself.
import { Agent, Task, Crew, Process } from "@crewai/crewai";

export const analyst = new Agent({
  role: "Research Analyst",
  goal: "Summarize insurance claim trends from provided data",
  backstory: "You are precise and concise.",
});

export const task = new Task({
  description:
    "Analyze the following incident summary and produce a short operational risk note.",
  expectedOutput: "A concise risk note with one recommendation.",
  agent: analyst,
});

export const crew = new Crew({
  agents: [analyst],
  tasks: [task],
  process: Process.sequential,
});
  1. Wrap the crew execution in a retry-aware runner. In production, this is where you would inspect error messages and only retry transient failures like timeouts or rate limits.
import { crew } from "./crew";
import { withRetry } from "./retry";

function isRetryable(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  const message = error.message.toLowerCase();

  return (
    message.includes("rate limit") ||
    message.includes("timeout") ||
    message.includes("temporarily unavailable") ||
    message.includes("503")
  );
}

async function runCrewWithRetry() {
  return withRetry(
    async () => {
      try {
        return await crew.kickoff();
      } catch (error) {
        if (!isRetryable(error)) throw error;
        throw error;
      }
    },
    { maxAttempts: 4, baseDelayMs: 500 }
  );
}
  1. Add a proper entry point so failures are visible and process exit codes stay correct. This matters when your crew runs inside CI, cron jobs, or an orchestrator.
async function main() {
  try {
    const result = await runCrewWithRetry();
    console.log(String(result));
    process.exitCode = 0;
  } catch (error) {
    console.error("Crew execution failed:", error);
    process.exitCode = 1;
  }
}

main();
  1. If you want better control, move retry handling around specific tool calls instead of the whole crew. That gives you finer-grained recovery when one external dependency fails but the rest of the workflow can continue.
type ToolResult = {
  status: "ok" | "failed";
  value?: string;
};

async function fetchExternalData(): Promise<string> {
  return withRetry(
    async () => {
      const response = await fetch("https://example.com/data");
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return await response.text();
    },
    { maxAttempts: 3, baseDelayMs: 300 }
  );
}

async function getDataSafely(): Promise<ToolResult> {
  try {
    const value = await fetchExternalData();
    return { status: "ok", value };
  } catch (error) {
    return { status: "failed" };
  }
}

Testing It

Run the script against a known flaky dependency or temporarily force an error in fetchExternalData() to confirm retries happen before failure. You should see the delay increase on each attempt because of exponential backoff.

Then test a non-retryable failure like a bad API key or malformed prompt path. That should fail immediately once your isRetryable() guard rejects it.

Finally, verify exit codes in automation by running the script in CI or locally with echo $? after execution. A successful run should exit 0, and an exhausted retry path should exit 1.

Next Steps

  • Add jitter to backoff so multiple workers don’t retry at the same time.
  • Classify errors by provider status codes and response headers instead of message text.
  • Push retry metrics into OpenTelemetry or your logging stack so you can track failure rates per agent and per tool

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit

Related Guides