CrewAI Tutorial (TypeScript): implementing retry logic for beginners

By Cyprian AaronsUpdated 2026-04-21
crewaiimplementing-retry-logic-for-beginnerstypescript

This tutorial shows you how to add retry logic to a CrewAI workflow in TypeScript so transient failures don’t break your agent run. You need this when an LLM call, tool call, or downstream API occasionally fails and you want the agent to try again before giving up.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project with ts-node or a build step already set up
  • crewai installed in your project
  • An LLM provider API key, such as:
    • OPENAI_API_KEY
    • or another provider supported by your CrewAI setup
  • A basic understanding of:
    • Agent
    • Task
    • Crew
  • A terminal where you can run environment variables

Install the package if you haven’t already:

npm install crewai

Step-by-Step

  1. First, create a small project file that defines an agent and task. The retry logic will wrap the crew execution, not the CrewAI internals, which is the cleanest way to handle transient failures in production.
import { Agent, Task, Crew } from "crewai";

const writer = new Agent({
  role: "Senior Writer",
  goal: "Write a short summary about retry logic",
  backstory: "You write concise technical explanations for developers.",
});

const task = new Task({
  description: "Explain why retry logic matters in agent workflows.",
  expectedOutput: "A short explanation with practical examples.",
  agent: writer,
});

const crew = new Crew({
  agents: [writer],
  tasks: [task],
});
  1. Next, add a reusable retry helper. This version retries on any thrown error, waits between attempts, and uses exponential backoff so you don’t hammer the API if it’s temporarily unstable.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;

      const delay = baseDelayMs * Math.pow(2, attempt - 1);
      console.log(`Attempt ${attempt} failed. Retrying in ${delay}ms...`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }

  throw lastError;
}
  1. Now wrap the crew execution inside that helper. This is where the retry behavior actually kicks in, and it works whether the failure comes from model inference, rate limits, or a temporary network issue.
async function main() {
  const result = await withRetry(async () => {
    return await crew.kickoff();
  }, 3, 1500);

  console.log("Crew output:");
  console.log(result);
}

main().catch((error) => {
  console.error("Final failure after retries:");
  console.error(error);
});
  1. If you want more control, filter which errors should be retried. In production, you usually retry only transient failures like timeouts or rate limits, not validation errors or bad prompts.
function isRetryableError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;

  const message = error.message.toLowerCase();
  return (
    message.includes("timeout") ||
    message.includes("rate limit") ||
    message.includes("temporarily unavailable") ||
    message.includes("network")
  );
}

async function withSelectiveRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (!isRetryableError(error) || attempt === maxAttempts) break;
    }
  }

  throw lastError;
}
  1. Finally, wire in logging so you can see what happened during each retry. In real systems, this is how you debug flaky agent runs without guessing whether the issue was the model provider or your own code.
async function mainWithLogging() {
  try {
    const result = await withRetry(async () => {
      console.log("Running crew...");
      return await crew.kickoff();
    }, 3);

    console.log("Success:");
    console.log(result);
  } catch (error) {
    console.error("Crew failed after all retries.");
    console.error(error);
    process.exitCode = 1;
  }
}

mainWithLogging();

Testing It

Run the script once with a valid API key and confirm that the crew completes successfully on the first try. Then simulate a failure by temporarily disconnecting your network or using an invalid endpoint so you can see the retry messages print before the final error.

If you use selective retries, test both retryable and non-retryable errors. For example, a timeout should trigger another attempt, while a malformed configuration should fail immediately.

Watch for two things in the output:

  • The delay increases between attempts
  • The script stops after maxAttempts

Next Steps

  • Add retry logic around individual tool calls instead of only wrapping crew.kickoff()
  • Store retry metadata in logs so you can trace flaky runs across environments
  • Add circuit breaker behavior for services that keep failing repeatedly

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit

Related Guides