How to Fix 'rate limit exceeded during development' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When CrewAI throws rate limit exceeded during development, it usually means your app is making too many LLM requests in a short window. In TypeScript projects, this often shows up during local testing because the same agent or task gets executed repeatedly in a loop, on every file change, or inside a request handler that is being hit more than you think.

The fix is usually not “increase the limit” first. It’s to find where your code is accidentally multiplying calls.

The Most Common Cause

The #1 cause is re-creating and re-running agents/tasks on every invocation instead of reusing them, especially inside an HTTP route, watcher, or React-style render cycle. With CrewAI TypeScript, that means you may be instantiating new Crew() and calling crew.kickoff() far more often than intended.

Here’s the broken pattern:

Broken                                | Fixed
--------------------------------------|------------------------------------
Creates a new crew per request        | Reuses a single crew instance
No guard against duplicate execution  | Adds an explicit execution boundary
Easy to trigger repeated LLM calls    | Predictable call count

// broken.ts
import { Crew } from "crewai";
import { agent } from "./agent";
import { task } from "./task";

export async function handleRequest(req: Request) {
  const crew = new Crew({
    agents: [agent()],
    tasks: [task()],
  });

  // This can run many times during development:
  const result = await crew.kickoff({
    input: await req.json(),
  });

  return Response.json(result);
}

And here's the fixed version:

// fixed.ts
import { Crew } from "crewai";
import { agent } from "./agent";
import { task } from "./task";

const crew = new Crew({
  agents: [agent()],
  tasks: [task()],
});

export async function handleRequest(req: Request) {
  const body = await req.json();

  // One explicit kickoff per request
  const result = await crew.kickoff({
    input: body,
  });

  return Response.json(result);
}

If you’re using a dev server with hot reload, this matters even more. Module reloads can re-run initialization code and create duplicate crews, duplicate tasks, or duplicate event handlers.
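
One way to make initialization idempotent under hot reload is to cache the instance on globalThis, which survives module re-evaluation. This is a minimal sketch, not a CrewAI API: createCrew stands in for your real new Crew({...}) setup, and __crew is an arbitrary cache key.

```typescript
// Sketch: cache a singleton on globalThis so it survives hot-reload
// module re-evaluation. `createCrew` and `__crew` are illustrative names.
type CrewLike = { id: number };

let instancesCreated = 0;

function createCrew(): CrewLike {
  // In real code: return new Crew({ agents: [agent()], tasks: [task()] });
  instancesCreated += 1;
  return { id: instancesCreated };
}

const g = globalThis as typeof globalThis & { __crew?: CrewLike };

export function getCrew(): CrewLike {
  // Re-running this module does NOT re-create the crew:
  if (!g.__crew) g.__crew = createCrew();
  return g.__crew;
}

export function crewInstanceCount(): number {
  return instancesCreated;
}
```

A module-level `const crew = createCrew()` would re-run on every reload; the globalThis cache keeps exactly one instance alive across reloads.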

Other Possible Causes

1) A retry loop without backoff

If your TypeScript wrapper retries immediately on failure, you can hit the limit fast.

// bad: retries instantly and silently returns undefined after 5 failures
for (let i = 0; i < 5; i++) {
  try {
    return await crew.kickoff({ input });
  } catch (err) {
    continue;
  }
}

Use exponential backoff: back off only on rate-limit errors, rethrow everything else, and fail loudly once retries are exhausted:

// better
for (let i = 0; i < 5; i++) {
  try {
    return await crew.kickoff({ input });
  } catch (err: any) {
    // Rethrow anything that isn't a rate-limit error
    if (!String(err?.message).includes("rate limit exceeded")) throw err;
    // Exponential backoff: 500ms, 1s, 2s, 4s, 8s
    await new Promise((r) => setTimeout(r, 2 ** i * 500));
  }
}
throw new Error("rate limit exceeded after 5 attempts");

2) Multiple agents calling the same model at once

CrewAI can fan out requests if you have parallel work patterns or multiple tasks firing together. If all of them use the same provider key, the provider may reject bursts.

const crew = new Crew({
  agents: [researchAgent(), writerAgent(), reviewerAgent()],
  tasks: [researchTask(), writeTask(), reviewTask()],
});

Throttle concurrency if your workflow supports it, or split large crews into smaller sequential steps.
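
If your workflow doesn't expose a concurrency knob, you can wrap the model-bound calls yourself. Here is a minimal sketch of a concurrency limiter, similar in spirit to the p-limit package; createLimiter is an illustrative helper, not a CrewAI API.

```typescript
// Minimal concurrency limiter: at most `max` wrapped promises run at once;
// the rest queue up and start as slots free.
function createLimiter(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  const release = () => {
    active -= 1;
    queue.shift()?.(); // wake the next waiter, if any
  };

  return async function limit<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= max) {
      await new Promise<void>((resolve) => queue.push(resolve));
    }
    active += 1;
    try {
      return await fn();
    } finally {
      release();
    }
  };
}

// Usage: run many model-bound tasks, but only two in flight at a time.
// const limit = createLimiter(2);
// await Promise.all(tasks.map((t) => limit(() => runTask(t))));
```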

3) Verbose tracing causing extra calls in dev

Some setups attach callbacks, telemetry, or debug hooks that accidentally invoke extra completions. If your logs show more model calls than expected, inspect your callback chain.

const crew = new Crew({
  agents: [agent()],
  tasks: [task()],
  callbacks: [
    // Make sure these are logging-only
    debugCallback,
    tracingCallback,
  ],
});

If a callback performs summarization or classification with an LLM, that is another hidden API call.
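
The difference is easy to demonstrate with a counter standing in for the model client. This sketch uses illustrative shapes (CrewEvent, fakeSummarize), not CrewAI's actual callback API:

```typescript
// Logging-only vs. LLM-calling callbacks. The counter stands in for a
// billable API request so the hidden multiplication is visible.
type CrewEvent = { type: string; payload: unknown };

// Safe: writes to the console, makes no network calls.
const loggingCallback = (event: CrewEvent): void => {
  console.log(`[trace] ${event.type}`);
};

// Hidden cost: one extra completion per event.
let hiddenLlmCalls = 0;
const fakeSummarize = async (_text: string): Promise<string> => {
  hiddenLlmCalls += 1; // in real code: await llm.complete(...)
  return "summary";
};

const summarizingCallback = async (event: CrewEvent): Promise<void> => {
  await fakeSummarize(JSON.stringify(event.payload));
};
```

Ten task events through the summarizing callback means ten extra completions on top of the ones your agents already make.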

4) Bad environment config pointing dev and prod to the same key

This one is common in monorepos. Your local .env may be loading the same API key used by staging or another developer.

# .env.local
OPENAI_API_KEY=sk-prod-shared-key

Use separate keys for local development:

# .env.local
OPENAI_API_KEY=sk-dev-local-key
CREWAI_LOG_LEVEL=debug

Also check whether your provider has per-model limits. A key can be valid but still rate-limited for the specific model you selected.
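
To confirm which key your process actually loaded, you can log a masked fingerprint at startup. This is a small illustrative helper, and maskKey is not part of any library:

```typescript
// Log a masked fingerprint of the loaded key so you can tell a dev key from
// a shared org key without printing the secret.
function maskKey(key: string | undefined): string {
  if (!key) return "(no key loaded)";
  if (key.length <= 10) return "(key too short to mask safely)";
  return `${key.slice(0, 6)}...${key.slice(-4)} (${key.length} chars)`;
}

console.log("OPENAI_API_KEY:", maskKey(process.env.OPENAI_API_KEY));
```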

How to Debug It

  1. Count actual model calls

    • Add logging around every crew.kickoff() and every task execution.
    • If one user action produces multiple calls, you have duplication somewhere.
  2. Check for hot-reload duplication

    • Move crew initialization outside request handlers.
    • If the error disappears after that change, module reloads were likely creating extra instances.
  3. Inspect retries and loops

    • Search for while, for, .map(async ...), and custom retry wrappers.
    • Look for code that keeps calling kickoff() after failures without delay.
  4. Verify provider limits and environment

    • Confirm which API key is loaded at runtime.
    • Check whether the provider returns a real rate-limit response like 429 Too Many Requests or messages such as rate limit exceeded.
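
For step 1, a small counting wrapper makes the actual call count visible. This is a hypothetical debugging helper, not a CrewAI API; countCalls and its names are illustrative:

```typescript
// Wrap any async "kickoff-like" function so every invocation is counted
// and logged, making duplicated calls easy to spot in dev.
function countCalls<A extends unknown[], R>(
  label: string,
  fn: (...args: A) => Promise<R>
) {
  let calls = 0;
  return {
    calls: () => calls,
    run: async (...args: A): Promise<R> => {
      calls += 1;
      console.log(`[llm-debug] ${label} call #${calls}`);
      return fn(...args);
    },
  };
}

// Against a real crew this might look like:
// const kickoff = countCalls("crew.kickoff", (input: unknown) => crew.kickoff({ input }));
// const result = await kickoff.run(body);
```

If one user action logs more than one `crew.kickoff` call, you have found your duplication.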

Prevention

  • Create one crew instance per workflow boundary, not per render or per helper function.
  • Add backoff with jitter around retries; never hammer the provider on failure.
  • Log every LLM call in development so you can see spikes before they hit production limits.
  • Keep dev keys separate from shared org keys and rotate them if multiple teammates are using the same secret.
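
The "backoff with jitter" point can be sketched as a small delay function. This uses the "full jitter" strategy, where each delay is a random value under an exponentially growing ceiling; the names and defaults here are illustrative:

```typescript
// Full-jitter exponential backoff: delay is uniform in
// [0, min(cap, base * 2^attempt)), which spreads retries out so clients
// don't all hammer the provider at the same instant.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Usage inside a retry loop:
// await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
```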

If you still see CrewAIError: rate limit exceeded during development after fixing duplicate execution, assume hidden calls first. In practice, it’s usually not CrewAI itself — it’s your code calling the model more times than you think.


By Cyprian Aarons, AI Consultant at Topiax.
