How to Fix 'rate limit exceeded when scaling' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: rate-limit-exceeded-when-scaling, crewai, typescript

When CrewAI throws “rate limit exceeded when scaling”, it usually means your agent workload grew faster than the upstream model provider’s quota can absorb. In practice, this shows up when you add more agents, more parallel tasks, or recursive retries without controlling concurrency.

The key thing to understand: this is rarely a CrewAI bug. It’s usually a throughput problem between your TypeScript app and the LLM API behind it.

The Most Common Cause

The #1 cause is uncontrolled parallelism. In TypeScript, people often fan out tasks with Promise.all() and assume CrewAI will naturally pace itself. It won’t.

Here’s the broken pattern:

import { Crew, Agent, Task } from "crewai";

const researcher = new Agent({
  role: "Researcher",
  goal: "Gather facts",
  backstory: "You research quickly and accurately",
  llm: "gpt-4o-mini",
});

// urls: string[] is assumed to be defined elsewhere
const tasks = urls.map((url) =>
  new Task({
    description: `Summarize ${url}`,
    agent: researcher,
  })
);

// Broken: one crew per task, all dispatched at once with no concurrency cap
const results = await Promise.all(
  tasks.map((task) => {
    const crew = new Crew({ agents: [researcher], tasks: [task] });
    return crew.run();
  })
);

And here’s the fixed pattern:

import pLimit from "p-limit";
import { Crew, Agent, Task } from "crewai";

const limit = pLimit(2); // keep concurrency under provider limits

const researcher = new Agent({
  role: "Researcher",
  goal: "Gather facts",
  backstory: "You research quickly and accurately",
  llm: "gpt-4o-mini",
});

async function runOne(url: string) {
  const task = new Task({
    description: `Summarize ${url}`,
    agent: researcher,
  });

  const crew = new Crew({
    agents: [researcher],
    tasks: [task],
  });

  return crew.run();
}

const results = await Promise.all(
  urls.map((url) => limit(() => runOne(url)))
);

The difference is simple:

  • Broken code creates a burst of requests
  • Fixed code caps concurrency and smooths out request volume

If you’re scaling across multiple workers or queue consumers, this same issue appears even if each worker looks “safe” on its own.
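A per-process limiter only protects a single worker, so derive each worker’s cap from a shared budget. A minimal sketch, assuming a global concurrency budget and a known worker count (both constants are illustrative):

import pLimit from "p-limit";

// Hypothetical budget: 8 concurrent requests across the whole fleet
const GLOBAL_MAX_CONCURRENT = 8;
const NUM_WORKERS = 4; // however many workers/consumers you run

// Give each worker an equal slice of the global budget
const workerLimit = pLimit(
  Math.max(1, Math.floor(GLOBAL_MAX_CONCURRENT / NUM_WORKERS))
);

A truly global cap needs shared state (a Redis-backed limiter or a central queue), but a static split like this is often enough to stop the bursts.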

Other Possible Causes

1. Retry storms from failed tool calls

If a tool call fails and your wrapper retries immediately, you can multiply traffic fast.

// Bad: aggressive retry loop with no backoff
for (let i = 0; i < 5; i++) {
  try {
    await crew.run();
    break;
  } catch (err) {
    continue;
  }
}

Use exponential backoff instead, and give up after a few attempts when you keep seeing hard rate-limit errors like:

  • 429 Too Many Requests
  • RateLimitError
  • OpenAIError: Rate limit exceeded

A backoff wrapper such as async-retry (p-retry accepts the same options) keeps this to a few lines:

import retry from "async-retry";

await retry(async () => crew.run(), {
  retries: 3,       // give up after three failed retries
  minTimeout: 1000, // wait 1s before the first retry...
  factor: 2,        // ...then double the wait each time
});

2. Multiple agents sharing one provider quota

CrewAI agents are separate logical workers, but they often share the same underlying API key. If three agents all use gpt-4o, you’re still hitting one quota bucket.

// role/goal/backstory omitted for brevity
const analyst = new Agent({ llm: "gpt-4o" });
const writer = new Agent({ llm: "gpt-4o" });
const reviewer = new Agent({ llm: "gpt-4o" });

If that’s your setup, reduce parallel agent execution or split traffic across keys/models where allowed.
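Where your provider’s terms allow it, giving some agents a cheaper model often bills them against a separate per-model quota bucket. A sketch in the same shorthand as above (the model split is illustrative):

// role/goal/backstory omitted for brevity, as above
const analyst = new Agent({ llm: "gpt-4o" });       // keep the heavy reasoning here
const writer = new Agent({ llm: "gpt-4o-mini" });   // drafting rarely needs the big model
const reviewer = new Agent({ llm: "gpt-4o-mini" }); // nor do review passes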

3. Recursive delegation or task loops

A crew that keeps delegating work can generate more calls than expected. This happens when an agent repeatedly asks another agent to re-check output.

Watch for patterns like:

// Pseudocode pattern that causes runaway calls
while (!done) {
  await crew.run();
}

Or overly broad task instructions that trigger repeated self-correction. Add hard stop conditions and cap iteration counts.
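A bounded version of that loop is cheap insurance. A minimal sketch, where isGoodEnough is a hypothetical acceptance check you would supply yourself:

const MAX_ITERATIONS = 3; // hard cap on re-runs

let done = false;
for (let i = 0; i < MAX_ITERATIONS && !done; i++) {
  const result = await crew.run();
  done = isGoodEnough(result); // hypothetical: your own quality check
}

if (!done) {
  throw new Error(`No acceptable result after ${MAX_ITERATIONS} iterations`);
}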

4. Context too large, causing expensive retries

Large prompts don’t directly cause rate limits, but they increase latency and error probability. That leads to retries, which then trigger the actual limit error.

Trim context aggressively:

const task = new Task({
  description: summary.slice(0, 2000),
  agent,
});

Also avoid passing entire chat histories into every task unless you need them.
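If a task genuinely needs history, pass a window rather than the whole transcript. A sketch, assuming messages is an array of { role, content } turns you already track yourself:

// Keep only the last few turns instead of the full conversation
const recent = messages.slice(-5);

const task = new Task({
  description: [
    "Continue the research using only this recent context:",
    ...recent.map((m) => `${m.role}: ${m.content}`),
  ].join("\n"),
  agent,
});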

How to Debug It

  1. Check whether the spike is concurrency-related

    • Log how many crew.run() calls are in flight at once (see the counter sketch after this list).
    • If the error appears only under load, this is almost always the culprit.
  2. Inspect the exact upstream error

    • Don’t stop at the generic “rate limit exceeded when scaling” message.
    • Look for provider errors such as:
      • 429 Too Many Requests
      • insufficient_quota
      • RateLimitError
      • openai.RateLimitError
  3. Measure request timing

    • Add timestamps around every crew execution.
    • If requests cluster in bursts instead of spreading out, throttle them.
console.log("start", Date.now());
await crew.run();
console.log("end", Date.now());
  1. Isolate one agent and one task
    • Run a single task with one agent.
    • Then add concurrency back gradually:
      • single task
      • multiple tasks sequentially
      • limited parallelism
      • full load

That tells you whether the issue is in your code path or in provider capacity.
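For step 1, a simple in-flight counter proves or rules out a concurrency spike in a few lines. A minimal sketch:

import { Crew } from "crewai";

let inFlight = 0;

async function trackedRun(crew: Crew) {
  inFlight += 1;
  console.log(`crew.run() starting; in flight: ${inFlight}`);
  try {
    return await crew.run();
  } finally {
    inFlight -= 1;
  }
}

If that count regularly spikes to the number of pending tasks, the concurrency cap from the first section is your fix.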

Prevention

  • Cap concurrency by default with p-limit, a queue, or worker pool settings.
  • Add exponential backoff for 429 responses instead of blind retries.
  • Keep agent prompts tight so you don’t inflate token usage and trigger retry cascades.
  • Load test with production-like traffic before turning on full fan-out.

If you’re using CrewAI in TypeScript at scale, treat LLM calls like any other constrained external dependency. Control concurrency first, then tune retries, then optimize prompt size. That order fixes most “rate limit exceeded when scaling” failures without guesswork.

