How to Fix 'token limit exceeded when scaling' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When CrewAI throws 'token limit exceeded when scaling', it usually means one of your agents is being asked to process too much context at once. In TypeScript projects, this shows up when you scale from a single prompt to multi-step tasks, long tool outputs, or agent-to-agent handoffs.

The error is not about model quality. It’s almost always about context size, prompt construction, or an unbounded loop that keeps stuffing more text into the next call.

The Most Common Cause

The #1 cause is passing full tool output, conversation history, or large documents into every task without trimming them first.

In CrewAI TypeScript setups, this usually happens in Task, Agent, or a custom tool callback that keeps appending raw output. You’ll often see something like:

  • Error: token limit exceeded when scaling
  • BadRequestError: context length exceeded
  • CrewAIError: Failed to execute task due to token limit

Broken vs fixed pattern

  • Broken: passes the entire document/output into the next task. Fixed: summarize or chunk before passing it forward.
  • Broken: reuses the full chat history across all agents. Fixed: keep only the last relevant messages.
  • Broken: lets tools return huge payloads. Fixed: truncate or store the payload externally.

// BROKEN
import { Agent, Task, Crew } from "crewai";

const analyst = new Agent({
  role: "Analyst",
  goal: "Analyze claims documents",
  backstory: "You are precise and thorough.",
});

const task = new Task({
  description: `
    Review this document and extract risks:
    ${hugeClaimsDocument}
  `,
  agent: analyst,
});

const crew = new Crew({
  agents: [analyst],
  tasks: [task],
});

await crew.kickoff();

// FIXED
import { Agent, Task, Crew } from "crewai";

const analyst = new Agent({
  role: "Analyst",
  goal: "Analyze claims documents",
  backstory: "You are precise and thorough.",
});

const summary = await summarizeDocument(hugeClaimsDocument); // chunk + summarize first

const task = new Task({
  description: `
    Review this summary and extract risks:
    ${summary}
  `,
  agent: analyst,
});

const crew = new Crew({
  agents: [analyst],
  tasks: [task],
});

await crew.kickoff();

If you’re dealing with long documents, do not feed raw PDFs or full OCR text directly into the agent. Chunk first, summarize each chunk, then pass only the distilled result.
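
Here is a minimal sketch of what summarizeDocument could look like. The llm(prompt) helper is an assumption standing in for whatever model client you use; the chunk size and prompt wording are illustrative, not prescriptive.

// Sketch only: llm() is a placeholder for your model client, not a CrewAI API.
type LlmCall = (prompt: string) => Promise<string>;

function chunkText(text: string, chunkSize = 8000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

async function summarizeDocument(doc: string, llm: LlmCall): Promise<string> {
  // Map: summarize each chunk independently.
  const partials = await Promise.all(
    chunkText(doc).map((chunk) =>
      llm(`Summarize the key facts and risks in this excerpt:\n${chunk}`)
    )
  );
  // Reduce: merge the partial summaries into one compact brief.
  return llm(`Merge these summaries into one concise brief:\n${partials.join("\n---\n")}`);
}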

Other Possible Causes

1. Tool output is too large

A tool that returns a full database dump, HTML page, or JSON blob will blow up context fast.

// BAD: serializes every row straight into the prompt
return JSON.stringify(records); // thousands of rows

// BETTER: cap what the agent actually sees
return JSON.stringify(records.slice(0, 20));

If the agent needs all records, store them in S3, Redis, Postgres, or a file and return a pointer instead.
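
A sketch of the pointer approach, using a local file for illustration; S3, Redis, or Postgres follows the same shape. storeAndReference is a hypothetical tool body, not a CrewAI API.

import { writeFileSync } from "node:fs";
import { randomUUID } from "node:crypto";

// Persist the full payload somewhere durable, hand the agent a pointer.
function storeAndReference(records: unknown[]): string {
  const key = `records-${randomUUID()}.json`;
  writeFileSync(`/tmp/${key}`, JSON.stringify(records));
  // The agent sees a small preview plus a reference other tools can resolve.
  return JSON.stringify({
    ref: key,
    count: records.length,
    preview: records.slice(0, 5),
  });
}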

2. Recursive delegation or retry loops

A manager agent can keep delegating the same task and re-injecting prior outputs until the prompt explodes.

// BAD: open-ended delegation with no cap
const manager = new Agent({
  role: "Manager",
  goal: "Delegate until done",
});

// Fix by limiting retries / delegation depth in your orchestration layer,
// and by rebuilding the prompt on each attempt instead of appending to it.
const MAX_ATTEMPTS = 3;
let attempts = 0;
let result: string | null = null;
while (attempts < MAX_ATTEMPTS && result === null) {
  attempts++;
  result = await runTaskOnce(); // hypothetical single-attempt runner
}

Watch for repeated task descriptions growing on each iteration. That’s a sign your orchestration code is appending instead of replacing context.

3. Context window settings are too tight

Some TypeScript wrappers let you set model parameters directly. If you’re using a smaller model with a short context window, scaling will fail sooner than expected.

const llmConfig = {
  model: "gpt-4o-mini",
  maxTokens: 4000, // caps output tokens; the model's context window caps input + output
};

If your prompts are large, move to a model with a bigger context window or reduce input size before calling the model.
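
One cheap safeguard is a pre-flight size check before each call. This sketch uses the rough ~4 characters per token heuristic; the budget constant is an assumption you should set for your actual model.

const CONTEXT_BUDGET_TOKENS = 100_000; // assumed budget; set this for your model

function assertWithinBudget(prompt: string): void {
  const approxTokens = Math.ceil(prompt.length / 4); // rough heuristic
  if (approxTokens > CONTEXT_BUDGET_TOKENS) {
    throw new Error(
      `Prompt is ~${approxTokens} tokens, over the ${CONTEXT_BUDGET_TOKENS} budget. Shrink inputs first.`
    );
  }
}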

4. Memory is storing everything

If you enabled memory and it retains every message verbatim, scaling across many tasks can accumulate too much state.

// BAD: memory retains every message verbatim across all tasks
const crew = new Crew({
  agents,
  tasks,
  memory: true,
});

Use memory selectively. Keep summaries instead of raw transcripts, and prune old turns before each run, as in the sketch below.
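
The same idea applies to any message history you manage yourself: keep a rolling summary plus only the last few turns. A generic sketch (the Turn shape and keepLast default are illustrative, not CrewAI internals):

interface Turn {
  role: string;
  content: string;
}

// Replace old turns with a compact summary; keep only recent messages verbatim.
function pruneHistory(turns: Turn[], summary: string, keepLast = 6): Turn[] {
  const recent = turns.slice(-keepLast);
  return [
    { role: "system", content: `Summary of earlier turns: ${summary}` },
    ...recent,
  ];
}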

How to Debug It

  1. Log prompt size before every LLM call

    • Print character count and approximate token count for each task description.
    • If one task suddenly jumps from a few thousand chars to tens of thousands, you found the source.
  2. Inspect tool outputs

    • Add logging inside every custom tool.
    • Look for huge arrays, raw HTML, PDFs converted to text, or unbounded JSON serialization.
  3. Disable memory and delegation temporarily

    • Run the same workflow with memory: false and no recursive handoffs.
    • If the error disappears, the issue is accumulation across steps rather than one bad prompt.
  4. Binary search the workflow

    • Remove half the tasks/tools and rerun.
    • Keep cutting until you isolate the exact step that causes context growth.

A simple helper makes this easier:

function logSize(label: string, value: string) {
  const chars = value.length;
  const approxTokens = Math.ceil(chars / 4); // rough heuristic: ~4 chars per token
  console.log(`${label}: ${chars} chars ~ ${approxTokens} tokens`);
}
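
For step 2, the same helper can wrap any custom tool so oversized outputs surface immediately. The wrapper shape is illustrative, not a CrewAI API:

type Tool = (...args: string[]) => Promise<string>;

// Wrap a tool so every output's size is logged before the agent sees it.
function withSizeLogging(name: string, tool: Tool): Tool {
  return async (...args) => {
    const output = await tool(...args);
    logSize(`tool:${name}`, output); // reuses logSize from above
    return output;
  };
}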

Prevention

  • Summarize before handoff

    • Never pass raw documents between agents unless they’re small.
    • Use chunking for PDFs, emails, call transcripts, and claim notes.
  • Cap tool output

    • Return top N results only.
    • Store full payloads outside the prompt and pass references instead.
  • Keep prompts stable

    • Don’t append previous outputs into task descriptions on every retry.
    • Build prompts from fresh inputs plus a compact state object (see the sketch below).
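
For the last point, here is a sketch of rebuilding a task description from fresh inputs plus a compact state object, so nothing from a previous attempt leaks in. The TaskState fields are hypothetical:

interface TaskState {
  claimId: string;
  openQuestions: string[];
  lastSummary: string;
}

// Rebuilt from scratch on every attempt: no prior prompt text is appended.
function buildTaskDescription(state: TaskState): string {
  return [
    `Review claim ${state.claimId} and extract risks.`,
    `Context so far: ${state.lastSummary}`,
    `Open questions: ${state.openQuestions.join("; ")}`,
  ].join("\n");
}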

If you’re seeing 'token limit exceeded when scaling' in CrewAI TypeScript, treat it as a data-shaping problem first. In production systems, the fix is usually not “increase tokens”; it’s “stop sending garbage into the model.”


By Cyprian Aarons, AI Consultant at Topiax.