How to Fix 'token limit exceeded in production' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: token-limit-exceeded-in-production, crewai, typescript

What this error actually means

A "token limit exceeded" error in production usually means one of your CrewAI tasks is sending too much text to the model at once. In practice, it shows up when you chain long task outputs, dump entire documents into a prompt, or let agents accumulate too much conversation history.

In TypeScript projects, this often happens after the app works in dev with small inputs, then blows up in prod when real customer data, PDFs, emails, or logs are attached.

The Most Common Cause

The #1 cause is passing raw, untrimmed content from one agent to the next instead of summarizing or chunking it first.

This is especially common when using Crew, Agent, and Task in TypeScript and wiring task outputs directly into another task’s prompt.

Broken vs fixed pattern

Broken pattern → Fixed pattern:

  • Passes full document text between tasks → Summarizes or chunks before passing forward
  • Uses the raw output directly in the next prompt → Uses a bounded summary field
  • Lets prompts grow on every step → Keeps each task input small and deterministic
// ❌ Broken: raw output gets forwarded into the next task
import { Agent, Task, Crew } from "crewai";

declare const longPolicyText: string; // full raw document text loaded elsewhere

const researcher = new Agent({
  role: "Researcher",
  goal: "Extract insights from documents",
  backstory: "Senior analyst",
});

const writer = new Agent({
  role: "Writer",
  goal: "Draft a report",
  backstory: "Technical writer",
});

const researchTask = new Task({
  description: `Analyze this document and extract everything useful:\n\n${longPolicyText}`,
  agent: researcher,
});

const writeTask = new Task({
  description: `Write a client summary using this research:\n\n{researchTaskOutput}`,
  agent: writer,
  context: [researchTask],
});

const crew = new Crew({
  agents: [researcher, writer],
  tasks: [researchTask, writeTask],
});
// ✅ Fixed: summarize first, then pass only bounded output
import { Agent, Task, Crew } from "crewai";

declare const longPolicyText: string; // full raw document text loaded elsewhere

const summarizer = new Agent({
  role: "Summarizer",
  goal: "Produce a strict 200-word summary",
  backstory: "Concise technical editor",
});

const writer = new Agent({
  role: "Writer",
  goal: "Draft a report from concise inputs",
  backstory: "Technical writer",
});

const summarizeTask = new Task({
  description:
    `Summarize the document in <=200 words. Keep only facts relevant to claims handling.\n\n${longPolicyText}`,
  agent: summarizer,
});

const writeTask = new Task({
  description:
    `Write a client summary using ONLY this summary:\n\n{summarizeTaskOutput}`,
  agent: writer,
  context: [summarizeTask],
});

const crew = new Crew({
  agents: [summarizer, writer],
  tasks: [summarizeTask, writeTask],
});

If you see errors like Error: token limit exceeded, context length exceeded, or model-specific messages such as This model's maximum context length is ... tokens, start with this pattern. In production, the cause is almost always uncontrolled prompt growth.

Other Possible Causes

1) Memory is turned on and never trimmed

If you use conversational memory or persistent state, old messages accumulate until the prompt exceeds the model window.

// Problematic config
const crew = new Crew({
  agents,
  tasks,
  memory: true,
});

Fix by trimming memory or storing only recent turns.

const crew = new Crew({
  agents,
  tasks,
  memory: {
    enabled: true,
    maxMessages: 6,
    summarizeOlderMessages: true,
  },
});
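If your CrewAI version does not expose a trimming option like the one above, you can trim history yourself before each run. A minimal sketch, assuming a simple role/content message shape (adapt to whatever your memory store actually holds):

```typescript
// Message shape is an assumption; adapt to your memory store's format.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep every system message plus only the most recent non-system turns,
// so the prompt cannot grow without bound as the conversation continues.
function trimHistory(messages: ChatMessage[], maxTurns: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns)];
}
```

Run this before every kickoff so the history passed into the crew stays at a fixed ceiling regardless of session length.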

2) You are injecting full tool results into prompts

Tool outputs from search APIs, SQL dumps, OCR, or ticket exports can be massive. If you pass them directly into an agent prompt, token usage spikes fast.

// Bad
description: `Review tool output:\n\n${JSON.stringify(toolResult)}`;
// Better
description:
  `Review these top findings only:\n\n${toolResult.items.slice(0, 5).map(x => x.title).join("\n")}`;
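Rather than hand-slicing every payload, a reusable truncation helper (names illustrative) caps any string at a fixed character budget before it reaches a prompt:

```typescript
// Illustrative helper: cap any string at a character budget before
// injecting it into a prompt. Characters are a rough proxy for tokens
// (~4 chars per token is a common heuristic for English text).
function truncateForPrompt(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  // Keep the head and note the omission so the model knows content was cut.
  return `${text.slice(0, maxChars)}\n\n[...truncated ${text.length - maxChars} chars]`;
}
```

Wrap every tool result in this before it touches a Task description, and the worst case becomes a bounded prompt instead of a context-window error.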

3) Your system prompt is too large

Teams often stuff policies, SOPs, compliance rules, and examples into every agent’s backstory or system instructions.

const agent = new Agent({
  role: "Claims Assistant",
  goal: "Handle claims safely",
  backstory: bigComplianceManualText,
});

Move large policy text to retrieval or a short policy summary.

const agent = new Agent({
  role: "Claims Assistant",
  goal: "Handle claims safely using approved policy excerpts only",
});

4) You are chaining too many tasks with verbose outputs

Each task output becomes input for the next task. Five verbose tasks can turn one document into a token avalanche.

// Risky chain
tasks: [extractTask, analyzeTask, explainTask, refineTask, finalDraftTask];

Reduce intermediate verbosity and enforce strict output formats like JSON with limited fields.
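One way to enforce such a contract, sketched in plain TypeScript (the field names and caps are illustrative, not a CrewAI API): parse each task's raw output against a small schema and hard-cap every field before it flows downstream.

```typescript
// Illustrative contract: every intermediate task must emit JSON with
// exactly these fields, and both are hard-capped before moving on.
interface StepOutput {
  summary: string;    // capped at 1200 chars (~300 tokens)
  keyFacts: string[]; // capped at 5 items
}

function parseStepOutput(raw: string): StepOutput {
  const parsed = JSON.parse(raw) as Partial<StepOutput>;
  if (typeof parsed.summary !== "string" || !Array.isArray(parsed.keyFacts)) {
    throw new Error("Task output violates the contract");
  }
  return {
    summary: parsed.summary.slice(0, 1200),
    keyFacts: parsed.keyFacts.slice(0, 5).map(String),
  };
}
```

With a guard like this at each task boundary, a verbose model response gets clipped to the contract instead of inflating every downstream prompt.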

How to Debug It

  1. Log token-heavy inputs before each task

    • Print prompt length and payload size.
    • Watch for giant strings coming from PDFs, HTML pages, or JSON blobs.
  2. Inspect each task boundary

    • Check what context is carrying forward.
    • If a downstream task includes upstream raw text instead of summaries, that’s your culprit.
  3. Disable memory and rerun

    • Set memory off temporarily.
    • If the error disappears, your conversation history is growing too large.
  4. Binary search the crew

    • Remove half the tasks.
    • If the error vanishes, add them back one by one until you find the step that explodes token usage.

A simple logging helper helps:

function logSize(label: string, value: unknown) {
  const text = typeof value === "string" ? value : JSON.stringify(value);
  // ~4 characters per token is a rough heuristic for English text.
  const approxTokens = Math.ceil(text.length / 4);
  console.log(`${label}: ${text.length} chars (~${approxTokens} tokens)`);
}

Use it on every input passed to Task.description, tool output, and any persisted message history.

Prevention

  • Keep every task input bounded.
    • Use summaries, top-N results, or extracted fields instead of raw documents.
  • Treat tool output as untrusted payload size.
    • Truncate aggressively before injecting it into prompts.
  • Enforce output contracts.
    • Prefer JSON schemas or fixed-length summaries over free-form essays.
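These rules can be collapsed into a single guard that every task description passes through. A minimal sketch (names and budget are illustrative, not a CrewAI API):

```typescript
// Illustrative guard: assemble a task description from parts and fail fast
// in your own code if the total exceeds the budget, instead of failing
// later at the model with a context-length error.
function boundedDescription(parts: string[], maxChars = 8000): string {
  const text = parts.join("\n\n");
  if (text.length > maxChars) {
    throw new Error(
      `Prompt too large: ${text.length} chars exceeds the ${maxChars}-char budget`
    );
  }
  return text;
}
```

Failing fast at construction time turns a vague production incident into a precise stack trace pointing at the task that grew too large.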

If you’re building production workflows with CrewAI in TypeScript, assume every intermediate step will grow unless you explicitly cap it. The fix is not “use a bigger model” — it’s controlling what enters the context window at each hop.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

