How to Fix 'context length exceeded in production' in AutoGen (TypeScript)
If you’re seeing context length exceeded in production in AutoGen TypeScript, the model is being asked to process more tokens than its context window allows. In practice, this usually shows up after a few turns of agent chat, after a tool returns large output, or when you keep appending the full conversation history to every request.
The fix is usually not “pick a bigger model.” It’s almost always about controlling message growth, trimming history, or stopping tool output from ballooning the prompt.
The Most Common Cause
The #1 cause is unbounded message accumulation in an AssistantAgent or UserProxyAgent loop. You keep passing the entire transcript back into the next call, and eventually the OpenAI API returns something like:
- 400 Bad Request: This model's maximum context length is 128000 tokens. However, your messages resulted in 131204 tokens.
- context_length_exceeded
- Request too large for gpt-4o-mini
Here’s the broken pattern versus the fixed one.
| Broken pattern | Fixed pattern |
|---|---|
| Keep appending every turn forever | Trim history or summarize before continuing |
| Re-send full tool output | Store results externally and send only references |
| No token budget checks | Enforce max message count / token budget |
// BROKEN: unbounded history keeps growing
import { AssistantAgent, UserProxyAgent } from "autogen";

const assistant = new AssistantAgent({
  name: "assistant",
  llmConfig: {
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

const user = new UserProxyAgent({
  name: "user",
});

// Module-level transcript that nothing ever trims
const messages: any[] = [];

async function chat(input: string) {
  messages.push({ role: "user", content: input });
  // Every call re-sends the full, ever-growing transcript
  const reply = await assistant.generateReply(messages);
  messages.push(reply);
  return reply;
}
// FIXED: trim or summarize before the next turn
import { AssistantAgent } from "autogen";

const assistant = new AssistantAgent({
  name: "assistant",
  llmConfig: {
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

const MAX_MESSAGES = 12;

// Keep only the most recent turns; a rolling summary can stand in for the rest
function trimHistory(messages: any[]) {
  return messages.slice(-MAX_MESSAGES);
}

async function chat(input: string, history: any[]) {
  const nextHistory = trimHistory([
    ...history,
    { role: "user", content: input },
  ]);
  const reply = await assistant.generateReply(nextHistory);
  return {
    reply,
    // Trim again so the returned history never exceeds the cap
    history: trimHistory([...nextHistory, reply]),
  };
}
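For completeness, here is how a caller threads the capped history through successive turns. This is a minimal sketch; the prompts are illustrative, and it assumes you run it inside an ES module (top-level await):

// The caller owns the history and threads it through each turn
let history: any[] = [];

const first = await chat("Summarize the open support tickets", history);
history = first.history;

const second = await chat("Draft a customer-facing update", history);
history = second.history;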
If your agent workflow includes tool calls, this gets worse fast. A single PDF extraction or database dump can add tens of thousands of tokens in one shot.
Other Possible Causes
Tool output is too large
AutoGen doesn’t magically compress your tool results. If your function returns a giant JSON blob, that blob gets fed into the next model call.
// Problematic tool result
return JSON.stringify(rows); // rows could be thousands of records
Fix it by returning a summary or paginated slice.
return JSON.stringify({
  count: rows.length,
  sample: rows.slice(0, 10),
});
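If the agent may need the full result later, store it outside the prompt and hand the model a reference instead. A minimal sketch, assuming an in-memory store; resultStore and storeToolResult are illustrative names, not AutoGen APIs, and in production you would swap in Redis, S3, or a database:

import { randomUUID } from "node:crypto";

// Illustrative in-memory store; use Redis/S3/a DB in production
const resultStore = new Map<string, unknown>();

function storeToolResult(rows: unknown[]): string {
  const id = randomUUID();
  resultStore.set(id, rows);
  // The model sees a tiny reference, not the raw dump
  return JSON.stringify({
    resultId: id,
    count: rows.length,
    sample: rows.slice(0, 10),
  });
}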
Your system prompt and instructions are too long
A huge systemMessage eats context before the conversation even starts.
const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage:
    "You are a helpful assistant..." + bigPolicyDoc + bigRunbook + bigFAQ,
});
Move static policy text into retrieval or config files outside the prompt. Keep only what the model needs for this task.
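A minimal sketch of the idea; retrieveRelevant is a hypothetical helper backed by your own vector store, not an AutoGen or OpenAI API, and only the snippets relevant to the current task go into the prompt:

// Hypothetical: fetch the top-K policy snippets relevant to this task
declare function retrieveRelevant(
  query: string,
  opts: { topK: number }
): Promise<string[]>;

async function buildSystemMessage(task: string): Promise<string> {
  const snippets = await retrieveRelevant(task, { topK: 3 });
  return [
    "You are a helpful assistant.",
    ...snippets, // only the policy text this task actually needs
  ].join("\n\n");
}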
Recursive agent handoffs are duplicating state
This happens when each agent forwards full history to the next agent instead of passing a compact state object.
// Bad: copying entire transcript between agents
await salesAgent.generateReply(fullTranscript);
await supportAgent.generateReply(fullTranscript);
Instead, pass only the latest user intent plus a short summary.
const compactState = [
  { role: "system", content: summary },
  { role: "user", content: latestRequest },
];
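One way to produce that summary is a one-off call to a cheap model. This is an assumption about your setup, not a built-in AutoGen feature; the reply shape check is defensive because the return type of generateReply() may differ in your version:

// Sketch: condense the transcript with one extra, cheap model call
async function summarizeTranscript(transcript: any[]): Promise<string> {
  const reply = await assistant.generateReply([
    ...transcript,
    {
      role: "user",
      content:
        "Summarize this conversation in under 150 words, keeping IDs and decisions.",
    },
  ]);
  return typeof reply === "string" ? reply : reply.content;
}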
Your model context window is smaller than you think
Some deployments use smaller limits than the public docs suggest. A config mismatch can make this look random in production.
llmConfig: {
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
}
Check the actual deployed model and provider limits. If you switched to Azure OpenAI or a proxy layer, verify the effective context size there too.
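One cheap safeguard is to keep your own table of effective context sizes per deployment and assert against it at startup. The numbers below are assumptions you must verify against your provider:

// Effective context sizes per deployment; verify against your provider
// (Azure/proxy limits can be lower than the public docs suggest)
const EFFECTIVE_CONTEXT: Record<string, number> = {
  "gpt-4o-mini": 128_000, // assumption: confirm for your deployment
};

function assertKnownModel(model: string) {
  if (!(model in EFFECTIVE_CONTEXT)) {
    throw new Error(`No context limit configured for model: ${model}`);
  }
}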
How to Debug It
- Log token growth per request
  - Add logging around message count and approximate token count (a logging sketch follows this list).
  - If you see steady growth across turns, you have an accumulation problem.
- Print the exact payload sent to the model
  - Dump the final messages array before generateReply().
  - Look for giant tool outputs, repeated summaries, or duplicated transcripts.
- Isolate one agent and one tool
  - Disable tools first.
  - If the error disappears, your tool output is likely too large.
  - Re-enable tools one by one until it breaks again.
- Check provider error details
  - Search for: "This model's maximum context length is...", context_length_exceeded, invalid_request_error.
  - The exact token counts tell you whether it’s prompt size, completion size, or both.
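A rough way to log that growth, assuming the common heuristic of roughly 4 characters per token for English text; estimateTokens is an approximation, not a real tokenizer, so use tiktoken or your provider's tokenizer when you need exact counts:

// Rough heuristic: ~4 characters per token for English text
function estimateTokens(messages: any[]): number {
  const chars = messages.reduce(
    (sum, m) => sum + JSON.stringify(m).length,
    0
  );
  return Math.ceil(chars / 4);
}

// Log before every model call to spot steady growth across turns
function logContextSize(turn: number, messages: any[]) {
  console.log(
    `turn=${turn} messages=${messages.length} approxTokens=${estimateTokens(messages)}`
  );
}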
Prevention
- Keep a hard cap on conversation history: use last-N messages plus a rolling summary.
- Treat tool output as untrusted prompt input: return IDs, counts, and samples instead of raw dumps.
- Build token checks into your agent wrapper: fail fast before sending an oversized request to OpenAI (see the sketch after this list).
- Use shorter system prompts and external memory: put policy docs and long reference text in retrieval storage, not in every prompt.
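A minimal fail-fast guard, reusing the estimateTokens heuristic from the debugging section. The 110,000 budget is an assumption; set it below your deployed model's real context window:

// Assumed budget: leave headroom under the model's context window
const TOKEN_BUDGET = 110_000;

async function safeGenerateReply(messages: any[]) {
  const approx = estimateTokens(messages);
  if (approx > TOKEN_BUDGET) {
    // Fail fast instead of letting the provider reject the request
    throw new Error(
      `Context budget exceeded: ~${approx} tokens > ${TOKEN_BUDGET}`
    );
  }
  return assistant.generateReply(messages);
}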
If this error only appears in production, that usually means real users are generating longer conversations than your local tests did. Add logging now; otherwise you’ll keep chasing “random” failures that are just oversized payloads reaching AssistantAgent.generateReply() or your underlying OpenAI client.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.