How to Fix 'context length exceeded' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded, autogen, typescript

What the error means

The 'context length exceeded' error means AutoGen sent the model a prompt larger than its token window. In practice, it shows up when your agent chat history keeps growing, when you stuff too much data into a single message, or when tool output gets echoed back into the conversation.

With TypeScript AutoGen setups, this usually happens after a few turns of multi-agent back-and-forth or when you pass large documents into UserProxyAgent / AssistantAgent without trimming.

The Most Common Cause

The #1 cause is unbounded chat history. AutoGen keeps appending messages, and eventually OpenAIChatCompletionClient tries to send a request that blows past the model’s context limit.

Here’s the broken pattern:

import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: "You are a helpful assistant.",
});

const user = new UserProxyAgent({
  name: "user",
});

for (const chunk of largeDocumentChunks) {
  await user.send({ content: chunk });
  await assistant.run(); // history keeps growing
}

And here’s the fixed pattern:

import { AssistantAgent, UserProxyAgent } from "@autogen/core";

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: "You are a helpful assistant.",
});

const user = new UserProxyAgent({
  name: "user",
});

for (const chunk of largeDocumentChunks) {
  await user.send({ content: chunk });

  const result = await assistant.run({
    maxTurns: 1,
    // keep only what you need for this turn
    clearHistory: true,
  });

  console.log(result);
}

The important part is not letting every prior turn stay in memory forever. If you need long-running workflows, summarize state and pass forward only the summary plus the current task.
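
Here’s a minimal sketch of that pattern, built on the same run() options used above. The summarizeTurn helper is a stand-in for however you condense a turn (in practice, often a cheap model call); everything else is plain TypeScript:

// Naive stand-in for a real summarizer: keep only the most recent notes
// so the carried-forward state stays small.
function summarizeTurn(previous: string, latest: unknown): string {
  const combined = `${previous}\n${JSON.stringify(latest)}`;
  const MAX_CHARS = 2000;
  return combined.length > MAX_CHARS ? combined.slice(-MAX_CHARS) : combined;
}

let runningSummary = "";

for (const chunk of largeDocumentChunks) {
  const result = await assistant.run({
    input: [
      `Summary of work so far: ${runningSummary || "none yet"}`,
      "Current task: process the next document chunk.",
      chunk,
    ].join("\n\n"),
    maxTurns: 1,
    clearHistory: true, // nothing from earlier turns leaks into this call
  });

  // Replace, don't append: the carried-forward state has a fixed ceiling.
  runningSummary = summarizeTurn(runningSummary, result);
}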

Broken vs fixed

Pattern | Broken | Fixed
Conversation memory | Keeps every message forever | Clears or trims history
Large inputs | Sends raw docs in full | Chunk, summarize, or retrieve relevant slices
Tool output | Dumps full JSON/text back into chat | Return only compact results

Other Possible Causes

1. Huge tool output getting injected back into the prompt

If your tool returns a giant JSON blob, AutoGen may add that result to the conversation context.

// Bad: returns entire database export
return JSON.stringify(rows);

Fix it by returning only what the model needs:

// Better: return compact summary
return JSON.stringify({
  count: rows.length,
  sample: rows.slice(0, 5),
});
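
In context, a whole tool might look like the sketch below. The OrderRow shape and the stubbed fetchOrdersFromDb helper are made up for illustration, and how you register the tool with your agent depends on your AutoGen setup; the point is what the function hands back to the model:

interface OrderRow {
  id: string;
  total: number;
  status: string;
}

// Hypothetical data-access helper, stubbed so the sketch stands alone.
async function fetchOrdersFromDb(query: string): Promise<OrderRow[]> {
  return []; // replace with your real query
}

async function searchOrdersTool(query: string): Promise<string> {
  const rows = await fetchOrdersFromDb(query);

  // Hand back counts, a small sample, and IDs the agent can follow up on,
  // instead of the full result set.
  return JSON.stringify({
    count: rows.length,
    sample: rows.slice(0, 5),
    ids: rows.slice(0, 50).map((r) => r.id),
  });
}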

2. Oversized system message

A long systemMessage with policies, examples, and business rules can eat a big chunk of context before the user even speaks.

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: veryLongPolicyDoc,
});

Trim it down:

const assistant = new AssistantAgent({
  name: "assistant",
  systemMessage: [
    "You are an underwriting assistant.",
    "Ask for missing fields before making assumptions.",
    "Return concise answers.",
  ].join("\n"),
});

3. Model context window too small

Not all models have the same token limit. If you’re using a smaller model behind OpenAIChatCompletionClient, your prompt may fit on one model and fail on another.

import { OpenAIChatCompletionClient } from "@autogen/openai";

const client = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini", // smaller window than larger variants in many workloads
});

If your workload is heavy, move to a model with more context or reduce prompt size.

4. Passing full documents instead of retrieved chunks

This is common in RAG flows. Developers load an entire PDF or policy pack into one message instead of retrieving just relevant sections.

await assistant.run({
  input: fullPolicyText, // too large
});

Use retrieval or chunking:

await assistant.run({
  input: relevantChunkSummary,
});
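
If you don’t have a vector store handy, even a crude keyword-overlap retriever keeps the prompt small. This sketch is plain TypeScript and assumes nothing about AutoGen beyond the fullPolicyText variable used above; the question string is just an example:

function chunkText(text: string, chunkSize = 2000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

function topRelevantChunks(chunks: string[], question: string, k = 3): string[] {
  const terms = question.toLowerCase().split(/\W+/).filter(Boolean);
  return chunks
    .map((chunk) => ({
      chunk,
      score: terms.filter((t) => chunk.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Build the input used in the fixed example above.
const relevantChunkSummary = topRelevantChunks(
  chunkText(fullPolicyText),
  "What is the maximum coverage for water damage?",
).join("\n---\n");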

How to Debug It

  1. Log prompt size before each call
    Measure how much text you’re actually sending. In TypeScript, log message lengths and approximate token counts before run() (see the sketch after this list).

  2. Inspect which message exploded
    Check whether it was:

    • a giant user upload
    • tool output
    • accumulated chat history
    • an oversized system prompt
  3. Reduce to one turn
    Run the same flow with maxTurns: 1 and no tools. If it passes, the issue is usually history growth or tool spam.

  4. Binary search your inputs
    Cut the document/tool payload in half until the error disappears. That tells you whether the problem is one huge message or gradual accumulation across turns.

A typical OpenAI-side failure looks like this:

BadRequestError: This model's maximum context length is 128000 tokens.
However, your messages resulted in 131245 tokens.
Please reduce the length of the messages.

In AutoGen wrappers, you may also see errors bubble up through agent execution like:

Error during agent run:
BadRequestError: context length exceeded

Prevention

  • Keep agent memory bounded.
    • Summarize old turns.
    • Clear history between tasks.
  • Never send raw large artifacts into chat.
    • Chunk PDFs, logs, and exports.
  • Make tool outputs compact.
    • Return IDs, summaries, counts, and small samples instead of full payloads.
  • Pick a model that matches your workload.
    • Don’t force long-document workflows onto a small context window.

If you’re building production agents in TypeScript, treat context as a budget. Once you start budgeting tokens explicitly, this error stops being random and becomes easy to control.
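
One way to make that budget explicit is to trim history before every call. The sketch below reuses the ChatMessage shape and approxTokens heuristic from the debugging section, and drops the oldest non-system messages until the estimate fits:

function trimToBudget(messages: ChatMessage[], budgetTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  const size = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + approxTokens(m.content), 0);

  // Drop the oldest non-system messages first until the prompt fits.
  while (rest.length > 0 && size(system) + size(rest) > budgetTokens) {
    rest.shift();
  }

  return [...system, ...rest];
}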


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

