How to Fix 'context length exceeded when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: context-length-exceeded-when-scaling, autogen, python

What the error means

A "context length exceeded when scaling" error in AutoGen usually means your agent conversation grew past the model’s token limit while AutoGen was trying to send the full message history back to the LLM. You’ll see it when chats loop, group chats accumulate too much state, or you keep appending large tool outputs without trimming them.

In practice, this shows up as a BadRequestError, InvalidRequestError, or a model-specific context window error during AssistantAgent.generate_reply() or GroupChatManager.run_chat().

The Most Common Cause

The #1 cause is unbounded chat history. AutoGen keeps passing previous messages into the next turn, and if you let agents talk for too long, the prompt grows until the model rejects it.

Broken vs fixed pattern

  • Broken: reuses the same long-lived messages history forever. Fixed: clears, trims, or summarizes history before it grows too large.
  • Broken: appends every tool output verbatim. Fixed: stores only the relevant parts.
  • Broken: no max-turn guard. Fixed: explicit termination and turn limits.

# BROKEN: unbounded history keeps growing
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o-mini", "temperature": 0},
)

user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
)

# This can grow until you hit:
# BadRequestError: context length exceeded when scaling
for i in range(200):
    user.initiate_chat(
        assistant,
        message=f"Analyze transaction batch {i} and explain anomalies.",
        clear_history=False,  # reuses the full history every call, so the prompt keeps growing
    )
# FIXED: cap turns and trim/summarize history
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o-mini", "temperature": 0},
)

user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
)

chat_result = user.initiate_chat(
    assistant,
    message="Analyze this transaction batch and explain anomalies.",
    max_turns=6,
)

# If you need another pass, start fresh with a summary instead of full history.
summary = "Previous analysis found 3 suspicious transfers and one duplicate vendor payment."
user.initiate_chat(
    assistant,
    message=f"{summary}\nNow focus only on remediation steps.",
    max_turns=4,
)

If you are using GroupChat / GroupChatManager, the same issue applies. The manager keeps feeding prior messages into the next round unless you constrain it.
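
For example, here is a minimal sketch of a capped group chat, assuming planner, analyst, and reviewer are AssistantAgent instances you have already configured, user is a UserProxyAgent, and llm_config is the config shown earlier:

from autogen import GroupChat, GroupChatManager

groupchat = GroupChat(
    agents=[planner, analyst, reviewer],
    messages=[],
    max_round=8,  # hard cap so shared history cannot grow unbounded
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user.initiate_chat(manager, message="Plan and review the anomaly report.")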

Other Possible Causes

1) Tool output is too large

A common failure mode is dumping entire API responses, PDFs, logs, or database rows into chat history.

# BAD: appending raw payloads
tool_result = fetch_claim_history(claim_id)
messages.append({"role": "tool", "content": str(tool_result)})

Fix it by truncating and extracting only what matters.

# GOOD: keep only relevant fields
tool_result = fetch_claim_history(claim_id)
messages.append({
    "role": "tool",
    "content": f"claim_id={claim_id}, status={tool_result['status']}, flags={tool_result['flags'][:5]}"
})
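
If tool results flow through your own code first, you can apply the same discipline generically. Here is a sketch with a hypothetical truncate_payload helper (the name and the 2,000-character budget are assumptions, not AutoGen APIs):

def truncate_payload(payload, limit=2000):
    # Stringify a tool result and cap its size before it enters chat history.
    text = str(payload)
    if len(text) > limit:
        return text[:limit] + f" ... [truncated {len(text) - limit} chars]"
    return text

messages.append({"role": "tool", "content": truncate_payload(tool_result)})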

2) Recursive agent loops without termination

AutoGen agents can keep bouncing messages if your termination condition is weak or missing.

from autogen import GroupChat

groupchat = GroupChat(
    agents=[planner, analyst, reviewer],  # long-lived agents defined elsewhere
    messages=[],
    max_round=50,  # 50 rounds of shared history can easily blow past the context window
)

Reduce rounds and make termination explicit.

groupchat = GroupChat(
    agents=[planner, analyst, reviewer],
    messages=[],
    max_round=8,
)

Also make sure your reply logic stops on completion markers like "DONE" or "TERMINATE".
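
For example, a sketch of a UserProxyAgent that stops replying as soon as it sees the marker (match whatever marker your system prompt actually instructs agents to emit):

from autogen import UserProxyAgent

user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    # Stop the loop as soon as a reply contains the completion marker.
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)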

3) You are sending full documents instead of chunks

If you paste an entire policy document, claim file, or underwriting packet into one prompt, you will hit limits fast.

# BAD
message = open("policy.pdf.txt").read()
user.initiate_chat(assistant, message=message)

Chunk first.

# GOOD
with open("policy.pdf.txt") as f:
    text = f.read()

# ~4,000 characters per chunk (roughly 1,000 tokens)
chunks = [text[i:i+4000] for i in range(0, len(text), 4000)]

for chunk in chunks[:3]:
    user.initiate_chat(
        assistant,
        message=f"Review this excerpt:\n\n{chunk}",
        max_turns=2,
    )

4) Model context window is smaller than your workload

Sometimes the code is fine; the model choice is wrong. A smaller-context model will fail sooner under the same workload.

llm_config = {
    "model": "gpt-4o-mini",  # 128k-token context window; may be too small for long multi-agent threads
}

Move to a larger context model if your use case needs longer threads.

llm_config = {
    "model": "gpt-4.1",  # roughly 1M-token context window
}

How to Debug It

  1. Inspect where it fails

    • If it fails inside AssistantAgent.generate_reply(), you are likely overloading prompt history.
    • If it fails during GroupChatManager.run_chat(), check group conversation growth and turn count.
  2. Print message sizes before each call

    • Log token estimates or at least character counts for every message (see the sketch after this list).
    • Watch for one giant tool response or repeated system prompts.
  3. Check whether history is being reused

    • Look for code that reuses the same agent instance across many requests.
    • In AutoGen, long-lived agents accumulate chat state across calls unless you reset them (ConversableAgent.reset() clears it).
  4. Reduce scope aggressively

    • Set max_turns lower.
    • Replace full documents with summaries.
    • Trim tool outputs to only key fields.
    • If the error disappears after these changes, you found the cause.
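
A minimal sketch covering steps 2 and 3, assuming the long-lived assistant agent from earlier (the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer count):

def log_history_size(agent):
    # Report a rough size for every conversation this agent is holding.
    for peer, msgs in agent.chat_messages.items():
        chars = sum(len(str(m.get("content") or "")) for m in msgs)
        print(f"{agent.name} <-> {peer.name}: {len(msgs)} messages, "
              f"{chars} chars (~{chars // 4} tokens)")

log_history_size(assistant)

# If one conversation dominates, trim it or start fresh:
assistant.reset()  # clears accumulated chat history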

Prevention

  • Set hard limits everywhere:

    • max_turns
    • group chat rounds
    • tool output size
    • document chunk size
  • Summarize before continuing a conversation:

    • Keep a compact running summary of prior decisions.
    • Start new chats from that summary instead of raw history (see the sketch below).
  • Treat chat history like memory pressure:

    • Don’t let agents carry every intermediate artifact forever.
    • Store raw data outside the conversation and pass references or extracted facts back into AutoGen.
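
A minimal sketch of that summary hand-off, reusing the user and assistant agents from earlier; summary_method="reflection_with_llm" asks the model to compress the finished chat before you carry it forward:

chat_result = user.initiate_chat(
    assistant,
    message="Analyze this claim file and list open questions.",
    max_turns=6,
    summary_method="reflection_with_llm",  # LLM-written summary of the whole chat
)

# Carry only the compact summary forward, not the raw history.
user.initiate_chat(
    assistant,
    message=f"Context so far: {chat_result.summary}\nNow draft the remediation plan.",
    max_turns=4,
)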

If you are building production workflows in banking or insurance, this is not optional. Long-running agent loops need explicit budget controls for tokens, turns, and payload size.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
