# How to Fix 'context length exceeded' in AutoGen (Python)

## What the error means
A `context length exceeded` error means the model received more tokens than its context window allows. In AutoGen, this usually shows up after a few turns of agent-to-agent chatter, a long tool output, or when you keep appending the full conversation history to every call.
The failure often looks like one of these:

- `openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is ..."}}`
- `autogen.oai.client.OpenAIWrapperException: ... context length exceeded`
- `litellm.ContextWindowExceededError: The prompt is too long`
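To see whether a request will fit before you send it, estimate the token count. The sketch below uses a rough 4-characters-per-token heuristic for English text (for exact counts, use a real tokenizer such as tiktoken); the 128k window is an assumed figure, so check your model's documented limit.

```python
# Rough rule of thumb: ~4 characters per token for English text.
# For exact counts use a real tokenizer (e.g. tiktoken); this
# dependency-free estimate is enough for quick sanity checks.
def approx_tokens(text: str) -> int:
    return len(text) // 4

CONTEXT_WINDOW = 128_000  # assumed window; check your model's documented limit

prompt = "Review this entire contract:\n" + "x" * 600_000
needed = approx_tokens(prompt)
print(f"~{needed} tokens vs. a {CONTEXT_WINDOW}-token window")
if needed > CONTEXT_WINDOW:
    print("This request would likely be rejected with a context length error")
```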
## The Most Common Cause
The #1 cause is unbounded conversation growth.
AutoGen agents keep chat history unless you explicitly trim it. If your assistant and user proxy keep passing large messages, or if an agent summarizes nothing and just stores everything, the prompt grows until the model rejects it.
### Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Keep reusing the same agent chat history forever | Limit history, summarize, or reset between tasks |
| Send raw tool output back into the next turn | Extract only the needed fields |
| Let nested group chats accumulate every message | Use a bounded memory strategy |
```python
# BROKEN: unbounded history keeps growing
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
)

# Each run appends more messages to the same chat history
for _ in range(20):
    user_proxy.initiate_chat(
        assistant,
        message="Review this entire contract and list all issues:\n" + open("contract.txt").read(),
    )
```
```python
# FIXED: trim input and reset or summarize between runs
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
)

contract_text = open("contract.txt").read()

# Keep only the relevant section
prompt = f"""
Review this clause only:
{contract_text[:6000]}
"""

user_proxy.initiate_chat(assistant, message=prompt)

# If you need another run, start fresh
assistant.reset()
user_proxy.reset()
```
If you’re using `GroupChat`, `ConversableAgent`, or nested agent workflows, this problem gets worse because every intermediate message is preserved.
## Other Possible Causes

### 1) Tool output is too large
A function returning a giant JSON blob or HTML page can blow up the next LLM call.
```python
import json

# BAD: dumps the entire file into the next prompt
def fetch_customer_data():
    return open("customer_dump.json").read()

# GOOD: return only the fields the agent actually needs
def fetch_customer_data():
    data = json.load(open("customer_dump.json"))
    return {
        "customer_id": data["customer_id"],
        "risk_score": data["risk_score"],
        "open_cases": data["open_cases"][:5],
    }
```
### 2) Your system prompt is bloated

People paste policies, schemas, examples, and playbooks into `system_message`. That eats context before the conversation even starts.
```python
# BAD: the entire policy manual eats context before the chat starts
assistant = AssistantAgent(
    name="assistant",
    system_message=open("all_company_policies.md").read(),
    llm_config=llm_config,
)

# GOOD: a short, focused system message
assistant = AssistantAgent(
    name="assistant",
    system_message=(
        "You are a claims assistant. "
        "Use only approved policy fields. "
        "Ask for missing information."
    ),
    llm_config=llm_config,
)
```
### 3) You’re using a smaller model than your prompt requires

A prompt that fits in GPT-4.1 may fail on a model with a smaller context window.
```python
import os

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",  # smaller context than some alternatives
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ]
}
```
If your workflow is long-running, move to a larger-context model or reduce payload size.
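One hedged sketch of the larger-model route: AutoGen walks `config_list` entries in order when a call fails, so listing a larger-context model second can act as a backup for oversized prompts. The model names here are examples, and you should verify this fallback behavior against the AutoGen version you run.

```python
import os

# Hypothetical fallback setup: a larger-context model listed second acts
# as a backup when the first entry fails. Model names are examples only;
# confirm the retry-through-config_list behavior for your AutoGen version.
llm_config = {
    "config_list": [
        {"model": "gpt-4o-mini", "api_key": os.environ.get("OPENAI_API_KEY", "YOUR_KEY")},
        {"model": "gpt-4o", "api_key": os.environ.get("OPENAI_API_KEY", "YOUR_KEY")},
    ]
}
```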
### 4) Nested agents multiply token usage
A manager agent calling worker agents can duplicate content across multiple turns. One long email chain becomes several long prompts.
```python
from autogen import GroupChat, GroupChatManager

groupchat = GroupChat(
    agents=[planner, researcher, writer],
    messages=[],
    max_round=20,
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)
```
If each agent sees full transcripts plus tool outputs, token usage grows fast. Reduce `max_round`, trim messages before handoff, or pass structured summaries instead of raw transcripts.
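A minimal sketch of "trim messages before handoff": keep the system message plus only the most recent turns. It assumes the `{"role": ..., "content": ...}` dict format AutoGen passes to the API; `keep_last` is an arbitrary budget you should tune.

```python
# Bounded-memory sketch: before the next agent sees the transcript, keep
# the system message plus only the most recent turns. Assumes the
# {"role": ..., "content": ...} dict format; keep_last is a tunable budget.
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    system = [m for m in messages if m.get("role") == "system"]
    recent = [m for m in messages if m.get("role") != "system"][-keep_last:]
    return system + recent
```

Calling something like `trim_history(groupchat.messages)` before a handoff keeps the prompt bounded no matter how many rounds have run.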
## How to Debug It
1. **Print token-heavy inputs before each LLM call.**
   - Inspect `message`, `system_message`, tool outputs, and chat history.
   - Look for giant strings, logs, JSON blobs, or repeated clauses.
2. **Measure growth turn by turn.**
   - Log message counts and approximate size.
   - If the last successful turn was small and the next one jumps massively, you found your culprit.
3. **Isolate each source.** Run with:
   - no tools
   - no prior history
   - a shorter system prompt
   - a smaller user message

   Add them back one at a time until it breaks.
4. **Check which layer throws the exception.**
   - `OpenAIWrapperException` usually points to AutoGen’s wrapper.
   - `BadRequestError` from OpenAI means the API rejected the request directly.
   - `ContextWindowExceededError` often comes from LiteLLM or another provider wrapper.
A practical debug loop looks like this:
```python
def log_payload(label: str, text: str):
    print(f"{label}: chars={len(text)}")

log_payload("system", assistant.system_message)
log_payload("user", prompt)
log_payload("tool_output", tool_result)
```
If one field is huge, trim that first.
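A hypothetical helper for that trimming step: keep the head and tail of an oversized field so the model still sees both ends. The 4,000-character cap is an assumption; tune it for your model and use case.

```python
# Keep the head and tail of an oversized field so the model still sees
# both ends of it. The 4,000-character default is an assumption to tune.
def clip(text: str, max_chars: int = 4_000) -> str:
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]
```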
## Prevention
- **Keep prompts narrow.** Pass only the clause, record section, or claim summary you actually need.
- **Summarize before handoff.** Don’t forward raw transcripts between agents; forward compact state objects.
- **Reset chats for discrete tasks.** Treat each case review as a fresh session unless continuity is required.
- **Cap tool output.** Return top-N records or extracted fields instead of full dumps.
- **Use larger-context models where appropriate.** But don’t use model size as an excuse for sloppy prompt hygiene.
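To make "forward compact state objects" concrete, here is a minimal sketch; the `CaseSummary` fields are hypothetical, invented for a claims-style workflow.

```python
import json
from dataclasses import asdict, dataclass

# A compact state object passed between agents instead of a raw transcript.
# The fields are hypothetical, sketched for a claims-style workflow.
@dataclass
class CaseSummary:
    case_id: str
    decision: str
    open_questions: list

summary = CaseSummary("C-1042", "needs human review", ["missing policy number"])
handoff_message = json.dumps(asdict(summary))  # small, structured payload
```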
If you want a stable AutoGen setup in production, assume every message is expensive. Trim early, summarize aggressively, and never let raw data flow into the prompt unchanged.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.