How to Fix 'context length exceeded during development' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21

What the Error Means

'context length exceeded during development' usually means your AutoGen agent sent too much text to the model in one request. In practice, this happens when chat history, tool output, retrieved documents, or nested agent messages keep growing until the LLM prompt crosses the model’s token limit.

You’ll usually see it after a few turns, when an agent loop keeps appending messages instead of trimming them. In AutoGen Python projects, the most common trigger is an AssistantAgent or UserProxyAgent carrying full conversation state into every call.

The Most Common Cause

The #1 cause is unbounded message accumulation. You keep reusing the same agent and keep adding long tool outputs, so every new call includes everything from earlier turns.

Broken vs. fixed, at a glance:

  • Keeps full history forever → trim history or summarize
  • Dumps raw tool output into chat → store output externally and send only the relevant slice
  • Reuses the same bloated context → reset state between tasks

Here’s the broken pattern:
# BROKEN
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "..." }]}
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
)

# Long tool output gets appended to the chat repeatedly
# (huge_log_blob: assume a very large string read from disk earlier)
result = user_proxy.initiate_chat(
    assistant,
    message="""
Please analyze this log:
""" + huge_log_blob
)

# Later calls reuse the same conversation context
result = user_proxy.initiate_chat(
    assistant,
    message="Now summarize the root cause."
)
# FIXED
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "..."}]}
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
)

# Send only a bounded slice of data
log_excerpt = huge_log_blob[-8000:]  # last N chars, not entire file

result = user_proxy.initiate_chat(
    assistant,
    message=f"Please analyze this log excerpt:\n{log_excerpt}"
)

# Start a fresh conversation for a new task
assistant.clear_history()
user_proxy.clear_history()
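
If you genuinely need prior context for the next task, trim instead of clearing. A minimal sketch, assuming AutoGen 0.2, where chat_messages maps each partner agent to its message list:

def trim_history(agent, partner, keep_last=6):
    # Mutate the stored list in place so only the newest messages survive
    history = agent.chat_messages[partner]
    if len(history) > keep_last:
        history[:] = history[-keep_last:]

trim_history(assistant, user_proxy)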

If you’re using GroupChat, the same issue applies. Every round adds more speaker messages unless you explicitly cap rounds or summarize.

Other Possible Causes

1) Tool output is too large

A function call returning megabytes of JSON will blow up your prompt fast.

def fetch_records():
    return open("export.json").read()  # too big for chat context

Fix it by returning a compact summary and storing the full payload elsewhere.

import json

def fetch_records():
    with open("export.json") as f:
        data = json.load(f)
    return {
        "record_count": len(data),
        "sample_ids": [row["id"] for row in data[:5]],
    }
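
If an agent may need the full data later, write it somewhere durable and return a pointer instead of the payload. A sketch (the artifacts/ path is an arbitrary choice):

import json
import os

def fetch_records():
    with open("export.json") as f:
        data = json.load(f)
    # Keep the full payload on disk; the model only sees a pointer + summary
    os.makedirs("artifacts", exist_ok=True)
    path = "artifacts/export_full.json"
    with open(path, "w") as f:
        json.dump(data, f)
    return {"record_count": len(data), "full_payload_path": path}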

2) Retrieval injects too many chunks

If you use RAG with AutoGen and stuff 20 chunks into every prompt, token usage spikes immediately.

# Too many retrieved chunks
docs = retriever.get_relevant_documents(query)
context = "\n\n".join(doc.page_content for doc in docs[:20])

Use fewer chunks and smaller chunk sizes.

docs = retriever.get_relevant_documents(query)
context = "\n\n".join(doc.page_content for doc in docs[:4])

3) Nested agents are echoing each other

A GroupChatManager or nested AssistantAgent setup can duplicate context across agents if each agent forwards full transcripts.

groupchat = GroupChat(
    agents=[agent1, agent2, agent3],
    messages=[],
    max_round=30  # can get expensive quickly
)

Lower rounds and summarize between handoffs.

groupchat = GroupChat(
    agents=[agent1, agent2, agent3],
    messages=[],
    max_round=8
)
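
To actually summarize between handoffs, initiate_chat in AutoGen 0.2 accepts a summary_method. A sketch, assuming manager is a GroupChatManager over groupchat and task is your prompt:

result = user_proxy.initiate_chat(
    manager,
    message=task,
    summary_method="reflection_with_llm",  # ask the LLM for a short chat summary
)
handoff = result.summary  # pass this downstream, not the full transcript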

4) Model context window is smaller than you think

Some configs point to smaller-context models than expected. A request that works on gpt-4o may fail on a smaller deployment or proxy-backed endpoint.

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_type": "azure",
            "max_tokens": 2000,
        }
    ]
}

Check both input and output budgets: max_tokens caps the completion, not the prompt, so if the prompt itself already exceeds the window, reducing max_tokens alone won’t fix it.
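
Before blaming the model, measure the prompt. A quick check with tiktoken (assuming messages holds plain-string contents; o200k_base is the encoding used by the gpt-4o family):

import tiktoken

enc = tiktoken.get_encoding("o200k_base")
prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
print(f"~{prompt_tokens} prompt tokens before the API's own overhead")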

How to Debug It

  1. Print message sizes before each call

    • Inspect how many messages are being sent.
    • Check whether one message is massive due to logs or tool output.
    # Assuming AutoGen 0.2: chat_messages maps partner agent → message list
    messages = assistant.chat_messages[user_proxy]
    for i, msg in enumerate(messages):
        print(i, msg["role"], len(str(msg.get("content") or "")))  # content may be None

  2. Binary search the prompt

    • Remove half the conversation history.
    • If the error disappears, the issue is prompt growth rather than model config.
  3. Check tool outputs first

    • Log returned payload sizes from every registered function.
    • If one tool returns thousands of lines of JSON, that’s your culprit (see the wrapper sketch after this list).
  4. Test with a fresh agent instance

    • Create a new AssistantAgent and UserProxyAgent.
    • If it works cleanly on a fresh run but fails after several turns, you have history accumulation.
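
For step 3, a quick way to log payload sizes is to wrap each tool before registering it. A minimal sketch (log_payload_size is a hypothetical helper, not an AutoGen API):

import functools

def log_payload_size(fn):
    """Wrap a tool so every call prints the rough size of its return payload."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        print(f"{fn.__name__} returned ~{len(str(out))} chars")
        return out
    return wrapper

# Wrap before registering, e.g. fetch_records = log_payload_size(fetch_records)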

Prevention

  • Keep tool outputs short. Return summaries, IDs, counts, and pointers to external storage instead of raw dumps.
  • Add explicit truncation before sending logs, documents, or transcripts into AssistantAgent.
  • Reset state between tasks with fresh agents or cleared history when one job is complete.
  • Cap GroupChat.max_round and summarize between rounds if you’re orchestrating multiple agents.

If you want a reliable rule: never let AutoGen carry unbounded text across turns. The model doesn’t care that the data came from your app logs or retrieval pipeline — it only sees tokens, and tokens hit limits fast.
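
One way to enforce that rule everywhere: route every outbound string through a truncation guard. A tiny sketch (bounded is a hypothetical helper; the 8,000-char limit is arbitrary):

def bounded(text, limit=8000):
    # Hard cap on anything headed into an agent message
    return text if len(text) <= limit else text[:limit] + "\n...[truncated]"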

