How to Fix 'intermittent 500 errors when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21

When AutoGen starts returning intermittent 500 Internal Server Error responses during scaling, it usually means your agent setup is fine at low concurrency but breaks once requests overlap, time out, or hit shared state. In practice, this shows up when you move from local testing to multiple workers, async tool calls, or a hosted model endpoint behind a load balancer.

The error is rarely “the server is down.” It’s usually one of a few predictable issues: shared mutable state, non-thread-safe client reuse, bad retry handling, or request bursts that exceed upstream limits.

The Most Common Cause — Shared Mutable State in Agent or Tool Code

The #1 cause I see is reusing mutable objects across concurrent AutoGen runs. That includes global conversation history, shared AssistantAgent instances with mutable memory, and tool functions that write to the same dictionary or file without locking.

Here’s the broken pattern:

# BROKEN: shared mutable state across requests
import os

from autogen import AssistantAgent, UserProxyAgent

API_KEY = os.environ["OPENAI_API_KEY"]  # or however you load credentials

conversation_history = []

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": API_KEY}]},
)

user_proxy = UserProxyAgent(name="user_proxy")

def handle_request(prompt: str):
    conversation_history.append({"role": "user", "content": prompt})

    result = user_proxy.initiate_chat(
        assistant,
        message=prompt,
    )

    conversation_history.append({"role": "assistant", "content": str(result))
    return result

And the fixed version:

# FIXED: isolate per-request state
from autogen import AssistantAgent, UserProxyAgent

def build_agents():
    assistant = AssistantAgent(
        name="assistant",
        llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": API_KEY}]},
    )
    user_proxy = UserProxyAgent(name="user_proxy")
    return assistant, user_proxy

def handle_request(prompt: str):
    assistant, user_proxy = build_agents()

    result = user_proxy.initiate_chat(
        assistant,
        message=prompt,
    )
    return result

Why this matters:

  • AutoGen agents are not magic stateless functions.
  • If you reuse the same agent instance across threads or async tasks, one request can overwrite another.
  • That often surfaces as upstream 500 errors because the model call gets malformed context or your tool crashes mid-run.

If you need shared memory, use an explicit store keyed by request/session ID. Don’t hang it off a module-level list.
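A minimal sketch of what that can look like; the SessionStore name and shape here are illustrative, not an AutoGen API:

import threading

class SessionStore:
    """Per-session history, safe to share across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions: dict[str, list[dict]] = {}

    def append(self, session_id: str, message: dict) -> None:
        with self._lock:
            self._sessions.setdefault(session_id, []).append(message)

    def history(self, session_id: str) -> list[dict]:
        with self._lock:
            return list(self._sessions.get(session_id, []))

Each request reads and writes only its own session's history, so concurrent chats can't contaminate each other.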

Other Possible Causes

1) Your tool function throws and AutoGen wraps it as a 500

If you register a Python tool that raises an exception, many deployments surface that as a generic server error.

# BROKEN
def lookup_policy(policy_id: str):
    return POLICIES[policy_id]  # KeyError becomes a 500 somewhere upstream

Fix it with explicit validation:

# FIXED
def lookup_policy(policy_id: str):
    policy = POLICIES.get(policy_id)
    if not policy:
        return {"error": f"policy_id {policy_id} not found"}
    return policy

2) You’re hitting model rate limits during burst scaling

A burst of parallel chats can exceed provider limits. Depending on your stack, this may show up as 500, 429, or an opaque gateway failure.

# Baseline config: note there is no retry/backoff built in here
llm_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": API_KEY}],
    "temperature": 0,
}

Add retries and backoff at the client boundary:

import time

def call_with_retry(fn, retries=3, retry_on=(Exception,)):
    # Exponential backoff: 1s, 2s, 4s between attempts.
    # Pass transient error types (e.g. openai.RateLimitError,
    # httpx.ReadTimeout) as retry_on; don't retry deterministic bugs.
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)
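One way to wire it up is to wrap the chat call itself, reusing the agents and prompt from the earlier handler and passing only error types you know are transient (both exception classes below exist in the v1 openai client):

# Retry only transient provider errors, not logic bugs
import openai

result = call_with_retry(
    lambda: user_proxy.initiate_chat(assistant, message=prompt),
    retry_on=(openai.RateLimitError, openai.APITimeoutError),
)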

3) Your async/sync boundary is wrong

AutoGen code often fails when sync code calls async tools incorrectly or vice versa.

# BROKEN: asyncio.run() inside an already-running event loop
# (e.g. an async web handler) raises RuntimeError
import asyncio

result = asyncio.run(user_proxy.a_initiate_chat(assistant, message=prompt))

Use the API in the mode it expects. If your app is already running inside an event loop, keep everything async and avoid nested asyncio.run() calls.
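The fixed shape is to stay async end to end. A minimal sketch, reusing build_agents from the earlier example and AutoGen's async a_initiate_chat:

# FIXED: stay async end to end; no nested asyncio.run()
async def handle_request(prompt: str):
    assistant, user_proxy = build_agents()
    result = await user_proxy.a_initiate_chat(assistant, message=prompt)
    return result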

4) Your deployment is reusing stale client objects

Some OpenAI-compatible clients keep connection pools that behave badly under process forking or hot reloads.

# BROKEN: client created at import time in a forked worker model
from openai import OpenAI

client = OpenAI(api_key=API_KEY)

Create clients per process startup or per request path depending on your server model:

def get_client():
    # A fresh client per call avoids inherited connection pools
    return OpenAI(api_key=API_KEY)
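If you would rather have one client per worker process than one per request, you can make get_client lazily cache its result (a sketch; lru_cache is my suggestion here, not an AutoGen requirement):

from functools import lru_cache

from openai import OpenAI

@lru_cache(maxsize=1)
def get_client():
    # Created on first use, after the worker forks, so each
    # process builds its own connection pool
    return OpenAI(api_key=API_KEY)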

How to Debug It

  1. Check whether the failure correlates with concurrency

    • Run one request at a time.
    • Then run 5, then 20 (a small harness for this is sketched after this list).
    • If errors only appear under load, suspect shared state or rate limiting.
  2. Log the exact exception before AutoGen wraps it

    • Look for messages like:
      • openai.RateLimitError
      • KeyError
      • RuntimeError: asyncio.run() cannot be called from a running event loop
      • httpx.ReadTimeout
    • Don’t stop at “500”; find the inner exception.
  3. Disable tools one by one

    • Remove registered functions first.
    • Then remove memory/state.
    • Then test only the LLM call.
    • This isolates whether the failure comes from AutoGen orchestration or your Python code.
  4. Test with fresh agent instances

    • Instantiate AssistantAgent and UserProxyAgent inside the request handler.
    • If the error disappears, you had cross-request contamination.
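For steps 1 and 2, a throwaway ramp harness can be as simple as the sketch below. It assumes the handle_request function from earlier; the concurrency levels are arbitrary:

# Ramp concurrency (1, then 5, then 20) and print the inner
# exception type instead of stopping at "500"
from concurrent.futures import ThreadPoolExecutor

def ramp_test(prompt: str) -> None:
    for workers in (1, 5, 20):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(handle_request, prompt) for _ in range(workers)]
            for f in futures:
                try:
                    f.result()
                except Exception as e:
                    # e.g. openai.RateLimitError, a KeyError from a tool,
                    # or httpx.ReadTimeout
                    print(f"{workers} workers: {type(e).__name__}: {e}")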

Prevention

  • Create agents per request unless you have a proven thread-safe design.
  • Keep tool functions pure where possible; avoid global writes and hidden dependencies.
  • Add structured logging (see the sketch after this list) around:
    • request ID
    • agent name
    • tool name
    • exception type
  • Put retries at the edge of your app for transient provider failures, but never retry blindly on deterministic Python exceptions.
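A minimal shape for that logging, using only the standard library (the field names are illustrative):

import logging

logger = logging.getLogger("agent_runs")

def log_run(request_id: str, agent: str, tool: str | None, exc: Exception | None) -> None:
    # One line per run: easy to grep and to correlate with 500s
    logger.info(
        "agent_run request_id=%s agent=%s tool=%s exception=%s",
        request_id, agent, tool or "none",
        type(exc).__name__ if exc else "none",
    )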

If you’re seeing intermittent 500 errors when scaling AutoGen with Python, start with shared state. In real deployments, that’s usually where the bug lives.


By Cyprian Aarons, AI Consultant at Topiax.
