How to Fix 'connection timeout when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

connection timeout when scaling in AutoGen usually means one of your agents, tools, or model calls tried to reach a backend service and never got a response before the timeout expired. In practice, this shows up when you scale from a local test to multiple concurrent agent runs, longer chats, or remote model endpoints.

The failure is often not in AutoGen itself. It’s usually a network issue, an overloaded model server, or a bad client configuration that only becomes visible once the workload increases.
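
To confirm that, you can reproduce the failure without AutoGen in the loop. A minimal sketch, assuming an OpenAI-compatible endpoint; the URL and the deliberately tight 1-second timeout are illustrative:

# Sketch: how a backend timeout surfaces as an exception. The endpoint
# URL and the 1-second timeout are illustrative, not recommendations.
import httpx

try:
    httpx.get("http://127.0.0.1:8000/v1/models", timeout=1.0)
except httpx.ConnectTimeout:
    print("Never reached the server (network, DNS, or firewall)")
except httpx.ReadTimeout:
    print("Reached the server, but it was too slow to respond")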

The Most Common Cause

The #1 cause is creating a new LLM client inside every agent call or every turn. That pattern works in small tests, then starts timing out under concurrency because you keep opening fresh connections instead of reusing a configured client.

Here’s the broken pattern versus the fixed one:

Broken pattern                    | Fixed pattern
----------------------------------|-----------------------------------
Recreates the client per request  | Reuses one configured client
No explicit timeout/retry policy  | Sets sane timeout and retries
Scales poorly under parallel runs | Stable under multiple agent calls
# WRONG: new client created repeatedly inside the workflow
import os

from autogen import AssistantAgent

def run_task(prompt: str):
    # Rebuilding the config and agent on every call opens a fresh
    # connection each time instead of reusing one.
    llm_config = {
        "config_list": [
            {
                "model": "gpt-4o-mini",
                "api_key": os.environ["OPENAI_API_KEY"],
            }
        ]
    }

    agent = AssistantAgent(
        name="assistant",
        llm_config=llm_config,
    )

    return agent.run(prompt)
# RIGHT: create one shared config/client and reuse it
import os
from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "timeout": 60,
            "max_retries": 3,
        }
    ]
}

agent = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)

def run_task(prompt: str):
    return agent.run(prompt)

If you’re using GroupChatManager, ConversableAgent, or multiple workers, the same rule applies: don’t rebuild model clients inside loops. Keep the connection setup at process startup.
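
One way to keep that rule is sketched below with the classic AutoGen GroupChat API; the agent names and the two-agent setup are illustrative:

# Sketch: one llm_config built once at startup and shared by every agent
# and the manager. The agent names and roles here are illustrative.
import os
from autogen import AssistantAgent, GroupChat, GroupChatManager

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "timeout": 60,
            "max_retries": 3,
        }
    ]
}

researcher = AssistantAgent(name="researcher", llm_config=llm_config)
writer = AssistantAgent(name="writer", llm_config=llm_config)

groupchat = GroupChat(agents=[researcher, writer], messages=[], max_round=6)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)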

Other Possible Causes

1) Your model endpoint is slow or rate-limited

If you see errors like:

  • openai.APITimeoutError
  • httpx.ReadTimeout
  • Connection timed out

the upstream endpoint may be overloaded or throttling you.

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "timeout": 120,
            "max_retries": 5,
        }
    ]
}

If you’re calling Azure OpenAI, check deployment health and quota. If you’re calling a local proxy, check whether it’s actually keeping up with your request volume.
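
If throttling is the problem, raising max_retries alone can pile more requests onto an already saturated backend. A sketch of jittered exponential backoff around a single agent call, reusing the exception classes listed above (the attempt count and delays are assumptions to tune for your provider):

# Sketch: jittered exponential backoff around one agent call. Attempt
# count and base delay are assumptions; tune them for your provider.
import random
import time

import httpx
import openai

TRANSIENT = (openai.APITimeoutError, httpx.ConnectTimeout, httpx.ReadTimeout)

def run_with_backoff(agent, prompt, attempts=4, base_delay=2.0):
    for attempt in range(attempts):
        try:
            return agent.run(prompt)
        except TRANSIENT:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter spreads retries out under load.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))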

2) You’re hitting a local model server that can’t keep connections open

This is common with Ollama, vLLM, LM Studio, or custom FastAPI wrappers. AutoGen sends requests fine until the server starts queueing too many concurrent generations.

llm_config = {
    "config_list": [
        {
            "model": "local-model",
            "base_url": "http://127.0.0.1:8000/v1",
            "api_key": "dummy",
            "timeout": 90,
        }
    ]
}

Check server logs for queue buildup, GPU OOMs, or worker restarts. If the backend drops connections mid-flight, AutoGen will surface it as a timeout during scaling.
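
Before scaling up again, it can help to confirm the server answers at all. A quick probe, assuming the base_url above and an OpenAI-compatible /v1/models route (vLLM and similar servers expose it; verify for yours):

# Sketch: probe the local endpoint before launching concurrent agents.
# Assumes an OpenAI-compatible /v1/models route; check your server's docs.
import httpx

resp = httpx.get("http://127.0.0.1:8000/v1/models", timeout=5.0)
print(resp.status_code, resp.json())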

3) DNS, proxy, or firewall issues in your environment

In corporate networks, outbound calls may work once and then fail under load because of proxy limits or blocked egress paths.

export HTTPS_PROXY=http://proxy.company.local:8080
export HTTP_PROXY=http://proxy.company.local:8080

If your app runs in Docker or Kubernetes, verify that container DNS resolution is stable and that outbound traffic to the model host is allowed.
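
A quick in-container check is sketched below; the host name is illustrative, so substitute your real model host (even a 401 response proves the network path works):

# Sketch: verify DNS and outbound reachability from inside the container.
# The host is illustrative; even a 401 proves the network path works.
import socket

import httpx

host = "api.openai.com"
print(socket.getaddrinfo(host, 443)[0][4])  # fails fast on DNS problems

# httpx reads HTTPS_PROXY/HTTP_PROXY from the environment by default.
resp = httpx.get(f"https://{host}/v1/models", timeout=10.0)
print(resp.status_code)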

4) Too much parallelism in GroupChat or multi-agent orchestration

When you scale agents aggressively, you can create more simultaneous LLM requests than your provider allows.

# Example: too many concurrent workers can trigger timeouts
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(run_task, prompts))

Reduce concurrency first. A stable system with max_workers=4 beats an unstable one at 32.
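
One way to enforce that cap even as callers multiply is a semaphore around the LLM-bound work, sketched here with an assumed limit of 4 in-flight calls (size it to your provider's real limits):

# Sketch: a semaphore caps in-flight LLM calls regardless of pool size.
# The limit of 4 is an assumption; size it to your backend's capacity.
from concurrent.futures import ThreadPoolExecutor
from threading import Semaphore

llm_slots = Semaphore(4)

def run_task_bounded(prompt: str):
    with llm_slots:  # blocks until one of the 4 slots frees up
        return run_task(prompt)

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(run_task_bounded, prompts))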

How to Debug It

  1. Confirm where the timeout happens

    • Look for the exact exception class.
    • Common ones include openai.APITimeoutError, httpx.ConnectTimeout, and httpx.ReadTimeout.
    • If it fails inside AssistantAgent.run() or GroupChatManager.run_chat(), it’s usually downstream connectivity rather than AutoGen logic.
  2. Test the model endpoint outside AutoGen

    • Call the same API with a minimal Python script (see the sketch after this list).
    • If raw HTTP fails too, the issue is not AutoGen.
    • Use a single request before testing multi-agent scaling.
  3. Lower concurrency

    • Reduce worker count.
    • Reduce simultaneous agent turns.
    • If timeouts disappear when concurrency drops, you’ve found a capacity issue.
  4. Add logging around request duration

    • Measure how long each call takes before failure.
    • Long tail latency usually points to rate limiting or backend saturation.
import time

start = time.time()
result = agent.run("Summarize this claim")
print(f"LLM call took {time.time() - start:.2f}s")
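
For step 2, here is a minimal script that hits the same endpoint with the raw OpenAI client and no AutoGen in the loop (the model name is an assumption; reuse whatever your llm_config points at):

# Sketch for step 2: raw client call against the same endpoint.
# The model name is illustrative; match it to your llm_config.
from openai import OpenAI

client = OpenAI(timeout=30.0)  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)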

Prevention

  • Reuse one shared LLM config/client per process instead of building new ones inside loops.
  • Set explicit timeout and max_retries values for every production agent.
  • Cap concurrency based on real backend capacity, not what your laptop can handle.
  • Test against the same model host and network path you’ll use in production.

If you’re running AutoGen at scale and seeing connection timeout when scaling, treat it like an infrastructure problem first. The fix is usually better client lifecycle management, lower concurrency, and tighter control over the backend your agents are calling.

