How to Fix 'connection timeout when scaling' in AutoGen (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

If you see connection timeout when scaling in AutoGen TypeScript, it usually means your agent tried to create or contact a new runtime, but the request never completed before the timeout window expired. In practice, this shows up during AssistantAgent / UserProxyAgent workflows when the model call is slow, the runtime is misconfigured, or your scaling path is spawning too many concurrent requests.

The fix is usually not “increase timeout and hope.” You need to find whether the failure is coming from agent initialization, model transport, or your own concurrency pattern.

The Most Common Cause

The #1 cause is creating too many concurrent agent calls while using a short timeout. In AutoGen TypeScript, this often happens when you fan out multiple run() calls at once and each call tries to scale a runtime or fetch model responses at the same time.

Here’s the broken pattern:

Broken:

```ts
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({ name: "support-agent", modelClient });

// Every ticket fires a concurrent run(): unbounded fan-out
await Promise.all(
  tickets.map((ticket) => agent.run(`Handle ticket: ${ticket.summary}`))
);
```

Fixed:

```ts
import { AssistantAgent } from "@autogen/core";

const agent = new AssistantAgent({ name: "support-agent", modelClient });

// One run() at a time: no parallel scaling pressure
for (const ticket of tickets) {
  await agent.run(`Handle ticket: ${ticket.summary}`);
}
```

The broken version triggers parallel scaling pressure. If your environment is using a hosted runtime, each call may compete for the same connection pool or startup path, which can surface as:

- `Error: connection timeout when scaling`
- `TimeoutError: request timed out while waiting for runtime`
- `AutoGenError: failed to initialize worker within timeout`

If you truly need concurrency, cap it:

```ts
import pLimit from "p-limit";
import { AssistantAgent } from "@autogen/core";

const limit = pLimit(2);

const results = await Promise.all(
  tickets.map((ticket) =>
    limit(() => agent.run(`Handle ticket: ${ticket.summary}`))
  )
);
```

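If you would rather not add a dependency, the same cap can be implemented with a small worker pool. This is a sketch under the article's assumptions, and `mapWithLimit` is a hypothetical helper name, not an AutoGen API:

```ts
// Dependency-free concurrency cap: N workers pull indices from a shared counter.
// Results keep the input order even though completion order varies.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: no await between read and increment
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

With `limit` set to 2, at most two agent calls are in flight at any moment, which is the same behavior the p-limit version gives you.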
Other Possible Causes

1. Model endpoint latency is too high

If your modelClient points to a slow endpoint, AutoGen may time out before scaling finishes.

```ts
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 5000, // too aggressive for cold starts / long prompts
});
```

Fix by increasing the client timeout and trimming prompt size.

```ts
const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 30000,
});
```

2. You are recreating agents on every request

This causes repeated initialization overhead and can look like scaling failures.

```ts
app.post("/chat", async (req, res) => {
  const agent = new AssistantAgent({ name: "agent", modelClient });
  const reply = await agent.run(req.body.message);
  res.json(reply);
});
```

Reuse the agent instance instead:

```ts
const agent = new AssistantAgent({ name: "agent", modelClient });

app.post("/chat", async (req, res) => {
  const reply = await agent.run(req.body.message);
  res.json(reply);
});
```

3. Your transport or proxy blocks long-lived connections

This happens behind corporate proxies, API gateways, or serverless environments with short execution windows.

```json
{
  "timeout": 10,
  "keepAlive": false,
  "baseUrl": "https://api.your-proxy.internal"
}
```

Check proxy idle timeouts, gateway limits, and serverless function duration. A Lambda with a 10-second ceiling will fail under load even if AutoGen is configured correctly.
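One way to sanity-check layered limits is to list every ceiling and find the tightest one: a request dies at the smallest timeout in the chain, no matter how generous the others are. This sketch is purely illustrative; the layer names and numbers are made up for the example:

```ts
// Illustrative timeout budget: a request fails at the tightest ceiling,
// so every outer layer must allow at least as much time as the layers inside it.
const budgetMs: Record<string, number> = {
  modelClient: 30_000, // model client timeout
  gateway: 29_000,     // API gateway / proxy idle timeout
  serverless: 10_000,  // e.g. a Lambda execution ceiling
};

function tightestCeiling(budget: Record<string, number>): string {
  return Object.entries(budget).reduce((min, cur) =>
    cur[1] < min[1] ? cur : min
  )[0];
}

console.log(tightestCeiling(budgetMs)); // → "serverless"
```

Here the 10-second serverless limit is the real constraint, so raising the model client timeout alone would change nothing.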

4. Tool execution hangs before the next step

If an AssistantAgent waits on a tool call that never returns, the scaling step appears broken even though the real issue is downstream.

```ts
const tools = [
  async function fetchCustomerRecord(id: string) {
    // Missing timeout wrapper: a slow query stalls the whole agent step.
    // (Use a parameterized query; never interpolate user input into SQL.)
    return db.query("SELECT * FROM customers WHERE id = $1", [id]);
  },
];
```

Wrap external I/O with explicit timeouts and retries:

```ts
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    return await Promise.race([
      promise,
      new Promise<T>((_, reject) => {
        timer = setTimeout(() => reject(new Error("tool timeout")), ms);
      }),
    ]);
  } finally {
    // Clear the timer so a fast promise doesn't leave a pending timeout behind
    clearTimeout(timer);
  }
}
```
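The retries half of that advice can be a small wrapper layered on top. This is a sketch with illustrative attempt counts and delays; `withRetry` is a hypothetical helper, not an AutoGen API:

```ts
// Minimal retry-with-backoff sketch: retries a failing async call a few times,
// doubling the delay between attempts, and rethrows the last error on exhaustion.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: delayMs, 2*delayMs, 4*delayMs, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Combined, a tool call becomes `withRetry(() => withTimeout(fetchCustomerRecord(id), 5000))`, so one hung dependency neither stalls the agent nor fails it on the first transient error.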

How to Debug It

1. Find where it fails
   - Add logs before and after every `run()` call.
   - If it fails before entering your tool code, the problem is likely client/runtime setup.
   - If it fails after tool invocation starts, inspect that tool first.
2. Reduce concurrency to one
   - Replace `Promise.all(...)` with a single sequential call.
   - If the error disappears, you've confirmed a contention/scaling issue.
3. Increase only one timeout at a time
   - Start with the model client timeout.
   - Then check any reverse proxy or gateway timeout.
   - Then check your serverless execution limit.
   - Don't change three knobs at once; you won't know what fixed it.
4. Inspect the actual error text and stack trace
   - Look for classes like `TimeoutError`, `AutoGenError`, or fetch-layer errors such as `UND_ERR_CONNECT_TIMEOUT`.
   - A stack trace pointing into HTTP transport means network/client config.
   - A stack trace pointing into your tool handler means application logic.
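Step 1 can be done with a thin timing wrapper around `run()`. The agent shape matches the examples earlier in this article, but `timedRun` itself is a hypothetical helper:

```ts
// Logs wall-clock time around each run() so you can see which call stalls
// and whether the failure happens before or after the agent starts working.
async function timedRun(
  agent: { run: (prompt: string) => Promise<unknown> },
  prompt: string
): Promise<unknown> {
  const start = Date.now();
  console.log(`[run] start: ${prompt.slice(0, 60)}`);
  try {
    return await agent.run(prompt);
  } finally {
    // finally fires on success AND failure, so a timeout still gets logged
    console.log(`[run] finished in ${Date.now() - start}ms`);
  }
}
```

A call that logs `start` but never `finished` points at the model transport or runtime; one that finishes slowly but succeeds points at latency, not scaling.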

Prevention

- Reuse `AssistantAgent` and model clients across requests instead of constructing them per call.
- Put hard timeouts around tools and external API calls so one slow dependency does not stall scaling.
- Limit parallel AutoGen runs with a queue or concurrency cap instead of firing off unbounded `Promise.all`.

If you want a stable production setup in TypeScript, treat AutoGen scaling like any other distributed system problem: control concurrency, keep initialization warm, and make every external dependency fail fast.


By Cyprian Aarons, AI Consultant at Topiax.