How to Fix 'intermittent 500 errors when scaling' in AutoGen (TypeScript)

By Cyprian Aarons, updated 2026-04-21


When AutoGen starts returning intermittent 500 responses during scale-out, it usually means one of two things: your agent runtime is not safe under concurrency, or one of the downstream calls is failing and getting wrapped as a generic server error. In TypeScript projects, this often shows up after adding more workers, more parallel chats, or a load balancer in front of your agent service.

The nasty part is that the error is often nondeterministic. One request succeeds, the next fails with something like HTTP 500 Internal Server Error, AgentRuntimeError, or Error: Failed to execute tool call, and the pattern only appears under load.

The Most Common Cause

The #1 cause is shared mutable state inside a single AutoGen agent instance or runtime being reused across concurrent requests.

In TypeScript, people often create one AssistantAgent or one DefaultAzureCredential-backed client at module scope, then reuse it for every request. That works locally until multiple requests hit the same conversation state, message buffer, or tool context at once.

Broken pattern vs fixed pattern

Broken pattern	Fixed pattern
Reuses one agent for all requests	Creates isolated agent/session per request
Shares mutable conversation state	Keeps request-scoped state
Fails under concurrency	Safe under horizontal scaling

// BROKEN: shared agent instance across requests
import { AssistantAgent } from "@autogen/agent";

const agent = new AssistantAgent({
  name: "support-agent",
  systemMessage: "You help users with policy questions.",
});

export async function POST(req: Request) {
  const { prompt } = await req.json();

  // Multiple concurrent requests mutate the same agent state
  const result = await agent.run(prompt);

  return Response.json({ result });
}

// FIXED: create a fresh agent per request
import { AssistantAgent } from "@autogen/agent";

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const agent = new AssistantAgent({
    name: "support-agent",
    systemMessage: "You help users with policy questions.",
  });

  const result = await agent.run(prompt);

  return Response.json({ result });
}

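If you need shared config, share config. Do not share mutable runtime objects unless the class explicitly documents that it is safe for concurrent use. Node.js runs your handler on a single thread, but every await is a point where another request can run and touch the same object.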
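One way to follow that rule is to freeze the configuration at module scope and build the agent through a factory. The sketch below is illustrative only: `StubAgent`, `AgentConfig`, `createAgent`, and `handlePrompt` are hypothetical names, and `StubAgent` is a stand-in that only models per-instance mutable state, not a real AutoGen class.

```typescript
// Hypothetical config shape; adapt the fields to your real agent constructor.
interface AgentConfig {
  readonly name: string;
  readonly systemMessage: string;
}

// Stand-in agent (an assumption, not an AutoGen class) with per-instance state.
class StubAgent {
  private readonly config: AgentConfig;
  // Mutable state of the kind that must not be shared across requests.
  private readonly messages: string[] = [];

  constructor(config: AgentConfig) {
    this.config = config;
  }

  async run(prompt: string): Promise<string> {
    this.messages.push(prompt);
    return `${this.config.name} handled ${this.messages.length} message(s)`;
  }
}

// Shared and immutable: safe to keep at module scope.
const SUPPORT_AGENT_CONFIG: AgentConfig = Object.freeze({
  name: "support-agent",
  systemMessage: "You help users with policy questions.",
});

// Factory: each caller gets its own agent, so no state leaks between requests.
function createAgent(): StubAgent {
  return new StubAgent(SUPPORT_AGENT_CONFIG);
}

async function handlePrompt(prompt: string): Promise<string> {
  const agent = createAgent();
  return agent.run(prompt);
}
```

Because every call builds a new instance, two calls to `handlePrompt` each see a message count of 1, however they interleave.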

A second variant of this problem is reusing a single conversation thread or memory store without locking:

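// BROKEN: module-level history is shared by every request this worker handles
const history: string[] = [];

export async function handle(prompt: string) {
  history.push(prompt); // prompts from unrelated requests interleave here
  return agent.run(history.join("\n"));
}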

Use request-scoped history instead:

// FIXED: history is created inside the call, so requests cannot interleave
export async function handle(prompt: string) {
  const history = [prompt];
  return agent.run(history.join("\n"));
}
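If you do need multi-turn history, one option is to key it by session and serialize the turns within each session. The following is a minimal sketch under stated assumptions: `withSessionLock`, `handleTurn`, and `RunModel` are hypothetical names, the in-memory maps stand in for an external store, and the lock only protects a single worker process. Across several workers you would need a shared store with its own locking.

```typescript
// Hypothetical per-session state; a real deployment would use a shared store.
const sessionHistories = new Map<string, string[]>();
// Tail of the pending work chain for each session ID.
const sessionQueues = new Map<string, Promise<unknown>>();

// Runs `task` only after every earlier task for the same session has settled.
function withSessionLock<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
  const previous = sessionQueues.get(sessionId) ?? Promise.resolve();
  const next = previous.then(() => task());
  // Store a tail that never rejects, so one failure does not poison the queue.
  sessionQueues.set(sessionId, next.catch(() => undefined));
  return next;
}

// Stand-in for the model call (an assumption, not an AutoGen API).
type RunModel = (transcript: string) => Promise<string>;

async function handleTurn(
  sessionId: string,
  prompt: string,
  runModel: RunModel,
): Promise<string> {
  return withSessionLock(sessionId, async () => {
    const turns = sessionHistories.get(sessionId) ?? [];
    turns.push(prompt);
    sessionHistories.set(sessionId, turns);
    return runModel(turns.join("\n"));
  });
}
```

Turns for different sessions still run in parallel; only turns that share a session ID wait for each other. The maps grow with the number of sessions, so a real implementation would also need eviction.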

Other Possible Causes

1. Tool timeouts under load

If your tools call external APIs, scaling increases queueing and latency. AutoGen may surface this as a generic 500 even though the real failure is a timeout.

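// BROKEN: a 3-second budget is too tight once requests start queueing
const result = await agent.run("lookup claim status", {
  timeoutMs: 3000,
});

Fix by giving the call a realistic timeout, and put retry policies on the tool itself rather than on the whole agent run:

// FIXED: budget sized for latency under load, not for the idle-system best case
const result = await agent.run("lookup claim status", {
  timeoutMs: 15000,
});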
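A per-attempt timeout with bounded retries can live in a small wrapper around the tool's own network call. This is a sketch, not an AutoGen feature: `ToolTimeoutError`, `withTimeout`, `callWithRetry`, and `RetryOptions` are hypothetical names. The timeout here only stops waiting; it does not cancel the underlying request, which would need something like an `AbortController` passed to the HTTP client.

```typescript
class ToolTimeoutError extends Error {
  constructor(ms: number) {
    super(`Timed out after ${ms} ms`);
    this.name = "ToolTimeoutError";
  }
}

// Rejects with ToolTimeoutError if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new ToolTimeoutError(ms)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

interface RetryOptions {
  attempts: number; // total tries, including the first
  timeoutMs: number; // deadline for each attempt
  baseDelayMs: number; // backoff starts here and doubles per retry
}

async function callWithRetry<T>(
  call: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await withTimeout(call(), opts.timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < opts.attempts - 1) {
        const delay = opts.baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // surface the real cause instead of a generic 500
}
```

Retrying like this is only safe for calls that are idempotent, which is the subject of the next cause.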

2. Non-idempotent tool calls

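If your tool writes data and retries happen during scale events, the same write can run twice. The duplicate side effect then triggers a failure, for example a uniqueness conflict on the second insert that surfaces as a 500.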

// BROKEN: creates duplicate records on retry
await claimsTool.createClaim({ policyId, amount });

Use idempotency keys:

await claimsTool.createClaim({
  policyId,
  amount,
  idempotencyKey: requestId,
});
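The key only helps if the receiving side honors it. As a rough illustration of what that looks like, the sketch below keeps the first outcome per key and replays it on retry. `IdempotentClaimService`, `CreateClaimInput`, and `Claim` are hypothetical names, and the in-memory map is an assumption standing in for a durable store.

```typescript
interface CreateClaimInput {
  policyId: string;
  amount: number;
  idempotencyKey: string;
}

interface Claim {
  claimId: string;
  policyId: string;
  amount: number;
}

// Hypothetical service; a real one would persist keys in a durable store.
class IdempotentClaimService {
  private readonly results = new Map<string, Promise<Claim>>();
  private nextId = 1;

  createClaim(input: CreateClaimInput): Promise<Claim> {
    const existing = this.results.get(input.idempotencyKey);
    if (existing) {
      return existing; // retry: replay the first outcome, no second write
    }
    const created = this.insert(input);
    this.results.set(input.idempotencyKey, created);
    // Forget failed attempts so a later retry can try the write again.
    created.catch(() => this.results.delete(input.idempotencyKey));
    return created;
  }

  // Stand-in for the real database write.
  private async insert(input: CreateClaimInput): Promise<Claim> {
    return {
      claimId: `claim-${this.nextId++}`,
      policyId: input.policyId,
      amount: input.amount,
    };
  }
}
```

Storing the promise rather than the resolved value means a retry that arrives while the first write is still in flight also gets the same result.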

3. Too many parallel model calls

Some teams fan out multiple AutoGen agents at once without bounding concurrency. That can exhaust sockets, rate limits, or memory.

// BROKEN
await Promise.all(users.map((u) => runAgent(u)));

Bound concurrency:

import pLimit from "p-limit";

const limit = pLimit(5);
await Promise.all(users.map((u) => limit(() => runAgent(u))));

4. Bad serialization in message payloads

AutoGen messages often include structured content. If you pass circular objects or non-serializable values into logs, queues, or persistence layers, you can get intermittent failures when workers scale out.

// BROKEN
await queue.publish({
  message,
  context: req, // Request object is not serializable
});

Strip it down to JSON-safe data:

await queue.publish({
  message,
  context: {
    requestId,
    userId,
    tenantId,
  },
});
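A cheap guard is to round-trip the payload through JSON before it leaves the process, so a bad value fails loudly at the call site instead of intermittently in a worker. This is a minimal sketch; `toJsonSafe` is a hypothetical helper name. It relies on `JSON.stringify` throwing for circular structures and BigInt values. Nested functions and `undefined` properties are silently dropped rather than rejected, so it is not a full schema check.

```typescript
// Returns a JSON-safe copy of `value`, or throws with a labeled message.
function toJsonSafe(value: unknown, label: string): unknown {
  let text: string | undefined;
  try {
    // Throws a TypeError for circular references and BigInt values.
    text = JSON.stringify(value);
  } catch (error) {
    throw new Error(`${label} is not serializable: ${(error as Error).message}`);
  }
  if (text === undefined) {
    // Top-level undefined, functions, and symbols have no JSON form.
    throw new Error(`${label} is not serializable: value has no JSON form`);
  }
  return JSON.parse(text);
}
```

Calling it as `toJsonSafe({ requestId, userId, tenantId }, "context")` just before the publish call keeps the failure next to the code that built the payload.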

How to Debug It

• Check whether failures correlate with concurrency
  - Run one request at a time.
  - Then run 10, 25, and 50 concurrent requests.
  - If the error only appears under load, suspect shared state or downstream saturation.
• Log the exact AutoGen exception
  - Look for class names like AgentRuntimeError, ToolExecutionError, or wrapped HTTP errors.
  - Capture stack traces and request IDs.
  - Example:
```
try {
  await agent.run(prompt);
} catch (err) {
  console.error("AutoGen failed", err);
  throw err;
}
```
• Isolate tools from the model
  - Disable tools temporarily.
  - If the 500 disappears, the issue is in a tool call, timeout, auth token refresh, or serialization path.
  - If it remains, focus on agent lifecycle and shared state.
• Inspect worker logs and upstream metrics
  - Check CPU spikes, memory pressure, socket exhaustion, and rate-limit responses from OpenAI/Azure OpenAI.
  - A lot of “intermittent” 500s are actually retries after upstream 429/503 failures that were not handled cleanly.
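The concurrency check in the first step can be scripted as a small ladder that fires N calls at once and counts the outcomes. This is a sketch with hypothetical names (`probe`, `runLadder`, `ProbeResult`); the `call` argument is whatever hits your service, so nothing here depends on AutoGen itself.

```typescript
interface ProbeResult {
  concurrency: number;
  ok: number;
  failed: number;
}

// Fires `concurrency` calls at once and counts how many resolve or reject.
async function probe(
  concurrency: number,
  call: (index: number) => Promise<unknown>,
): Promise<ProbeResult> {
  const outcomes = await Promise.all(
    Array.from({ length: concurrency }, (_, index) =>
      call(index).then(
        () => true,
        () => false,
      ),
    ),
  );
  const ok = outcomes.filter(Boolean).length;
  return { concurrency, ok, failed: concurrency - ok };
}

// Runs the levels one after another so each result reflects a single load level.
async function runLadder(
  levels: number[],
  call: (index: number) => Promise<unknown>,
): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const level of levels) {
    results.push(await probe(level, call));
  }
  return results;
}
```

Running `runLadder([1, 10, 25, 50], call)` with a `call` that posts to your own endpoint and throws on a non-2xx response gives one row per level. If `failed` is zero at 1 and climbs with the level, that points at shared state or downstream saturation.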

Prevention

• Create agents per request or per session unless you have verified they are safe to reuse under concurrency.
• Put hard limits on tool execution time and concurrent fan-out.
• Use idempotency keys for any tool that writes state.
• Add structured logging around every AutoGen boundary:
  - input prompt size
  - tool name
  - duration
  - exception class
• Load test before production rollout. If it breaks at 20 concurrent chats in staging, it will break harder in prod.
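The logging item above can be handled by one wrapper applied at each boundary. The sketch below is one possible shape, with hypothetical names (`logged`, `BoundaryLog`, `LogSink`); the default sink prints one JSON line per call and can be swapped for your logger.

```typescript
interface BoundaryLog {
  boundary: string; // tool name, or a label such as "agent.run"
  promptChars: number; // input prompt size in characters
  durationMs: number;
  outcome: "ok" | "error";
  errorClass?: string;
}

type LogSink = (entry: BoundaryLog) => void;

// Wraps one boundary call and emits exactly one structured entry for it.
async function logged<T>(
  boundary: string,
  prompt: string,
  work: () => Promise<T>,
  sink: LogSink = (entry) => console.log(JSON.stringify(entry)),
): Promise<T> {
  const started = Date.now();
  try {
    const result = await work();
    sink({
      boundary,
      promptChars: prompt.length,
      durationMs: Date.now() - started,
      outcome: "ok",
    });
    return result;
  } catch (error) {
    const errorClass = error instanceof Error ? error.constructor.name : typeof error;
    sink({
      boundary,
      promptChars: prompt.length,
      durationMs: Date.now() - started,
      outcome: "error",
      errorClass,
    });
    throw error; // log and rethrow; never swallow the failure
  }
}
```

Usage looks like `await logged("agent.run", prompt, () => agent.run(prompt))`. Adding a request ID field to `BoundaryLog` makes it possible to line these entries up with upstream 429/503 responses.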

If you’re seeing intermittent 500 errors when scaling in AutoGen TypeScript, start with object reuse. In practice, that’s where most of these issues live.


By Cyprian Aarons, AI Consultant at Topiax.
