How to Fix 'intermittent 500 errors when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When LangGraph starts returning intermittent 500 errors only after you scale traffic or add more concurrent runs, the issue is usually not “LangGraph is broken.” It means one of your graph nodes, state reducers, or external dependencies is not safe under concurrency, and the failure only shows up when multiple executions overlap.

In TypeScript projects, this often appears as a generic server error in your API layer while the real exception is buried in the worker logs. The usual symptoms are InvalidUpdateError, GraphRecursionError, or plain TypeError/ECONNRESET exceptions that only happen under load.

## The Most Common Cause

The #1 cause is shared mutable state inside node functions. A LangGraph node should be deterministic for a given input state. If you mutate module-level arrays, reuse request-scoped objects, or write to a shared singleton, concurrent runs will stomp on each other and trigger intermittent failures.

Here’s the broken pattern:

Broken:

```ts
import { StateGraph } from "@langchain/langgraph";

let conversationCache: string[] = [];

const graph = new StateGraph({
  channels: {
    messages: {
      value: (x: string[], y: string[]) => x.concat(y),
      default: () => [],
    },
  },
});

graph.addNode("appendMessage", async (state) => {
  conversationCache.push(state.messages.at(-1) ?? "");
  return { messages: conversationCache };
});
```

Fixed:

```ts
import { StateGraph } from "@langchain/langgraph";

const graph = new StateGraph({
  channels: {
    messages: {
      value: (x: string[], y: string[]) => x.concat(y),
      default: () => [],
    },
  },
});

graph.addNode("appendMessage", async (state) => {
  const nextMessages = [...state.messages, state.messages.at(-1) ?? ""];
  return { messages: nextMessages };
});
```


The broken version uses `conversationCache` outside the graph. Under concurrency, one request can overwrite another request’s data, which can surface as:

- `InvalidUpdateError: Must write to at least one of ['messages']`
- `TypeError: Cannot read properties of undefined`
- random downstream `500` responses from your API handler

If you need per-run context, pass it through graph state or the runtime config, not a top-level variable.
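As a minimal sketch of that idea (the `RunContext` shape and `appendGreeting` function are illustrative, not LangGraph APIs — LangGraph passes similar per-run values to each node via a config object):

```ts
// Illustrative sketch: per-run context travels with the run instead of
// living in module scope. LangGraph nodes receive a config object as a
// second argument; this stand-in mimics that shape.
type RunContext = { runId: string };
type State = { messages: string[] };

// Output is derived only from the inputs, so concurrent runs cannot
// stomp on each other's data.
async function appendGreeting(
  state: State,
  ctx: RunContext,
): Promise<Partial<State>> {
  return { messages: [...state.messages, `[${ctx.runId}] hello`] };
}
```

Because nothing here outlives a single call, running two of these concurrently cannot mix their data.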

## Other Possible Causes

### 1. Non-idempotent side effects in nodes

If a node writes to Redis, Postgres, or an external API and then retries on failure, you can get duplicate writes or inconsistent state.

```ts
graph.addNode("chargeCard", async (state) => {
  await payments.charge(state.userId, state.amount);
  return { charged: true };
});
```

Fix it by making the operation idempotent:

```ts
graph.addNode("chargeCard", async (state) => {
  await payments.charge({
    userId: state.userId,
    amount: state.amount,
    idempotencyKey: state.runId,
  });
  return { charged: true };
});
```
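To see why the key helps, here is a hedged, in-memory sketch of the dedup logic the receiving side runs (`processed` and `chargeOnce` are illustrative names; real providers persist keys in durable storage):

```ts
// Illustrative sketch: an idempotency key lets a retry return the first
// result instead of repeating the side effect. In-memory only.
const processed = new Map<string, { charged: boolean }>();

function chargeOnce(
  idempotencyKey: string,
  doCharge: () => { charged: boolean },
): { charged: boolean } {
  const prior = processed.get(idempotencyKey);
  if (prior) return prior; // retry with the same key: no second charge
  const result = doCharge();
  processed.set(idempotencyKey, result);
  return result;
}
```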

### 2. Missing reducer for concurrent writes

If two branches write to the same field and you did not define how to merge them, LangGraph can throw state update errors under parallel execution.

```ts
channels: {
  results: {
    default: () => [],
    // missing reducer here if multiple nodes update results
  },
}
```

Use a reducer that matches your merge semantics:

```ts
channels: {
  results: {
    value: (current: string[], update: string[]) => current.concat(update),
    default: () => [],
  },
}
```

A common runtime message here is:

- `InvalidUpdateError: At key 'results': Can receive only one value per step`
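Conceptually, the reducer folds each branch's update into the channel value one write at a time; here is a standalone sketch of that merge in plain TypeScript, with no LangGraph involved:

```ts
// Illustrative sketch of reducer semantics: each parallel branch returns
// an update, and the reducer merges them into the channel value in turn.
const concatReducer = (current: string[], update: string[]) =>
  current.concat(update);

let results: string[] = []; // channel default
const branchUpdates = [["from-branch-a"], ["from-branch-b"]]; // two parallel writes
for (const update of branchUpdates) {
  results = concatReducer(results, update);
}
// results now contains both branches' writes, in arrival order
```

Without the reducer there is no rule for combining the two writes, which is exactly what the error above is complaining about.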

### 3. Unbounded recursion or cycles in the graph

Under load, bad routing logic can send runs into loops until LangGraph stops them with recursion limits.

```ts
if (state.needsRetry) return "retry";
return "done";
```

If "retry" points back to the same node without a hard stop, you’ll eventually see:

- `GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition`

Fix with explicit retry caps in state:

```ts
if ((state.retryCount ?? 0) >= 3) return "done";
return "retry";
```

### 4. Timeouts and upstream instability

A node that calls OpenAI, Anthropic, a database, or an internal service may work locally and fail only when concurrency increases latency.

```ts
const res = await fetch(url); // no timeout
```

Use timeouts and handle failures explicitly:

```ts
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 10_000);

try {
  const res = await fetch(url, { signal: controller.signal });
} finally {
  clearTimeout(timer);
}
```

If your infra returns 500, check whether the real issue is actually a wrapped upstream error like ECONNRESET, ETIMEDOUT, or AbortError.
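One hedged way to surface the real cause before it becomes a generic 500 (Node's built-in `fetch` attaches network failures as `err.cause`; the `describeFailure` helper is illustrative):

```ts
// Illustrative sketch: unwrap the underlying failure before logging a 500.
function describeFailure(err: unknown): string {
  if (err instanceof Error) {
    if (err.name === "AbortError") return "timeout (request aborted)";
    const cause = (err as Error & { cause?: { code?: string } }).cause;
    if (cause?.code) return `upstream ${cause.code}`; // ECONNRESET, ETIMEDOUT, ...
    return err.message;
  }
  return String(err);
}
```

Logging this string next to the 500 turns "random server error" into a specific network or timeout diagnosis.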

## How to Debug It

1. **Run one request at a time**
   - If the error disappears under single-threaded load but appears with concurrency >1, suspect shared mutable state or missing reducers.
   - Test with `p-limit` set to 1, then increase gradually.
2. **Log node input and output**
   - Print the exact state entering and leaving each node.
   - Look for fields that change across unrelated requests.
   - Add request IDs so you can correlate logs: `console.log({ runId: state.runId, node: "appendMessage", state });`
3. **Check for LangGraph-specific exceptions**
   - Search logs for `InvalidUpdateError`, `GraphRecursionError`, `EmptyChannelError`, and `BranchWriteConflict`-style update conflicts if you're using parallel branches.
   - These usually point directly at reducer or routing bugs.
4. **Disable external side effects temporarily**
   - Stub LLM calls, DB writes, and HTTP requests.
   - If the failures stop, the problem is outside LangGraph itself.
   - Re-enable one dependency at a time until the error returns.
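Step 1 can be automated as a small ramp harness (a sketch; `runGraph` stands in for invoking your compiled graph):

```ts
// Illustrative sketch: ramp concurrency and record how many runs fail at
// each level, to find where the concurrency bug starts to bite.
async function rampTest(
  runGraph: (i: number) => Promise<void>,
  levels: number[] = [1, 2, 4, 8],
): Promise<Map<number, number>> {
  const failures = new Map<number, number>();
  for (const concurrency of levels) {
    const settled = await Promise.allSettled(
      Array.from({ length: concurrency }, (_, i) => runGraph(i)),
    );
    failures.set(
      concurrency,
      settled.filter((r) => r.status === "rejected").length,
    );
  }
  return failures;
}
```

A failure count of zero at concurrency 1 that climbs with each level is the classic signature of shared mutable state.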

## Prevention

- **Keep nodes pure where possible.**
  - Derive outputs from input state only.
  - Avoid module-level caches unless they are read-only.
- **Define reducers for every field that can be written by more than one node.**
  - If two branches can touch it, make merge behavior explicit.
- **Add stress tests before production.**
  - Run concurrent executions against your graph in CI.
  - Verify there are no shared references leaking between runs.
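The last point can be checked with a tiny concurrency test in CI (a sketch; `invoke` stands in for your compiled graph's invoke call):

```ts
// Illustrative sketch: assert that two concurrent runs share no data.
async function invoke(input: { messages: string[] }) {
  return { messages: [...input.messages, "reply"] };
}

async function assertIsolated(): Promise<boolean> {
  const [a, b] = await Promise.all([
    invoke({ messages: ["from-a"] }),
    invoke({ messages: ["from-b"] }),
  ]);
  // Distinct runs must return distinct arrays containing only their own data.
  return a.messages !== b.messages && !a.messages.includes("from-b");
}
```

If this ever returns false against your real graph, a shared reference is leaking between runs.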

If you’re seeing intermittent 500s in production, treat them as concurrency bugs first and framework bugs second. In LangGraph TypeScript apps, that’s usually where the real fix lives.


By Cyprian Aarons, AI Consultant at Topiax.