How to Fix 'intermittent 500 errors when scaling' in LangChain (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see intermittent 500 errors while scaling a LangChain TypeScript service, it usually means your app is fine at low concurrency but starts failing under parallel load. In practice, this shows up when multiple requests share mutable client state, when you create too many model instances, or when your upstream LLM/API starts rate limiting and LangChain surfaces it as a server error.

The key point: this is usually not a “LangChain bug”. It’s almost always a concurrency, lifecycle, or retry/configuration problem in your app.

The Most Common Cause

The #1 cause is sharing a mutable chain/LLM instance across concurrent requests while also mutating per-request state on it.

In TypeScript, people often create one singleton ChatOpenAI or chain and then attach request-specific data to it. That works locally, then falls apart once the service gets real traffic.

Broken vs fixed pattern

Broken pattern | Fixed pattern
Reuse one mutable chain and mutate inputs/state on it | Keep shared clients stateless; create per-request inputs and fresh run context
Store request data on the chain instance | Pass request data as function arguments
Let concurrent requests race on shared objects | Build a new runnable/chain per request
// BROKEN
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = PromptTemplate.fromTemplate(
  "Summarize this ticket for {team}: {text}"
);

// Shared mutable object used by all requests
const sharedState: { team?: string } = {};

export async function summarizeTicket(text: string, team: string) {
  sharedState.team = team;

  const chain = prompt.pipe(llm);

  // Under load, another request can overwrite sharedState.team
  // before this read, so the wrong team leaks into the prompt
  return chain.invoke({
    team: sharedState.team,
    text,
  });
}

// FIXED
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = PromptTemplate.fromTemplate(
  "Summarize this ticket for {team}: {text}"
);

export async function summarizeTicket(text: string, team: string) {
  // No shared mutable request state
  const chain = prompt.pipe(llm);

  return chain.invoke({
    team,
    text,
  });
}

If you’re using RunnableWithMessageHistory, the same rule applies: don’t reuse the same history store key across users unless that is explicitly intended. A bad session key will produce cross-talk that looks like random failures.
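
For illustration, here's a minimal per-session setup. This is a sketch, not the only way to do it: it assumes the RunnableWithMessageHistory and InMemoryChatMessageHistory exports available in recent @langchain/core releases and uses a plain in-process Map as the history store.

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";

const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a support assistant."],
  new MessagesPlaceholder("history"),
  ["human", "{input}"],
]);

// One history object per session key; the shared llm stays stateless
const histories = new Map<string, InMemoryChatMessageHistory>();

const chatWithHistory = new RunnableWithMessageHistory({
  runnable: prompt.pipe(llm),
  getMessageHistory: (sessionId: string) => {
    if (!histories.has(sessionId)) {
      histories.set(sessionId, new InMemoryChatMessageHistory());
    }
    return histories.get(sessionId)!;
  },
  inputMessagesKey: "input",
  historyMessagesKey: "history",
});

export async function reply(userId: string, conversationId: string, input: string) {
  // The session key comes from the request, never from module-level state
  return chatWithHistory.invoke(
    { input },
    { configurable: { sessionId: `${userId}:${conversationId}` } }
  );
}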

Other Possible Causes

1) Rate limiting from OpenAI or another provider

At scale, the upstream API may return 429. If you catch and swallow the real error, all your app reports is a generic 500.

// Bad: the real 429 is swallowed, so callers only ever see a generic failure
try {
  await chain.invoke(input);
} catch (err) {
  console.error(err);
}

You’ll often see something like:

  • RateLimitError: 429 Too Many Requests
  • BadRequestError
  • APIConnectionError

Fix by adding retries and backoff:

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
  maxRetries: 3,
});
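
When you do catch, log the underlying error class and status instead of flattening it. A minimal sketch; the status field is what the OpenAI SDK exposes, and other providers may name it differently:

try {
  await chain.invoke(input);
} catch (err) {
  // Surface the upstream error class and HTTP status instead of a bare 500
  const e = err as { name?: string; message?: string; status?: number };
  console.error("LLM call failed", { name: e.name, status: e.status, message: e.message });
  throw err; // let the handler map it to 429/502/504 as appropriate
}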

2) Node process exhaustion from creating too many clients

If you instantiate ChatOpenAI, vector stores, or retrievers inside every request handler instead of reusing them, you can exhaust sockets and memory.

// Bad in hot path
export async function handler(req: Request) {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  return llm.invoke("Hello");
}

Prefer one long-lived client per process:

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

export async function handler(req: Request) {
  return llm.invoke("Hello");
}

3) Timeout mismatch between your server and the LLM call

Your API gateway may cut the request off before LangChain finishes. That often becomes an intermittent 500 depending on prompt length and queue depth.

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  // ChatOpenAI accepts a request timeout in milliseconds (recent versions);
  // keep it below your gateway's limit so the failure is a clear error, not a cut-off
  timeout: 15_000,
});

In Node.js, you can also cap an individual call with an AbortController:

const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 15_000);

try {
  await llm.invoke("...", { signal: controller.signal });
} finally {
  clearTimeout(timer);
}

4) Broken memory/history isolation

If multiple users share the same Memory or message history key, requests can collide.

// Bad: same session ID for everyone
const sessionId = "default";

Use a real tenant/user/session identifier:

const sessionId = `${userId}:${conversationId}`;

How to Debug It

  1. Log the real upstream error

    • Don’t stop at 500.
    • Capture err.name, err.message, and any nested response body.
    • Look for classes like RateLimitError, APIConnectionError, or provider-specific HTTP status codes.
  2. Check whether failures correlate with concurrency

    • Run a load test with one request at a time, then increase to 5, 10, 50.
    • If failures start only after concurrency rises, suspect shared state or rate limits (see the probe sketch after this list).
  3. Remove all mutable globals

    • Search for module-level objects holding request data.
    • Anything like currentUser, sharedState, or reused message arrays is suspicious.
    • Replace with pure function inputs.
  4. Isolate the LLM call

    • Bypass retrievers, tools, and memory.
    • Call the model directly:
      await llm.invoke("ping");
      
    • If that works consistently, the bug is in your chain composition or state handling.
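
For step 2, a quick probe is enough to see where things start breaking. The sketch below assumes the summarizeTicket function from earlier; swap in your own entry point.

// Fire the same call at increasing parallelism and count failures
async function probe(concurrency: number) {
  const results = await Promise.allSettled(
    Array.from({ length: concurrency }, (_, i) =>
      summarizeTicket(`Test ticket #${i}`, "support")
    )
  );
  const failed = results.filter((r) => r.status === "rejected").length;
  console.log(`concurrency=${concurrency}: ${failed}/${concurrency} failed`);
}

for (const n of [1, 5, 10, 50]) {
  await probe(n);
}

If a single request is always clean but failures climb with parallelism, you're looking at shared state or rate limits rather than a logic bug.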

Prevention

  • Keep LangChain runnables stateless.
  • Create per-request inputs; never stash user context on singleton objects.
  • Add retries with backoff for provider errors like 429 and transient network failures (see the backoff sketch after this list).
  • Load test before shipping any agent workflow that fans out across tools or parallel prompts.
  • Log provider status codes separately from application errors so a real upstream failure doesn’t get flattened into a generic 500.
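
Beyond maxRetries, you can wrap any chain call in your own backoff. A minimal sketch, assuming the provider error exposes an HTTP status field:

// Retry transient provider failures (429 and 5xx) with exponential backoff
async function withBackoff<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status ?? 0;
      const retryable = status === 429 || status >= 500;
      if (!retryable || attempt >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 500 * 2 ** attempt)); // 0.5s, 1s, 2s
    }
  }
}

const summary = await withBackoff(() =>
  summarizeTicket("Customer cannot log in", "support")
);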

If you’re seeing intermittent 500s only after scaling, start with shared mutable state first. In TypeScript LangChain apps, that’s the most common failure mode by far.

