How to Fix 'intermittent 500 errors when scaling' in LangChain (TypeScript)
When you see intermittent 500 errors while scaling a LangChain TypeScript service, it usually means your app is fine at low concurrency but starts failing under parallel load. In practice, this shows up when multiple requests share mutable client state, when you create too many model instances, or when your upstream LLM/API starts rate limiting and LangChain surfaces it as a server error.
The key point: this is usually not a “LangChain bug”. It’s almost always a concurrency, lifecycle, or retry/configuration problem in your app.
The Most Common Cause
The #1 cause is sharing a mutable chain/LLM instance across concurrent requests while also mutating per-request state on it.
In TypeScript, people often create one singleton ChatOpenAI or chain and then attach request-specific data to it. That works locally, then falls apart once the service gets real traffic.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Reuse one mutable chain and mutate inputs/state on it | Keep shared clients stateless; create per-request inputs and fresh run context |
| Store request data on the chain instance | Pass request data as function arguments |
| Let concurrent requests race on shared objects | Build a new runnable/chain per request |
```ts
// BROKEN
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = PromptTemplate.fromTemplate(
  "Summarize this ticket for {team}: {text}"
);

// Shared mutable object used by all requests
const sharedState: { team?: string } = {};

export async function summarizeTicket(text: string, team: string) {
  sharedState.team = team;
  // Under load, concurrent calls can race here
  const chain = prompt.pipe(llm);
  return chain.invoke({
    team: sharedState.team,
    text,
  });
}
```
```ts
// FIXED
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = PromptTemplate.fromTemplate(
  "Summarize this ticket for {team}: {text}"
);

export async function summarizeTicket(text: string, team: string) {
  // No shared mutable request state: inputs flow through as arguments
  const chain = prompt.pipe(llm);
  return chain.invoke({
    team,
    text,
  });
}
```
If you’re using RunnableWithMessageHistory, the same rule applies: don’t reuse the same history store key across users unless that is explicitly intended. A bad session key will produce cross-talk that looks like random failures.
Other Possible Causes
1) Rate limiting from OpenAI or another provider
Under scale, the upstream API may return 429, which your app can end up logging as a generic 500 if you swallow the real error:
```ts
try {
  await chain.invoke(input);
} catch (err) {
  // Swallows the details: a 429 upstream becomes an anonymous 500 in your logs
  console.error(err);
}
```
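Instead, surface the real fields before rethrowing. A minimal sketch (the `status` and `code` properties are assumptions about your provider SDK's error shape, not guaranteed LangChain fields):

```ts
// Sketch: describe the real provider error instead of flattening it to a 500.
// `status` and `code` are assumed fields; check what your SDK actually attaches.
function describeError(err: unknown): string {
  if (err instanceof Error) {
    const e = err as Error & { status?: number; code?: string };
    const parts = [`name=${e.name}`, `message=${e.message}`];
    if (e.status !== undefined) parts.push(`status=${e.status}`);
    if (e.code !== undefined) parts.push(`code=${e.code}`);
    return parts.join(" ");
  }
  return `non-error thrown: ${String(err)}`;
}
```

In your handler, log the details and rethrow so callers still see the failure: `catch (err) { console.error(describeError(err)); throw err; }`.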
You’ll often see something like:
- `RateLimitError: 429 Too Many Requests`
- `BadRequestError`
- `APIConnectionError`
Fix by adding retries and backoff:
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
  maxRetries: 3,
});
```
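`maxRetries` covers the model call itself. For other upstream calls (retrievers, tools, vector stores) you may need your own backoff; a minimal sketch, where the retry predicate is an assumption you should adapt to your provider's actual errors:

```ts
// Sketch: retry with exponential backoff for transient upstream failures.
// `isRetryable` is an assumed heuristic; match it to your provider's error shape.
async function withBackoff<T>(
  fn: () => Promise<T>,
  {
    retries = 3,
    baseMs = 200,
    isRetryable = (err: unknown) =>
      err instanceof Error && /429|rate limit|ECONNRESET/i.test(err.message),
  } = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delay = baseMs * 2 ** attempt; // 200ms, 400ms, 800ms, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

Usage: `await withBackoff(() => retriever.invoke(query))` around any call that can transiently fail.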
2) Node process exhaustion from creating too many clients
If you instantiate ChatOpenAI, vector stores, or retrievers inside every request path without reuse discipline, you can overwhelm sockets and memory.
```ts
// Bad in hot path: a fresh client (and connection pool) on every request
export async function handler(req: Request) {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  return llm.invoke("Hello");
}
```
Prefer one long-lived client per process:
```ts
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

export async function handler(req: Request) {
  return llm.invoke("Hello");
}
```
3) Timeout mismatch between your server and the LLM call
Your API gateway may cut the request off before LangChain finishes. That often becomes an intermittent 500 depending on prompt length and queue depth.
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
});
// Add an explicit timeout at the fetch layer if your runtime supports it
```
For per-call control in Node.js, use an AbortController:
```ts
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 15_000);
try {
  await llm.invoke("...", { signal: controller.signal });
} finally {
  clearTimeout(timer); // don't leak the timer once the call settles
}
```
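A reusable variant of that pattern, sketched under the assumption that the callee honors the `AbortSignal` (LangChain's `invoke` accepts `signal` in its config):

```ts
// Sketch: run a call with a hard timeout. The timer is always cleared,
// and the signal only cancels work if the callee actually respects it.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // avoid leaking the timer on success or failure
  }
}
```

Usage: `await withTimeout((signal) => llm.invoke(input, { signal }), 15_000)`. Keep this budget below your gateway's own timeout so you see the real failure instead of a cut connection.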
4) Broken memory/history isolation
If multiple users share the same Memory or message history key, requests can collide.
```ts
// Bad: same session ID for everyone
const sessionId = "default";
```
Use a real tenant/user/session identifier:
```ts
const sessionId = `${userId}:${conversationId}`;
```
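To see why key isolation matters, here is a minimal sketch of a per-session store: a plain `Map` standing in for whatever actually backs your message history (in a real app, this shape would feed something like `RunnableWithMessageHistory`'s history lookup):

```ts
// Sketch: per-session message history keyed by a real identifier.
// A Map of string arrays is a stand-in for your actual history store.
const histories = new Map<string, string[]>();

function getHistory(sessionId: string): string[] {
  let h = histories.get(sessionId);
  if (!h) {
    h = [];
    histories.set(sessionId, h);
  }
  return h;
}

// Derive the key from real identifiers on every request, never a constant
function sessionKey(userId: string, conversationId: string): string {
  return `${userId}:${conversationId}`;
}
```

With a constant key like `"default"`, every user would read and write the same array; with derived keys, histories stay isolated.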
How to Debug It
1. Log the real upstream error
   - Don't stop at `500`.
   - Capture `err.name`, `err.message`, and any nested response body.
   - Look for classes like `RateLimitError`, `APIConnectionError`, or provider-specific HTTP status codes.
2. Check whether failures correlate with concurrency
   - Run a load test with one request at a time, then increase to 5, 10, 50.
   - If failures start only after concurrency rises, suspect shared state or rate limits.
3. Remove all mutable globals
   - Search for module-level objects holding request data.
   - Anything like `currentUser`, `sharedState`, or reused message arrays is suspicious.
   - Replace with pure function inputs.
4. Isolate the LLM call
   - Bypass retrievers, tools, and memory.
   - Call the model directly: `await llm.invoke("ping");`
   - If that works consistently, the bug is in your chain composition or state handling.
Prevention
- Keep LangChain runnables stateless.
- Create per-request inputs; never stash user context on singleton objects.
- Add retries with backoff for provider errors like `429` and transient network failures.
- Load test before shipping any agent workflow that fans out across tools or parallel prompts.
- Log provider status codes separately from application errors so a real upstream failure doesn't get flattened into a generic `500`.
If you’re seeing intermittent 500s only after scaling, start with shared mutable state first. In TypeScript LangChain apps, that’s the most common failure mode by far.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.