# How to Fix “connection timeout when scaling” in LangChain (TypeScript)
## What this error means
A “connection timeout when scaling” error usually shows up when your LangChain app starts more concurrent work than the upstream service, database, or network path can handle. In practice, it happens during batch runs, parallel tool calls, or when you “scale” by increasing concurrency without changing client timeouts or connection limits.

With TypeScript + LangChain, the failure often appears as a wrapped network error from the underlying SDK, not as a clean LangChain-level error. You’ll usually see something like `ETIMEDOUT`, `ECONNRESET`, or a provider-specific timeout bubbling out of `ChatOpenAI`, `OpenAIEmbeddings`, `RunnableParallel`, or a custom tool.
## The Most Common Cause
The #1 cause is uncontrolled concurrency.

A lot of people take a working single-request chain and then run it over an array with `Promise.all()`. That looks fine until the provider starts throttling connections or your Node process exhausts sockets.
### Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| Fires everything at once with no backpressure | Limits concurrency and reuses clients |
| Creates extra pressure on HTTP sockets | Keeps request rate predictable |
```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const inputs = Array.from({ length: 50 }, (_, i) => `Summarize ticket ${i}`);

// ❌ Broken: 50 requests launched at once
const results = await Promise.all(
  inputs.map((input) => llm.invoke(input))
);
```
```ts
import pLimit from "p-limit";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const inputs = Array.from({ length: 50 }, (_, i) => `Summarize ticket ${i}`);

const limit = pLimit(5); // ✅ cap concurrency

const results = await Promise.all(
  inputs.map((input) => limit(() => llm.invoke(input)))
);
```
If you’re using LangChain runnables, the same rule applies:

```ts
// ❌ Broken
await runnable.batch(inputs, { maxConcurrency: 50 });

// ✅ Better
await runnable.batch(inputs, { maxConcurrency: 5 });
```
If the upstream is rate-limited or has a tight connection pool, this is usually the fix.
## Other Possible Causes
### 1. Short client timeout
Your app may be timing out before the model finishes, especially under load.
```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  timeout: 10_000, // too low for slow requests
});
```
Fix it by increasing the timeout:
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  timeout: 60_000,
});
```
### 2. No retry policy on transient failures
A single timeout under burst traffic can fail the whole batch if you don’t retry.
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 0, // ❌ a single timeout fails the whole run
});
```
Use retries for transient network issues; LangChain backs off between attempts:

```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 3, // ✅ absorb transient timeouts and resets
});
```
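If you’d rather scope retries to a specific chain instead of the whole client, runnables also expose a `.withRetry()` wrapper in `@langchain/core`. A minimal sketch:

```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

// Retries failed invocations with backoff, up to 3 total attempts.
const resilientLlm = llm.withRetry({
  stopAfterAttempt: 3,
});

const result = await resilientLlm.invoke("Summarize ticket 42");
```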
### 3. Creating a new client per request
This burns sockets and defeats pooling. I still see people do this inside request handlers.
```ts
// ❌ Broken: a new client (and fresh sockets) per request
import express from "express";
import { ChatOpenAI } from "@langchain/openai";

const app = express();
app.use(express.json());

app.post("/summarize", async (req, res) => {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  const out = await llm.invoke(req.body.text);
  res.json({ out });
});
```
Create one shared instance instead:
```ts
// ✅ Fixed: one shared client at module scope
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

app.post("/summarize", async (req, res) => {
  const out = await llm.invoke(req.body.text);
  res.json({ out });
});
```
### 4. Upstream service limits or proxy issues
Sometimes LangChain is fine and the problem is your network path.
Common examples:
- corporate proxy closing idle connections
- Kubernetes egress limits
- VPC/NAT saturation
- OpenAI/Azure/OpenRouter provider-side throttling
Check environment config like this:
```bash
HTTP_PROXY=http://proxy.internal:8080
HTTPS_PROXY=http://proxy.internal:8080
NO_PROXY=localhost,127.0.0.1,.svc.cluster.local
```
If you’re behind a proxy, make sure Node and your SDK are actually using it.
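Node’s HTTP clients don’t always pick those variables up automatically. One way to wire the proxy explicitly is shown below; this is a sketch assuming the `https-proxy-agent` package and an openai SDK version that accepts `httpAgent` (LangChain forwards `configuration` to the underlying SDK client):

```ts
import { HttpsProxyAgent } from "https-proxy-agent";
import { ChatOpenAI } from "@langchain/openai";

const proxyUrl = process.env.HTTPS_PROXY ?? "http://proxy.internal:8080";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  configuration: {
    // Forwarded to the openai SDK; routes requests through the proxy.
    httpAgent: new HttpsProxyAgent(proxyUrl),
  },
});
```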
## How to Debug It
1. Inspect the real underlying error
   - Don’t stop at LangChain’s wrapper.
   - Log `error.cause`, `error.stack`, and any provider response metadata.
   - Look for `ETIMEDOUT`, `ECONNRESET`, `429`, or socket errors (a logging sketch follows this list).
2. Reduce concurrency to one
   - Run a single `.invoke()` call.
   - If that works, increase to 2, then 5, then 10.
   - If failures start only after concurrency rises, you’ve found the bottleneck.
3. Increase timeout and add retries
   - Temporarily set `timeout` to something large like `60_000`.
   - Set `maxRetries` to `3`.
   - If the issue disappears, it was likely transient latency or burst pressure.
4. Check whether you’re recreating clients
   - Search for `new ChatOpenAI(` inside handlers, loops, and job workers.
   - Move clients to module scope unless you have a strong reason not to.
   - Reuse embeddings/model instances across requests.
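A minimal sketch combining the first two steps, assuming the `llm` and `inputs` from the earlier examples (names from this article, not a fixed API):

```ts
import pLimit from "p-limit";

// Ramp concurrency and log the first real transport error we hit.
for (const concurrency of [1, 2, 5, 10]) {
  const limit = pLimit(concurrency);
  try {
    await Promise.all(inputs.map((input) => limit(() => llm.invoke(input))));
    console.log(`concurrency ${concurrency}: ok`);
  } catch (err) {
    const e = err as Error & { cause?: unknown };
    // LangChain and the SDK often wrap the transport error; `cause`
    // usually holds the raw ETIMEDOUT / ECONNRESET / 429 details.
    console.error(`concurrency ${concurrency}: failed`, e.message, e.cause);
    break;
  }
}
```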
## Prevention
- Keep concurrency explicit:
  - Use `p-limit` or LangChain’s `{ maxConcurrency }`.
  - Don’t default to `Promise.all()` for bulk LLM work.
- Set sane transport defaults (a combined sketch follows this list):
  - Higher timeouts for slow providers.
  - Retries for transient network failures.
  - One shared client per process where possible.
- Load test before production:
  - Run batch jobs against staging with realistic volume.
  - Watch socket usage, p95 latency, and provider error rates.
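Put together, a shared client with explicit defaults might look like this. It is a sketch, not the only valid configuration; `maxConcurrency` is a LangChain option that queues calls beyond the cap inside the client itself:

```ts
import { ChatOpenAI } from "@langchain/openai";

// One shared instance per process, exported for reuse.
export const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  timeout: 60_000,   // generous ceiling for slow completions
  maxRetries: 3,     // absorb transient timeouts and resets
  maxConcurrency: 5, // requests beyond this are queued, not fired
});
```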
If you want one rule to remember: this error is usually not “LangChain being broken.” It’s almost always too much parallelism hitting too little network capacity.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.