# How to Fix “connection timeout when scaling” in LangChain (TypeScript)
## What this error means
A “connection timeout when scaling” error usually shows up when your LangChain app starts more concurrent work than the upstream service, database, or network path can handle. In practice, it happens during batch runs, parallel tool calls, or when you “scale” by increasing concurrency without changing client timeouts or connection limits.

With TypeScript + LangChain, the failure often appears as a wrapped network error from the underlying SDK, not as a clean LangChain-level error. You’ll usually see something like `ETIMEDOUT`, `ECONNRESET`, or a provider-specific timeout bubbling out of `ChatOpenAI`, `OpenAIEmbeddings`, `RunnableParallel`, or a custom tool.
## The Most Common Cause
The #1 cause is uncontrolled concurrency.

A lot of people take a working single-request chain and then run it over an array with `Promise.all()`. That looks fine until the provider starts throttling connections or your Node process exhausts sockets.
### Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| Fires everything at once with no backpressure | Limits concurrency and reuses clients |
| Creates extra pressure on HTTP sockets | Keeps request rate predictable |
```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const inputs = Array.from({ length: 50 }, (_, i) => `Summarize ticket ${i}`);

// ❌ Broken: 50 requests launched at once
const results = await Promise.all(
  inputs.map((input) => llm.invoke(input))
);
```
```ts
import pLimit from "p-limit";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const inputs = Array.from({ length: 50 }, (_, i) => `Summarize ticket ${i}`);

const limit = pLimit(5); // ✅ cap concurrency

const results = await Promise.all(
  inputs.map((input) => limit(() => llm.invoke(input)))
);
```
If you’re using LangChain runnables, the same rule applies:

```ts
// ❌ Broken
await runnable.batch(inputs, { maxConcurrency: 50 });

// ✅ Better
await runnable.batch(inputs, { maxConcurrency: 5 });
```
If the upstream is rate-limited or has a tight connection pool, this is usually the fix.
## Other Possible Causes
### 1. Short client timeout
Your app may be timing out before the model finishes, especially under load.
```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  timeout: 10_000, // too low for slow requests
});
```
Fix it by increasing the timeout:
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  timeout: 60_000,
});
```
### 2. No retry policy on transient failures
A single timeout under burst traffic can fail the whole batch if you don’t retry.
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 0, // ❌ a single timeout fails the whole run
});
```
Use retries for transient network issues; LangChain backs off between attempts:

```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 3, // ✅ absorb transient timeouts and resets
});
```
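If you’d rather scope retries to a specific chain instead of the whole client, runnables also expose a `.withRetry()` wrapper in `@langchain/core`. A minimal sketch:

```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

// Retries failed invocations with backoff, up to 3 total attempts.
const resilientLlm = llm.withRetry({
  stopAfterAttempt: 3,
});

const result = await resilientLlm.invoke("Summarize ticket 42");
```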
### 3. Creating a new client per request
This burns sockets and defeats pooling. I still see people do this inside request handlers.
```ts
// ❌ Broken: a new client (and fresh sockets) per request
import express from "express";
import { ChatOpenAI } from "@langchain/openai";

const app = express();
app.use(express.json());

app.post("/summarize", async (req, res) => {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  const out = await llm.invoke(req.body.text);
  res.json({ out });
});
```
Create one shared instance instead:
```ts
// ✅ Fixed: one shared client at module scope
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

app.post("/summarize", async (req, res) => {
  const out = await llm.invoke(req.body.text);
  res.json({ out });
});
```
### 4. Upstream service limits or proxy issues
Sometimes LangChain is fine and the problem is your network path.
Common examples:
- corporate proxy closing idle connections
- Kubernetes egress limits
- VPC/NAT saturation
- OpenAI/Azure/OpenRouter provider-side throttling
Check environment config like this:
```bash
HTTP_PROXY=http://proxy.internal:8080
HTTPS_PROXY=http://proxy.internal:8080
NO_PROXY=localhost,127.0.0.1,.svc.cluster.local
```
If you’re behind a proxy, make sure Node and your SDK are actually using it.
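Node’s HTTP clients don’t always pick those variables up automatically. One way to wire the proxy explicitly is shown below; this is a sketch assuming the `https-proxy-agent` package and an openai SDK version that accepts `httpAgent` (LangChain forwards `configuration` to the underlying SDK client):

```ts
import { HttpsProxyAgent } from "https-proxy-agent";
import { ChatOpenAI } from "@langchain/openai";

const proxyUrl = process.env.HTTPS_PROXY ?? "http://proxy.internal:8080";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  configuration: {
    // Forwarded to the openai SDK; routes requests through the proxy.
    httpAgent: new HttpsProxyAgent(proxyUrl),
  },
});
```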
## How to Debug It
1. Inspect the real underlying error
   - Don’t stop at LangChain’s wrapper.
   - Log `error.cause`, `error.stack`, and any provider response metadata.
   - Look for `ETIMEDOUT`, `ECONNRESET`, `429`, or socket errors (a logging sketch follows this list).
2. Reduce concurrency to one
   - Run a single `.invoke()` call.
   - If that works, increase to 2, then 5, then 10.
   - If failures start only after concurrency rises, you’ve found the bottleneck.
3. Increase timeout and add retries
   - Temporarily set `timeout` to something large like `60_000`.
   - Set `maxRetries` to `3`.
   - If the issue disappears, it was likely transient latency or burst pressure.
4. Check whether you’re recreating clients
   - Search for `new ChatOpenAI(` inside handlers, loops, and job workers.
   - Move clients to module scope unless you have a strong reason not to.
   - Reuse embeddings/model instances across requests.
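A minimal sketch combining the first two steps, assuming the `llm` and `inputs` from the earlier examples (names from this article, not a fixed API):

```ts
import pLimit from "p-limit";

// Ramp concurrency and log the first real transport error we hit.
for (const concurrency of [1, 2, 5, 10]) {
  const limit = pLimit(concurrency);
  try {
    await Promise.all(inputs.map((input) => limit(() => llm.invoke(input))));
    console.log(`concurrency ${concurrency}: ok`);
  } catch (err) {
    const e = err as Error & { cause?: unknown };
    // LangChain and the SDK often wrap the transport error; `cause`
    // usually holds the raw ETIMEDOUT / ECONNRESET / 429 details.
    console.error(`concurrency ${concurrency}: failed`, e.message, e.cause);
    break;
  }
}
```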
## Prevention
- Keep concurrency explicit:
  - Use `p-limit` or LangChain’s `{ maxConcurrency }`.
  - Don’t default to `Promise.all()` for bulk LLM work.
- Set sane transport defaults (a combined sketch follows this list):
  - Higher timeouts for slow providers.
  - Retries for transient network failures.
  - One shared client per process where possible.
- Load test before production:
  - Run batch jobs against staging with realistic volume.
  - Watch socket usage, p95 latency, and provider error rates.
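Put together, a shared client with explicit defaults might look like this. It is a sketch, not the only valid configuration; `maxConcurrency` is a LangChain option that queues calls beyond the cap inside the client itself:

```ts
import { ChatOpenAI } from "@langchain/openai";

// One shared instance per process, exported for reuse.
export const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  timeout: 60_000,   // generous ceiling for slow completions
  maxRetries: 3,     // absorb transient timeouts and resets
  maxConcurrency: 5, // requests beyond this are queued, not fired
});
```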
If you want one rule to remember: this error is usually not “LangChain being broken.” It’s almost always too much parallelism hitting too little network capacity.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.