# How to Fix "connection timeout in production" in LangChain (TypeScript)

## What the error means
connection timeout in production usually means your LangChain app opened a request to an LLM provider, vector DB, or internal service and never got a response before the socket timed out. In TypeScript apps this often shows up after deployment because production has stricter network rules, slower cold starts, or different timeout defaults than local.
The key thing: this is rarely a LangChain bug by itself. It's usually a transport problem around ChatOpenAI, OpenAIEmbeddings, the underlying HTTP client, your proxy, or the runtime environment.
## The Most Common Cause
The #1 cause is creating a new client on every request and letting the default timeout stay too low for production latency. In LangChain TypeScript, that usually means instantiating ChatOpenAI inside a hot path instead of reusing one configured client.
### Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| New client per request | Singleton/reused client |
| Default timeout | Explicit timeout + retries |
| No keep-alive/agent tuning | Production HTTP agent settings |
```typescript
// ❌ Broken: recreates the client on every request
import { ChatOpenAI } from "@langchain/openai";
import { NextRequest } from "next/server";

export async function POST(req: NextRequest) {
  const { prompt } = await req.json();
  const model = new ChatOpenAI({
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
    // timeout not set; the default may be too aggressive for prod traffic
  });
  const result = await model.invoke(prompt);
  return Response.json({ text: result.content });
}
```
```typescript
// ✅ Fixed: reuse the client and set sane production timeouts
import { ChatOpenAI } from "@langchain/openai";
import { NextRequest } from "next/server";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 30_000,
  maxRetries: 2,
});

export async function POST(req: NextRequest) {
  const { prompt } = await req.json();
  const result = await model.invoke(prompt);
  return Response.json({ text: result.content });
}
```
If you're seeing errors like:

- `Error [TimeoutError]: Request timed out`
- `APIConnectionError: Connection error.`
- `FetchError: network timeout at: ...`

the fix is often to stop rebuilding clients and give the request enough time to complete.
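If your platform doesn't guarantee that module scope survives between invocations, or you want the client created only when the first request arrives, a tiny memoized factory gives the same reuse. The `once` helper below is a hypothetical utility, not a LangChain API:

```typescript
// Hypothetical helper: memoize a client factory so it runs once per process.
export function once<T>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => (instance ??= factory());
}

// Usage sketch (assumes @langchain/openai is installed):
//   const getModel = once(() =>
//     new ChatOpenAI({ model: "gpt-4o-mini", timeout: 30_000, maxRetries: 2 }));
//   const result = await getModel().invoke(prompt);
```

Every caller goes through `getModel()`, so the process holds exactly one client and one connection pool.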
## Other Possible Causes

### 1) Your serverless function times out before LangChain finishes

This is common on Vercel, AWS Lambda, or Cloud Run when the platform kills the request first.
```typescript
// Example: Next.js route with too-short runtime assumptions
export const maxDuration = 10; // seconds
// If your chain calls embeddings + retrieval + LLM, this can be too low.
```
Fix:

- Increase the function timeout in your platform config
- Reduce chain steps
- Cache embeddings and retrieval results
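Caching is often the highest-leverage of these fixes. Here is a minimal in-memory sketch, assuming a per-process cache is acceptable for your traffic (across multiple instances you would use Redis or similar); the `TtlCache` class is hypothetical, not a LangChain type:

```typescript
// Minimal in-memory TTL cache for retrieval or embedding results,
// so repeat queries skip the network entirely.
export class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) {
      this.store.delete(key); // drop expired entries lazily
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}

// Usage sketch (retriever is assumed to exist):
//   const cache = new TtlCache<Document[]>(60_000);
//   const docs = cache.get(query) ?? await retriever.invoke(query);
//   cache.set(query, docs);
```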
### 2) DNS, proxy, or outbound firewall blocks the provider

In production, your app may not have direct internet access. The error often looks like:

- `ECONNRESET`
- `ETIMEDOUT`
- `fetch failed`
- `connect ETIMEDOUT api.openai.com:443`
Check whether your environment requires a proxy (these values are usually set in the deployment environment rather than in application code):

```typescript
process.env.HTTP_PROXY = "http://proxy.internal:8080";
process.env.HTTPS_PROXY = "http://proxy.internal:8080";
```
If you're on a locked-down VPC, verify egress rules allow:

- `api.openai.com`
- your embedding provider
- any vector store endpoint like Pinecone, Weaviate, or Azure OpenAI
### 3) You're hitting rate limits and mistaking them for timeouts

Some providers slow responses under load before returning an error. In LangChain you may see retries followed by:

- `RateLimitError`
- `APIConnectionError`
- long hangs before failure

Use retries with backoff and inspect upstream rate-limit headers if available.
```typescript
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 5,
  timeout: 45_000,
});
```
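`maxRetries` covers most cases. If you write your own retry loop around `invoke()`, exponential backoff with jitter keeps retries from stampeding the provider. The `backoffDelay` function below is a hypothetical helper, not part of LangChain:

```typescript
// Hypothetical backoff schedule: exponential growth, capped, with full jitter.
export function backoffDelay(attempt: number, baseMs = 500, capMs = 20_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ... capped
  return Math.random() * exp; // full jitter spreads retries across clients
}

// Usage sketch:
//   for (let attempt = 0; attempt < 5; attempt++) {
//     try { return await model.invoke(prompt); }
//     catch { await new Promise((r) => setTimeout(r, backoffDelay(attempt))); }
//   }
```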
Also reduce concurrency if you batch requests through `Promise.all()`.
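A minimal sketch of that idea: a hypothetical `mapWithLimit` helper that keeps at most `limit` requests in flight, instead of firing every item at once with `Promise.all()`:

```typescript
// Run fn over items with at most `limit` calls in flight at any time.
export async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim happens synchronously, so no double work
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

Swapping `Promise.all(items.map(fn))` for `mapWithLimit(items, 4, fn)` bounds concurrent requests to the provider without changing result order.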
### 4) Your chain does too much work synchronously

A retriever + reranker + LLM call can easily exceed production limits if all steps run inline.
```typescript
// Heavy synchronous path: retrieval and generation in one request
const docs = await retriever.invoke(query);
const answer = await model.invoke([
  { role: "system", content: "Answer using context" },
  { role: "user", content: `${query}\n\n${docs.map(d => d.pageContent).join("\n")}` },
]);
```
Fix:

- Cap retrieved docs with `k`
- Trim context aggressively
- Move expensive preprocessing offline
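The first two fixes can be combined in one small helper. `trimContext` below is hypothetical; it assumes LangChain-style documents with a `pageContent` field:

```typescript
// Minimal stand-in for a LangChain document.
interface Doc {
  pageContent: string;
}

// Cap both the number of docs and total characters before prompting,
// so retrieval output cannot blow past the latency/token budget.
export function trimContext(docs: Doc[], maxDocs = 4, maxChars = 6_000): string {
  const parts: string[] = [];
  let used = 0;
  for (const doc of docs.slice(0, maxDocs)) {
    const remaining = maxChars - used;
    if (remaining <= 0) break;
    const chunk = doc.pageContent.slice(0, remaining);
    parts.push(chunk);
    used += chunk.length;
  }
  return parts.join("\n");
}

// Usage sketch:
//   const context = trimContext(await retriever.invoke(query));
```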
## How to Debug It

- **Identify which call is timing out.** Log timestamps around each step: embeddings, retrieval, reranking, the final LLM call. Don't guess; find the exact hop that stalls.
- **Turn on verbose LangChain tracing:**

  ```typescript
  import { setDebug } from "@langchain/core/debug";

  setDebug(true);
  ```

  This helps you see whether the failure is in `ChatOpenAI`, `OpenAIEmbeddings`, or another runnable.
- **Test the same endpoint outside LangChain.** Call the provider with plain `fetch()`. If plain HTTP also times out, it's infrastructure or network. If plain HTTP works but LangChain fails, inspect your LangChain config and retries.
- **Check production-specific limits:**
  - serverless function duration
  - outbound firewall rules
  - proxy settings
  - cold start latency
  - container CPU throttling
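The timestamp logging in the first step can be as small as this; `timed` is a hypothetical wrapper, not a LangChain API:

```typescript
// Wrap each step of the chain so logs show exactly which hop stalls
// before the timeout fires. Re-throws errors after logging the duration.
export async function timed<T>(label: string, step: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await step();
  } finally {
    console.log(`[timing] ${label}: ${Date.now() - start}ms`);
  }
}

// Usage sketch:
//   const docs = await timed("retrieval", () => retriever.invoke(query));
//   const answer = await timed("llm", () => model.invoke(prompt));
```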
A useful rule: if local works and prod fails only under load, suspect timeouts plus connection reuse issues before anything else.
## Prevention

- Create shared singleton clients for `ChatOpenAI`, embeddings, and vector DB SDKs.
- Set explicit production values for:
  - `timeout`
  - `maxRetries`
  - function duration / request deadline
- Keep chains short:
  - fewer retrieved documents
  - smaller prompts
  - fewer sequential model calls
If you want this to stay stable in production, treat every external dependency as unreliable by default and design for retries, bounded latency, and fewer network hops.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.