# How to Fix 'connection timeout in production' in LangChain (Python)
## What the error means
A connection timeout in production usually means your LangChain app tried to call an upstream service — OpenAI, Anthropic, a vector DB, a tool endpoint, or your own API — and the request never got a response before the client gave up. It typically shows up under load, behind a proxy, or when you're running with the default network settings that were fine in local dev.

The important part: this is rarely a "LangChain bug." It's almost always a transport/config issue around the model client, retries, timeouts, or network path.
## The Most Common Cause
The #1 cause is using the default timeout behavior with a slow or overloaded upstream. In LangChain Python, people often instantiate the model without setting an explicit timeout, then hit `openai.APITimeoutError`, `httpx.ReadTimeout`, or `httpx.ConnectTimeout` once traffic increases.
Here’s the broken pattern versus the fixed pattern.
| Broken | Fixed |
|---|---|
| No explicit timeout | Explicit connect/read timeout |
| No retry strategy | Controlled retries |
| One long-running request path | Bounded request duration |
```python
# BROKEN: no explicit timeout, so the request can hang indefinitely
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
)

response = llm.invoke("Summarize this policy document.")
print(response.content)
```
```python
# FIXED: explicit connect/read/write/pool timeouts plus bounded retries
from httpx import Timeout
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    timeout=Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
    max_retries=2,
)

response = llm.invoke("Summarize this policy document.")
print(response.content)
```
If you’re using async code, the same rule applies. Don’t let requests sit forever waiting on upstream I/O.
```python
# FIXED ASYNC PATTERN: the same explicit timeouts apply on the async path
import asyncio

from httpx import Timeout
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
)

async def main():
    result = await llm.ainvoke("Extract key risks from this claim note.")
    print(result.content)

asyncio.run(main())
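Beyond the HTTP-level timeouts, you can put a hard deadline on the whole call with `asyncio.wait_for` from the standard library. This is a sketch, not a LangChain API: `bounded_call` is a hypothetical helper that accepts any factory returning an awaitable, so the model call itself is stood in by whatever coroutine you pass.

```python
import asyncio

async def bounded_call(awaitable_factory, deadline_s: float = 45.0):
    """Run any awaitable with a hard overall deadline.

    This caps total duration, which per-phase httpx timeouts
    alone do not guarantee (e.g. a slow streamed response).
    """
    try:
        return await asyncio.wait_for(awaitable_factory(), timeout=deadline_s)
    except asyncio.TimeoutError:
        # Fail fast with a clear signal instead of hanging the request path.
        raise RuntimeError(f"LLM call exceeded {deadline_s}s deadline")
```

In the async pattern above you would call something like `await bounded_call(lambda: llm.ainvoke("..."), deadline_s=45)`.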
## Other Possible Causes
### 1) Your network path is blocked or unstable
In production, egress rules, VPC routing, NAT exhaustion, or proxy misconfigurations can break outbound calls.
```python
# Example: forcing traffic through a proxy incorrectly —
# a wrong proxy URL here turns every outbound call into a connect timeout
import os

os.environ["HTTPS_PROXY"] = "http://proxy.internal:8080"
os.environ["NO_PROXY"] = "localhost,127.0.0.1"
```
If your app works locally but times out in prod, check firewall rules and whether your container can reach the provider endpoint at all.
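One quick check from inside the container is a raw TCP connect to the provider endpoint. `can_reach` below is a hypothetical stdlib-only helper, not part of LangChain; it distinguishes "the network path is blocked" (the probe fails) from "the upstream is slow" (the probe succeeds but requests still time out).

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it for the provider host (e.g. `can_reach("api.openai.com")`) and for your proxy host; if the probe fails, no amount of client-side timeout tuning will help.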
### 2) You're calling a tool or retriever that is too slow
LangChain chains often time out because one tool call hangs before the LLM even gets invoked.
```python
# Example: a slow retriever inside a chain can hang before the LLM is even called
docs = retriever.get_relevant_documents("claims handling exception")
```
Fix it by bounding tool execution:
```python
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(retriever.get_relevant_documents, "claims handling exception")
    docs = future.result(timeout=8)  # raises concurrent.futures.TimeoutError after 8s
```
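Note that `future.result(timeout=8)` raises `concurrent.futures.TimeoutError` rather than returning, and the worker thread keeps running in the background after your code moves on. Here's a sketch of handling both points, with a deliberately slow stand-in function since the retriever itself is assumed:

```python
import concurrent.futures
import time

def slow_retrieval(query: str) -> list:
    time.sleep(2)  # stand-in for a hanging retriever call
    return ["doc"]

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_retrieval, "claims handling exception")
try:
    docs = future.result(timeout=0.5)
except concurrent.futures.TimeoutError:
    docs = []  # degrade to an empty context instead of hanging the request
# wait=False: don't block shutdown on the still-running worker thread
executor.shutdown(wait=False)
```

The timeout bounds *your* request path; the leaked worker thread still finishes (or hangs) on its own, so pair this with timeouts inside the tool itself where possible.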
### 3) You're sending too much context to the model
Huge prompts increase serialization time and upstream processing time. This shows up as ReadTimeout even when connectivity is fine.
```python
# Bad: stuffing full documents into one prompt
prompt = "\n\n".join(very_large_documents)
result = llm.invoke(prompt)
```
Trim inputs before sending them:
```python
# Better: bound how many chunks go into a single call
# (summarize each chunk first if you need the full corpus covered)
chunks = very_large_documents[:5]
prompt = "\n\n".join(chunks)
result = llm.invoke(prompt)
```
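If you need a firmer bound than a fixed slice, cap total characters, which is a rough proxy for tokens. `bound_prompt` is a hypothetical helper, not a LangChain utility:

```python
def bound_prompt(chunks: list, max_chars: int = 8000) -> str:
    """Join chunks in order until the character budget is spent.

    Characters are only a rough proxy for tokens (~4 chars/token
    for English text); use a real tokenizer for exact counts.
    """
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        kept.append(chunk)
        used += len(chunk) + 2  # +2 for the joining "\n\n"
    return "\n\n".join(kept)
```

Then `prompt = bound_prompt(very_large_documents)` keeps request size, and therefore upstream processing time, predictable regardless of how many documents arrive.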
### 4) You have no backoff on transient failures
A single timeout might be temporary. Without retries with backoff, your app fails immediately under transient load.
```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=8))
def call_llm():
    return llm.invoke("Classify this email.")
```
Use retries carefully. Retries help with transient network issues; they do not fix bad routing or permanently overloaded services.
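If you'd rather not add tenacity, a stdlib sketch makes the "transient only" rule explicit by whitelisting which exception types get retried; everything else fails fast. The whitelist shown here uses built-in exceptions for illustration — in a real app it would be your client's timeout classes (e.g. `httpx.ConnectTimeout`, `httpx.ReadTimeout`).

```python
import random
import time

def retry_transient(fn, retryable=(TimeoutError, ConnectionError),
                    attempts=3, base_delay=1.0, max_delay=8.0):
    """Retry fn only on whitelisted transient errors, with exponential
    backoff and jitter. Non-transient errors (bad request, auth) propagate
    immediately instead of burning the retry budget."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # budget exhausted; surface the real error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids thundering herds
```

The jitter matters under load: if every worker retries on the same schedule, the retries themselves arrive as a synchronized spike.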
## How to Debug It
- **Identify which hop is timing out.**
  - Is it the LLM call?
  - The retriever?
  - A custom tool?
  - A database/vector store?

  Look at the stack trace for classes like `httpx.ReadTimeout`, `httpx.ConnectTimeout`, `openai.APITimeoutError`, or `requests.exceptions.Timeout`.
- **Run the same request outside LangChain.**
  - Call the provider SDK directly.
  - Call your tool endpoint with `curl`.
  - If it still times out, LangChain is not the root cause.
- **Add timing logs around each step.**

  ```python
  import time

  start = time.time()
  docs = retriever.get_relevant_documents("policy cancellation")
  print("retrieval_seconds=", time.time() - start)

  start = time.time()
  result = llm.invoke("Answer using these docs...")
  print("llm_seconds=", time.time() - start)
  ```
- **Check environment-specific limits.**
  - Container CPU throttling
  - Memory pressure causing GC pauses
  - Proxy and NAT settings
  - Provider rate limits that surface as slow responses before failure
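As a shortcut for the first step, the exception class in the stack trace usually identifies the failing hop. `classify_timeout` below is a hypothetical helper that matches on class names rather than importing every client library, so the same check works across httpx, requests, and the OpenAI SDK:

```python
def classify_timeout(exc: BaseException) -> str:
    """Map a timeout exception to the hop that most likely failed.

    Matching on the class name avoids importing each client
    library the app might be using.
    """
    name = type(exc).__qualname__
    if "ConnectTimeout" in name:
        return "network path: could not reach the endpoint at all"
    if "ReadTimeout" in name or "APITimeoutError" in name:
        return "upstream: connected, but the response was too slow"
    if "Timeout" in name:
        return "generic timeout: check both network and upstream"
    return "not a timeout: inspect the full stack trace"
```

A `ConnectTimeout` points at firewalls, proxies, or DNS; a `ReadTimeout` points at an overloaded upstream or an oversized request.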
## Prevention
- Set explicit timeouts on every external client: LLMs, retrievers, databases, and HTTP tools.
- Add retries with exponential backoff for transient upstream failures only.
- Keep prompts small and bounded; summarize or chunk before calling the model.
- Test production-like networking early: same VPC rules, same proxy path, same DNS resolution.
If you want one practical rule: treat every external call in your LangChain app as unreliable until proven otherwise. Bound it with a timeout, log it separately, and fail fast instead of letting requests hang until production falls over.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.