How to Fix 'connection timeout in production' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

connection timeout in production usually means your LangChain app tried to call an upstream service — OpenAI, Anthropic, a vector DB, a tool endpoint, or your own API — and the request never got a response before the client timed out. In production it typically shows up under load, behind a proxy, or when you're relying on default network settings that were fine in local dev.

The important part: this is rarely a “LangChain bug.” It’s usually a transport/config issue around the model client, retries, timeouts, or network path.

The Most Common Cause

The #1 cause is using the default timeout behavior with a slow or overloaded upstream. In LangChain Python, people often instantiate the model without setting an explicit timeout, then hit openai.APITimeoutError, httpx.ReadTimeout, or httpx.ConnectTimeout once traffic increases.

Here’s the broken pattern versus the fixed pattern.

Broken                          Fixed
No explicit timeout             Explicit connect/read timeout
No retry strategy               Controlled retries
One long-running request path   Bounded request duration
# BROKEN
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
)

response = llm.invoke("Summarize this policy document.")
print(response.content)
# FIXED
from langchain_openai import ChatOpenAI
from httpx import Timeout

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    timeout=Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
    max_retries=2,
)

response = llm.invoke("Summarize this policy document.")
print(response.content)

If you’re using async code, the same rule applies. Don’t let requests sit forever waiting on upstream I/O.

# FIXED ASYNC PATTERN
import asyncio
from langchain_openai import ChatOpenAI
from httpx import Timeout

llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
)

async def main():
    result = await llm.ainvoke("Extract key risks from this claim note.")
    print(result.content)

asyncio.run(main())
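
If you also want a hard ceiling on the whole call regardless of the client's transport timeouts, asyncio.wait_for provides one. A minimal sketch reusing the llm above (the 45-second cap is an illustrative value, not a recommendation):

# FIXED ASYNC PATTERN WITH A HARD CAP
async def main():
    # Bounds the entire call, on top of the httpx client timeouts.
    result = await asyncio.wait_for(
        llm.ainvoke("Extract key risks from this claim note."),
        timeout=45.0,
    )
    print(result.content)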

Other Possible Causes

1) Your network path is blocked or unstable

In production, egress rules, VPC routing, NAT exhaustion, or proxy misconfigurations can break outbound calls.

# Example: forcing traffic through a proxy incorrectly
import os

os.environ["HTTPS_PROXY"] = "http://proxy.internal:8080"
os.environ["NO_PROXY"] = "localhost,127.0.0.1"

If your app works locally but times out in prod, check firewall rules and whether your container can reach the provider endpoint at all.
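
A quick way to test reachability is a direct HTTP probe from inside the production environment. A minimal sketch with httpx (swap in whichever provider endpoint your app actually calls; a 401 without credentials still proves the network path works):

# Probe the provider endpoint directly, outside LangChain
import httpx

try:
    resp = httpx.get("https://api.openai.com/v1/models", timeout=5.0)
    # A 401 here is fine: the network path works and only auth is missing.
    print("reachable, status:", resp.status_code)
except httpx.ConnectTimeout:
    print("connect timeout: egress, proxy, or DNS problem upstream of LangChain")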

2) You’re calling a tool or retriever that is too slow

LangChain chains often time out because one tool call hangs before the LLM even gets invoked.

# Example: slow retriever inside a chain
docs = retriever.get_relevant_documents("claims handling exception")

Fix it by bounding tool execution:

import concurrent.futures

# Bound the retriever call so one slow hop cannot hang the whole chain.
with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(retriever.get_relevant_documents, "claims handling exception")
    # Raises concurrent.futures.TimeoutError after 8s. Note the worker thread
    # keeps running in the background; this bounds your request, not the work.
    docs = future.result(timeout=8)

3) You’re sending too much context to the model

Huge prompts increase serialization time and upstream processing time. This shows up as ReadTimeout even when connectivity is fine.

# Bad: stuffing full documents into one prompt
prompt = "\n\n".join(very_large_documents)
result = llm.invoke(prompt)

Trim inputs before sending them:

# Better: bound the input to a few chunks (or summarize each chunk first)
chunks = very_large_documents[:5]
prompt = "\n\n".join(chunks)
result = llm.invoke(prompt)
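
If you want a predictable bound rather than a fixed slice, a text splitter keeps chunk sizes under control. A sketch assuming the langchain-text-splitters package is installed (the chunk sizes are illustrative):

# Better still: split into bounded chunks before building the prompt
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
chunks = splitter.split_text("\n\n".join(very_large_documents))
prompt = "\n\n".join(chunks[:5])  # hard upper bound on prompt size
result = llm.invoke(prompt)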

4) You have no backoff on transient failures

A single timeout is often transient. Without retries with backoff, your app fails on the first blip even when a second attempt would have succeeded.

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=8))
def call_llm():
    return llm.invoke("Classify this email.")

Use retries carefully. Retries help with transient network issues; they do not fix bad routing or permanently overloaded services.
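
One way to keep retries scoped to transient failures is to retry only on the timeout exception types you actually see in your traces. A sketch assuming httpx timeouts are what surface (adjust the exception tuple to match your stack):

from httpx import ConnectTimeout, ReadTimeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Retry transport timeouts only; let auth errors and bad requests fail fast.
@retry(
    retry=retry_if_exception_type((ConnectTimeout, ReadTimeout)),
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=8),
)
def call_llm():
    return llm.invoke("Classify this email.")

If the model client already sets max_retries, keep a single retry layer so attempts don't multiply.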

How to Debug It

  1. Identify which hop is timing out

    • Is it the LLM call?
    • The retriever?
    • A custom tool?
    • A database/vector store?

    Look at the stack trace for exception classes like httpx.ReadTimeout, httpx.ConnectTimeout, openai.APITimeoutError, or requests.exceptions.Timeout.

  2. Run the same request outside LangChain

    • Call the provider SDK directly.
    • Call your tool endpoint with curl.
    • If it still times out, LangChain is not the root cause.
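
    For the first check, a direct SDK call works well. A sketch assuming the openai package and OPENAI_API_KEY are configured (the model name is illustrative):

    # No LangChain in the loop: if this also times out, the problem is upstream.
    from openai import OpenAI

    client = OpenAI(timeout=10.0)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)
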
  3. Add timing logs around each step

    import time
    
    start = time.time()
    docs = retriever.get_relevant_documents("policy cancellation")
    print("retrieval_seconds=", time.time() - start)
    
    start = time.time()
    result = llm.invoke("Answer using these docs...")
    print("llm_seconds=", time.time() - start)
    
  4. Check environment-specific limits

    • Container CPU throttling
    • Memory pressure causing GC pauses
    • Proxy and NAT settings
    • Provider rate limits that surface as slow responses before failure

Prevention

  • Set explicit timeouts on every external client: LLMs, retrievers, databases, and HTTP tools.
  • Add retries with exponential backoff for transient upstream failures only.
  • Keep prompts small and bounded; summarize or chunk before calling the model.
  • Test production-like networking early: same VPC rules, same proxy path, same DNS resolution.

If you want one practical rule: treat every external call in your LangChain app as unreliable until proven otherwise. Bound it with a timeout, log it separately, and fail fast instead of letting requests hang until production falls over.
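
A small wrapper makes that rule easy to apply consistently. A sketch (timed_call is an illustrative helper name, not a LangChain API):

import logging
import time

log = logging.getLogger("external_calls")

def timed_call(name, fn, *args, **kwargs):
    # Log each external call's duration separately so slow hops stand out.
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        log.info("%s took %.2fs", name, time.perf_counter() - start)

docs = timed_call("retrieval", retriever.get_relevant_documents, "policy cancellation")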

