# How to Fix 'timeout error' in LangChain (Python)
A timeout error in LangChain usually means one of your network calls took longer than the configured limit and got cut off. In Python, this often shows up when calling an LLM, a retriever, or an external API through a chain, especially under load or when the model is slow to respond.
The tricky part is that the timeout may not be coming from LangChain itself. It can come from the underlying provider SDK, requests, httpx, your vector store client, or even your own wrapper code.
## The Most Common Cause
The #1 cause is a missing or too-aggressive timeout configuration in the underlying client.
LangChain passes work to model/provider clients like OpenAI, Anthropic, Azure OpenAI, or custom HTTP clients. If that client has a short timeout, you’ll see errors like:
- `openai.APITimeoutError: Request timed out`
- `httpx.ReadTimeout: The read operation timed out`
- `requests.exceptions.Timeout: HTTPSConnectionPool(...) Read timed out`
Here’s the broken pattern and the fixed pattern:

**Broken:**

```python
from langchain_openai import ChatOpenAI

# No explicit timeout; the default may be too low for your workload
llm = ChatOpenAI(model="gpt-4o-mini")

response = llm.invoke("Summarize this long document")
print(response)
```

**Fixed:**

```python
from langchain_openai import ChatOpenAI

# Give the provider enough time for slower requests
llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=60,
    max_retries=2,
)

response = llm.invoke("Summarize this long document")
print(response)
```
If you’re using a chain, set the timeout on the model object, not just around the chain call. That’s where most people miss it.
For older or custom setups using `requests`/`httpx`, make sure you pass explicit timeouts there too:
```python
import httpx
from langchain_openai import ChatOpenAI

# Set an explicit 60-second timeout on the underlying HTTP client
client = httpx.Client(timeout=httpx.Timeout(60.0))
llm = ChatOpenAI(model="gpt-4o-mini", http_client=client)
```
## Other Possible Causes
### 1) Your prompt is too large
Large prompts increase latency and can push requests over the timeout limit.
```python
# Risky: stuffing too much text into one request
prompt = f"Analyze this:\n\n{very_large_document}"
result = llm.invoke(prompt)
```
Fix it by chunking first:
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = splitter.split_text(very_large_document)
```
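Once the document is chunked, each request stays small enough to finish well within the timeout. Here is a minimal stdlib sketch of the chunk-then-summarize pattern; the splitter is a naive stand-in for `RecursiveCharacterTextSplitter`, and `summarize` stands in for a real `llm.invoke` call:

```python
def split_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so each request stays small."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def summarize(chunk: str) -> str:
    # Stand-in for llm.invoke(f"Summarize:\n\n{chunk}")
    return chunk[:50]

def summarize_document(text: str) -> str:
    # Summarize each chunk independently, then combine the partial
    # summaries in one final, much smaller request
    partial = [summarize(c) for c in split_text(text)]
    return summarize("\n".join(partial))
```

Each call now carries at most one chunk plus some overlap, so a single slow request no longer risks blowing the whole pipeline's deadline.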
### 2) Your retriever or vector store is slow
A slow Pinecone, FAISS wrapper, Weaviate, or PostgreSQL query can make the whole chain look like an LLM timeout.
```python
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})
docs = retriever.invoke("policy exclusions")
```
Try reducing retrieval load:
```python
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```
Also check whether your vector DB client has its own timeout setting.
### 3) You’re doing synchronous calls inside a request path with no concurrency control
If you call multiple chains sequentially in a web request, one slow dependency blocks everything.
```python
# Slow pattern: serial calls
a = chain1.invoke(input_data)
b = chain2.invoke(input_data)
c = chain3.invoke(input_data)
```
Use async where supported:
```python
import asyncio

results = await asyncio.gather(
    chain1.ainvoke(input_data),
    chain2.ainvoke(input_data),
    chain3.ainvoke(input_data),
)
```
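To put one overall deadline on the whole fan-out (instead of relying only on per-call timeouts), you can wrap the gather in `asyncio.wait_for`. This runnable sketch uses stand-in coroutines rather than real LangChain chains:

```python
import asyncio

async def fake_chain(name: str, delay: float) -> str:
    # Stand-in for chain.ainvoke(input_data)
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_with_deadline() -> list[str]:
    # One deadline for all three calls; raises asyncio.TimeoutError
    # if the slowest one exceeds it
    return await asyncio.wait_for(
        asyncio.gather(
            fake_chain("chain1", 0.01),
            fake_chain("chain2", 0.02),
            fake_chain("chain3", 0.03),
        ),
        timeout=5.0,
    )

results = asyncio.run(run_with_deadline())
print(results)
```

Because `asyncio.gather` preserves argument order, `results` lines up with the chains regardless of which finished first.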
### 4) Your server or platform has its own deadline
Your app might be timing out before LangChain finishes. Common examples are:
- FastAPI/Uvicorn request timeouts
- AWS Lambda max execution time
- Gunicorn worker timeouts
- API gateway deadlines
Example symptoms:

- LangChain logs show the request still running
- The client gets a `504 Gateway Timeout`
- Your app never returns a Python exception from LangChain itself
Check infrastructure settings alongside application code.
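The exact setting depends on your stack; as one example, Gunicorn's worker timeout is a real and common culprit (paths and values below are illustrative):

```shell
# Gunicorn kills any worker silent for --timeout seconds (default 30).
# Raise it above your slowest expected LangChain call:
gunicorn --timeout 120 --workers 4 myapp:app

# AWS Lambda: the function timeout lives in the function configuration, e.g.
aws lambda update-function-configuration --function-name my-fn --timeout 120
```

If the platform deadline is shorter than the LLM client's timeout, the platform wins and your Python code never sees the exception.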
## How to Debug It
1. Find the exact exception class.
   - Look for `openai.APITimeoutError`, `httpx.ReadTimeout`, `requests.exceptions.Timeout`, or a provider-specific error.
   - If it’s a gateway error like `504`, the problem may be outside Python.
2. Isolate the failing component.
   - Call the model directly with `.invoke()`.
   - Then test retrievers separately.
   - Then test the full chain.
   - This tells you whether the timeout is from LLM generation or upstream data fetching.
3. Log timings around each step.
   - Measure retrieval time, prompt construction time, and model call time.
   - Example:

   ```python
   import time

   start = time.time()
   docs = retriever.invoke(query)
   print("retrieval:", time.time() - start)

   start = time.time()
   resp = llm.invoke(prompt)
   print("llm:", time.time() - start)
   ```

4. Check provider and client timeout settings.
   - Inspect `timeout`, `max_retries`, and any SDK-level defaults.
   - Also check reverse proxies, load balancers, and serverless limits.
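The per-step timing above can be wrapped in a small helper so every step is logged the same way. This is a stdlib sketch (the `timed` helper is not a LangChain API), with `time.sleep` standing in for real retriever and LLM calls:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(step: str, log: dict):
    """Record how long the enclosed block took, keyed by step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[step] = time.perf_counter() - start
        print(f"{step}: {log[step]:.3f}s")

timings: dict[str, float] = {}
with timed("retrieval", timings):
    time.sleep(0.01)  # stand-in for retriever.invoke(query)
with timed("llm", timings):
    time.sleep(0.01)  # stand-in for llm.invoke(prompt)
```

Collecting the numbers in a dict (rather than only printing them) makes it easy to ship them to your metrics system later.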
## Prevention
- Set explicit timeouts on every external client:
  - LLM client
  - HTTP client
  - vector DB client
  - database driver
- Keep prompts small and retrieval targeted:
  - chunk documents before indexing
  - reduce `k`
  - avoid dumping entire PDFs into one prompt
- Add timing logs and retries:
  - log per-step latency
  - retry transient failures with backoff
  - alert when p95 latency starts creeping up
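Retrying with backoff can be as simple as this stdlib sketch; in production you might prefer a library such as `tenacity` or the client's built-in `max_retries`. The `flaky` function and the builtin `TimeoutError` are stand-ins (real code would catch the provider-specific exception, e.g. `openai.APITimeoutError`):

```python
import random
import time

def retry_with_backoff(fn, retries: int = 3, base_delay: float = 0.05):
    """Retry a flaky call with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the original error
            # Delay doubles each attempt; jitter avoids thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

calls = {"n": 0}

def flaky():
    # Stand-in for llm.invoke(...): fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

print(retry_with_backoff(flaky))  # after two retries this prints "ok"
```

Only retry errors that are actually transient; retrying a request that times out because the prompt is too large just triples the pain.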
If you want stable LangChain systems in production, treat timeout handling as part of architecture, not as an afterthought. The fix is usually not “increase everything forever”; it’s finding which hop in the pipeline is slow and setting sane limits there.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.