How to Fix 'connection timeout' in LangGraph (Python)
What the error means
A "connection timeout" in LangGraph usually means your graph tried to call an external service and never got a response before the client timeout expired. In Python, this often shows up when a node calls an LLM, a tool endpoint, a database, or a remote LangGraph server and the request hangs long enough to fail.
You’ll usually hit it during `graph.invoke(...)`, `graph.stream(...)`, or inside a node that wraps an HTTP client like `httpx`, `requests`, or an SDK built on top of them.
The Most Common Cause
The #1 cause is a node making a network call with no timeout handling, or with a timeout that is too short for the actual latency of the dependency.
In LangGraph, this often looks like a node calling an API directly and then failing with something like:
- `httpx.ConnectTimeout`
- `httpx.ReadTimeout`
- `requests.exceptions.Timeout`
- `TimeoutError: timed out`
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| No explicit timeout, blocking call inside node | Explicit timeout + retry + fail fast |
| Node waits on remote dependency indefinitely | Node handles timeout and returns structured error |
| Graph invocation hangs until upstream client kills it | Graph surfaces controlled exception early |
```python
# BROKEN
from typing import TypedDict

import httpx
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    customer_id: str
    customer: dict

def fetch_customer(state: State):
    # No timeout. This can hang until the process or upstream gateway kills it.
    resp = httpx.get(f"https://api.example.com/customers/{state['customer_id']}")
    return {"customer": resp.json()}

builder = StateGraph(State)
builder.add_node("fetch_customer", fetch_customer)
builder.add_edge(START, "fetch_customer")
builder.add_edge("fetch_customer", END)
graph = builder.compile()

result = graph.invoke({"customer_id": "123"})
```
```python
# FIXED
from typing import TypedDict

import httpx
from langgraph.graph import StateGraph, START, END

class State(TypedDict, total=False):
    customer_id: str
    customer: dict
    error: str

# Reuse one client with explicit per-phase timeouts.
client = httpx.Client(
    timeout=httpx.Timeout(connect=5.0, read=15.0, write=5.0, pool=5.0)
)

def fetch_customer(state: State):
    try:
        resp = client.get(f"https://api.example.com/customers/{state['customer_id']}")
        resp.raise_for_status()
        return {"customer": resp.json()}
    except httpx.TimeoutException as e:
        return {"error": f"upstream_timeout: {type(e).__name__}"}
    except httpx.HTTPStatusError as e:
        return {"error": f"upstream_http_error: {e.response.status_code}"}

builder = StateGraph(State)
builder.add_node("fetch_customer", fetch_customer)
builder.add_edge(START, "fetch_customer")
builder.add_edge("fetch_customer", END)
graph = builder.compile()

result = graph.invoke({"customer_id": "123"})
```
If you’re using an LLM wrapper inside the node, apply the same rule there. For example:
```python
# Good pattern for model clients too
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(timeout=20)  # don't rely on defaults
```
Other Possible Causes
1) Your LangGraph server is unreachable
If you’re using LangGraph Platform or calling a remote deployment from Python, the problem may be basic connectivity.
```python
from langgraph_sdk import get_client

client = get_client(url="https://your-langgraph-server.example.com")

# If this hangs or times out, check DNS/VPN/firewall/proxy first.
threads = await client.threads.search()
```
Common symptoms:
- `httpx.ConnectTimeout`
- `httpcore.ConnectTimeout`
- the request never reaches your app logs
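To separate basic connectivity from a slow application, a plain-stdlib TCP probe answers the question before any SDK is involved. The helper name is mine and the host is a placeholder:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Cheap reachability probe: TCP connect only, no HTTP involved.

    True  -> DNS resolved and something is listening; a timeout higher up
             is likely a slow application, not the network.
    False -> connectivity problem (DNS, VPN, firewall, proxy).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the deployment before blaming the SDK.
# can_connect("your-langgraph-server.example.com", 443)
```

If the probe succeeds but the SDK still times out, the problem is above TCP: TLS, an auth proxy, or the server itself.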
2) A tool call inside the graph is slow or blocked
A tool node can hide the real bottleneck if it waits on SQL, S3, internal HTTP APIs, or a queue.
```python
import requests

def lookup_policy(state):
    # Bad if this endpoint is slow and unbounded.
    data = requests.get("https://internal-api/policies/42").json()
    return {"policy": data}
```
Fix it with explicit timeouts:
```python
def lookup_policy(state):
    resp = requests.get(
        "https://internal-api/policies/42",
        timeout=(5, 15),  # (connect, read) in seconds
    )
    resp.raise_for_status()
    return {"policy": resp.json()}
```
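If the endpoint is flaky as well as slow, `requests` can layer bounded retries on top of the timeout using `urllib3`'s `Retry` with an `HTTPAdapter`; a sketch, with the same placeholder URL:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,                           # at most 3 retries...
    backoff_factor=0.5,                # ...with exponential backoff between them
    status_forcelist=(502, 503, 504),  # only retry transient upstream errors
    allowed_methods=("GET",),          # never auto-retry non-idempotent calls
)
session.mount("https://", HTTPAdapter(max_retries=retry))

def lookup_policy(state):
    # Timeout bounds each attempt; Retry bounds how many attempts happen.
    resp = session.get("https://internal-api/policies/42", timeout=(5, 15))
    resp.raise_for_status()
    return {"policy": resp.json()}
```

Keep the retry budget small: three retries of a 15-second read timeout is already a full minute of worst-case latency for one node.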
3) You are running too much work inside one node
Nodes should do one bounded unit of work. If you cram retries, parsing, enrichment, and multiple API calls into one node, you increase the chance of hitting timeouts.
```python
def giant_node(state):
    # Four unbounded operations in sequence, all inside one node.
    customer = get_customer()
    claims = get_claims()
    summary = summarize_with_llm(customer, claims)
    enrich(summary)
    return {"summary": summary}
```
Split it:
```python
def get_customer_node(state): ...
def get_claims_node(state): ...
def summarize_node(state): ...
```
Smaller nodes are easier to time-box and debug.
4) Your async code is blocking the event loop
If you mix sync I/O into async LangGraph nodes, requests can stall long enough to look like a connection issue.
```python
# Bad inside an async node
import requests

async def node(state):
    data = requests.get("https://api.example.com/data").json()  # blocks the event loop
    return {"data": data}
```
Use async clients instead:
```python
import httpx

async def node(state):
    async with httpx.AsyncClient(timeout=20.0) as client:
        resp = await client.get("https://api.example.com/data")
        resp.raise_for_status()
        return {"data": resp.json()}
```
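When a blocking call genuinely cannot be replaced with an async client, one stdlib option is to push it onto a worker thread with `asyncio.to_thread` and cap it with `asyncio.wait_for`; a sketch where `slow_sync_call` stands in for the real dependency:

```python
import asyncio
import time

def slow_sync_call() -> dict:
    # Stand-in for an unavoidable blocking call (legacy SDK, DB driver, etc.).
    time.sleep(0.1)
    return {"ok": True}

async def node(state):
    try:
        # to_thread keeps the event loop free; wait_for caps total latency.
        data = await asyncio.wait_for(asyncio.to_thread(slow_sync_call), timeout=5.0)
        return {"data": data}
    except asyncio.TimeoutError:
        return {"error": "node_timeout"}

print(asyncio.run(node({})))  # → {'data': {'ok': True}}
```

The node now fails in a controlled way after 5 seconds instead of silently starving every other coroutine in the graph.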
How to Debug It
- Check which layer is timing out
  - If you see `httpx.ConnectTimeout`, it’s network/connectivity.
  - If you see `httpx.ReadTimeout`, the server accepted the connection but didn’t respond in time.
  - If you see `requests.exceptions.Timeout`, inspect every outbound call in that node.
- Log around each node
  - Add timestamps before and after every external call.
  - Print the exact node name so you know where execution stops.
```python
import time

def debug_wrapper(fn):
    def wrapped(state):
        start = time.time()
        print(f"start={fn.__name__}")
        result = fn(state)
        print(f"end={fn.__name__} elapsed={time.time() - start:.2f}s")
        return result
    return wrapped

# Usage: builder.add_node("fetch_customer", debug_wrapper(fetch_customer))
```
- Run the dependency outside LangGraph
  - Call the same API from a plain Python script.
  - If it times out there too, LangGraph is not the root cause.
- Reduce concurrency and retries
  - Too many parallel branches can overload upstream services.
  - Temporarily set retries to zero and run one path only.
Prevention
- Set explicit timeouts on every outbound client: `httpx`, `requests`, OpenAI wrappers, DB drivers.
- Keep nodes small and single-purpose so one slow dependency doesn’t stall the whole graph.
- Add structured logging around each node with elapsed time and exception type.
- In production graphs, treat all network calls as unreliable and wrap them with retry plus fallback logic.
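The retry-plus-fallback wrapping can be as small as a stdlib loop; a sketch where `UpstreamTimeout` and `flaky_dependency` are stand-ins for a real client exception and network call:

```python
import time

class UpstreamTimeout(Exception):
    """Stand-in for httpx.TimeoutException / requests.exceptions.Timeout."""

def with_retry(fn, attempts=3, base_delay=0.01):
    # Bounded retry with exponential backoff; re-raises after the last attempt.
    for i in range(attempts):
        try:
            return fn()
        except UpstreamTimeout:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

calls = {"n": 0}

def flaky_dependency() -> dict:
    # Simulated dependency: times out twice, then answers.
    calls["n"] += 1
    if calls["n"] < 3:
        raise UpstreamTimeout("timed out")
    return {"status": "ok"}

def node(state):
    try:
        return {"data": with_retry(flaky_dependency)}
    except UpstreamTimeout:
        return {"error": "upstream_timeout", "data": None}  # fallback, not a hang

print(node({}))  # → {'data': {'status': 'ok'}}
```

The key property: every outcome is bounded. The node either returns data within `attempts` tries or returns a structured error the rest of the graph can route on.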
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.