How to Fix Intermittent 500 Errors in Production in LangChain (Python)
Intermittent 500 errors in a LangChain app usually mean your chain is fine most of the time, but one of the upstream calls is failing under specific conditions: rate limits, timeouts, bad inputs, or shared-state bugs. In production, this shows up when traffic increases, prompts get longer, or your app starts handling concurrent requests.
The key thing: 500 is rarely “LangChain is broken.” It’s usually your code wrapping a model call without enough retries, validation, or isolation.
The Most Common Cause
The #1 cause I see is unhandled exceptions from the LLM or tool call bubbling up through your API. In LangChain Python, that often looks like openai.RateLimitError, openai.APITimeoutError, httpx.ReadTimeout, or a tool exception getting wrapped into a generic 500 Internal Server Error.
Here’s the wrong pattern:
```python
from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Summarize this ticket: {text}")
chain = prompt | llm

@app.post("/summarize")
async def summarize(payload: dict):
    # Wrong: no timeout handling, no retries, no exception mapping
    result = await chain.ainvoke({"text": payload["text"]})
    return {"summary": result.content}
```
And here’s the fixed pattern:
```python
from fastapi import FastAPI, HTTPException
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from openai import RateLimitError, APITimeoutError

app = FastAPI()
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_retries=3,
    timeout=20,
)
prompt = ChatPromptTemplate.from_template("Summarize this ticket: {text}")
chain = prompt | llm

@app.post("/summarize")
async def summarize(payload: dict):
    try:
        result = await chain.ainvoke({"text": payload["text"]})
        return {"summary": result.content}
    except RateLimitError as e:
        raise HTTPException(status_code=429, detail="LLM rate limited") from e
    except APITimeoutError as e:
        raise HTTPException(status_code=504, detail="LLM request timed out") from e
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Unexpected chain failure: {type(e).__name__}") from e
```
Why this matters:
- Without retries and timeouts, transient provider issues become user-facing 500s.
- Without exception mapping, every failure looks identical in logs.
- Without boundaries around the chain call, one bad request can take down the whole handler.
Other Possible Causes
| Cause | What it looks like | Typical fix |
|---|---|---|
| Bad input shape | KeyError: 'text' or Pydantic validation errors | Validate request body before calling the chain |
| Tool failure | ToolException or custom exception from your function/tool | Catch tool errors and return a controlled response |
| Memory/state bleed | Random failures under concurrency | Avoid shared mutable memory per process |
| Provider payload limits | 400 BadRequestError or token overflow that gets surfaced as 500 by your API | Truncate input and enforce token budgets |
1) Bad input shape
If you pass missing keys into a prompt template or runnable, LangChain will fail hard.
```python
# Broken
await chain.ainvoke({"message": "hello"})  # template expects {text}

# Fixed
payload_text = payload.get("text")
if not payload_text:
    raise HTTPException(status_code=422, detail="Missing required field: text")
await chain.ainvoke({"text": payload_text})
```
2) Tool exceptions
This happens when you use Tool, StructuredTool, or an agent that calls external systems.
```python
from langchain_core.tools import tool

@tool
def lookup_policy(policy_id: str) -> str:
    # Broken: raw exception bubbles up to the API layer
    if policy_id == "bad":
        raise ValueError("Policy not found")
    return "active"
```
Fix it by converting failures into controlled tool output:
```python
@tool
def lookup_policy(policy_id: str) -> str:
    try:
        if policy_id == "bad":
            raise ValueError("Policy not found")
        return "active"
    except ValueError as e:
        # Controlled output: the agent sees the error as a string result
        # instead of the whole request dying with a 500
        return f"ERROR: {e}"
```
3) Shared mutable memory
If you reuse conversation memory across requests, one user’s state can corrupt another’s.
```python
# Broken: global state shared across all requests
memory_store = []

def add_message(msg):
    memory_store.append(msg)
```
Use per-session storage instead:
```python
session_cache = {}  # per-process map of session_id -> message list

def get_session_memory(session_id: str):
    return session_cache.setdefault(session_id, [])
```
For LangChain agents, keep memory scoped to the request or session ID.
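In recent LangChain versions, `RunnableWithMessageHistory` formalizes this pattern. A minimal sketch, assuming your prompt includes a `MessagesPlaceholder` for the `history` key; the store here is in-process, so swap in an external store if you run multiple workers:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

history_store: dict[str, InMemoryChatMessageHistory] = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    # Each session gets its own history object; nothing is shared across users
    return history_store.setdefault(session_id, InMemoryChatMessageHistory())

chat_chain = RunnableWithMessageHistory(
    chain,  # the prompt | llm chain from earlier
    get_history,
    input_messages_key="text",
    history_messages_key="history",
)

# The session ID travels in the per-request config, not in global state
result = await chat_chain.ainvoke(
    {"text": "hello"},
    config={"configurable": {"session_id": "user-123"}},
)
```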
4) Token limit / oversized prompts
Long tickets, logs, or document chunks can push you over model limits.
```python
# Broken: unbounded text passed straight into the prompt
result = await chain.ainvoke({"text": large_blob})
```
Trim before invocation:
```python
MAX_CHARS = 12000
safe_text = large_blob[:MAX_CHARS]
result = await chain.ainvoke({"text": safe_text})
```
If you’re using retrieval chains, reduce chunk size and top-k before production rollout.
How to Debug It
- Log the exact exception class
  - Don't stop at `500 Internal Server Error`.
  - Log `type(e).__name__`, the message text, and a request ID.
  - Look for classes like `RateLimitError`, `APITimeoutError`, `ValidationError`, `ToolException`, or `KeyError`.
- Reproduce with one request
  - Take the exact failing payload from production logs.
  - Run it locally against the same chain.
  - If it only fails on long inputs or certain users, you've narrowed it down fast.
- Disable concurrency and isolate dependencies
  - Run a single worker.
  - Remove tools one by one.
  - Replace live APIs with mocks to see whether the failure is in LangChain orchestration or an upstream service.
- Turn on LangChain tracing
  - Use LangSmith or structured logging around each runnable step (a configuration sketch follows this list).
  - You want to know whether failure happens at prompt formatting, model invocation, parsing, or tool execution.
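Enabling LangSmith tracing is typically just environment configuration. A sketch, using the variable names current at the time of writing (verify against your LangChain version's docs); the project name is a placeholder:

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # send traces to LangSmith
os.environ["LANGCHAIN_API_KEY"] = "ls-..."   # your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "prod-summarizer"  # hypothetical project name
```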
Example diagnostic wrapper:
```python
import logging

logger = logging.getLogger(__name__)

async def run_chain(payload):
    try:
        return await chain.ainvoke(payload)
    except Exception as e:
        # logger.exception records the full traceback; the extra fields
        # make failures searchable in structured log backends
        logger.exception(
            "chain_failed",
            extra={
                "error_type": type(e).__name__,
                "payload_keys": list(payload.keys()),
            },
        )
        raise
```
Prevention
- Set explicit `timeout`, `max_retries`, and input validation on every LLM call.
- Never share mutable memory objects across requests; scope state by session ID.
- Add tests for long inputs, missing fields, provider timeouts, and tool failures before shipping (a sketch follows below).
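As a starting point for that last item, here's a minimal sketch using pytest and FastAPI's `TestClient`, assuming the Pydantic-validated handler shown earlier; in real tests you'd also stub out the chain so no live LLM call happens:

```python
from fastapi.testclient import TestClient

client = TestClient(app)  # the app from the examples above

def test_missing_field_is_rejected():
    # With Pydantic validation, a missing "text" yields 422, not 500
    response = client.post("/summarize", json={"message": "hello"})
    assert response.status_code == 422

def test_long_input_is_handled():
    # Stub the chain in practice; the point is the status code contract
    response = client.post("/summarize", json={"text": "x" * 500_000})
    assert response.status_code in (200, 413, 422, 429, 504)  # never a bare 500
```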
If you’re seeing intermittent 500s in production with LangChain Python, treat it like an integration problem first. The fix is usually boring engineering: validate inputs, catch exceptions at the boundary, add retries where they make sense, and keep state isolated per request.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.