# How to Fix 'intermittent 500 errors' in LangChain (Python)
Intermittent 500 errors in LangChain usually mean the failure is not deterministic. The same prompt sometimes works and sometimes blows up because of rate limits, bad retries, shared mutable state, timeouts, or an upstream model/provider issue.
In practice, this shows up when you batch requests, run chains concurrently, reuse clients across threads, or call a flaky tool/API from inside an agent.
## The Most Common Cause
The #1 cause is uncontrolled concurrency against the LLM provider or a tool endpoint.
With LangChain, people often wrap `LLMChain`, `RunnableParallel`, or an agent in async fan-out and then hit provider limits. The error may surface as `openai.RateLimitError`, `httpx.ReadTimeout`, or a generic `langchain_core.exceptions.OutputParserException` if retries return partial junk.
### Broken vs. fixed pattern
| Broken | Fixed |
|---|---|
| Fires too many requests at once | Adds bounded concurrency + retries |
| Reuses one chain across many tasks without limits | Uses semaphore / max_concurrency |
| Treats all failures as application bugs | Distinguishes transient provider errors |
```python
# BROKEN
import asyncio

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm

async def run_all(texts):
    # Unbounded fan-out
    tasks = [chain.ainvoke({"text": t}) for t in texts]
    return await asyncio.gather(*tasks)

# Under load this can surface as:
#   openai.RateLimitError: Error code: 429
#   httpx.ReadTimeout: timed out
#   500 Internal Server Error from an upstream proxy
```
```python
# FIXED
import asyncio

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm

sem = asyncio.Semaphore(5)  # at most 5 in-flight requests

@retry(stop=stop_after_attempt(3), wait=wait_exponential_jitter(initial=1, max=5))
async def safe_invoke(text):
    async with sem:
        return await chain.ainvoke({"text": text})

async def run_all(texts):
    return await asyncio.gather(*(safe_invoke(t) for t in texts))
```
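To sanity-check the fixed version, you can drive `run_all` from a small script; the sample texts here are placeholders:

```python
if __name__ == "__main__":
    texts = ["First document.", "Second document.", "Third document."]
    results = asyncio.run(run_all(texts))
    # Each result is an AIMessage; .content holds the summary text.
    print([r.content for r in results])
```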
If your “500” only happens under load, this is the first place to look.
## Other Possible Causes
### 1) Bad tool output causing parser failures
Agents often fail intermittently because a tool returns malformed text and the agent parser can’t recover.
```python
# Example error:
# langchain_core.exceptions.OutputParserException:
#   Could not parse LLM output: "Final Answer: ..."
agent_executor.invoke({"input": "Look up account status"})
```
Fix by constraining tool output or switching to structured outputs:
```python
from pydantic import BaseModel

class ToolResult(BaseModel):
    status: str
    balance: float

# Return JSON from the tool and validate it before passing back.
```
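As a concrete sketch, the validation can live inside the tool itself so malformed payloads never reach the agent's parser. The `lookup_account` tool and its `fetch_account_raw` upstream call below are hypothetical:

```python
import json

from langchain_core.tools import tool
from pydantic import BaseModel, ValidationError

class ToolResult(BaseModel):
    status: str
    balance: float

def fetch_account_raw(account_id: str) -> str:
    # Hypothetical upstream call that may return malformed text.
    ...

@tool
def lookup_account(account_id: str) -> str:
    """Look up account status and balance."""
    raw = fetch_account_raw(account_id)
    try:
        result = ToolResult.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        # Return a structured error instead of free-form junk
        # that the agent parser will choke on.
        return json.dumps({"status": "error", "detail": str(exc)})
    return result.model_dump_json()
```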
### 2) Timeout mismatch between LangChain and the provider
If your provider timeout is shorter than the actual request latency, you’ll get sporadic failures.
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=5,      # too aggressive for long prompts/tools
    max_retries=0,  # a single slow response becomes a hard failure
)
```
Increase timeout and allow retries:
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=30,
    max_retries=3,
)
```
### 3) Shared mutable state in memory or callbacks
A common bug is reusing one memory object or callback handler across concurrent requests.
```python
# BAD: shared state across requests
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()  # one instance shared by every request

def handle_request(user_input):
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.invoke({"input": user_input})
Use per-request state:
def handle_request(user_input):
    memory = ConversationBufferMemory()  # fresh state per request
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.invoke({"input": user_input})
```
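On recent LangChain versions (where `ConversationChain` and `ConversationBufferMemory` are deprecated), `RunnableWithMessageHistory` gives you the same isolation with a session-keyed store. A minimal sketch, assuming the `llm` from earlier and illustrative session IDs:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),
    ("human", "{input}"),
])

store: dict[str, InMemoryChatMessageHistory] = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    # Each session gets its own history object, so concurrent
    # requests never mutate each other's state.
    return store.setdefault(session_id, InMemoryChatMessageHistory())

chat = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)

chat.invoke(
    {"input": "What's my balance?"},
    config={"configurable": {"session_id": "user-42"}},
)
```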
### 4) Upstream API instability hidden behind a proxy
Sometimes LangChain is fine and your gateway is returning intermittent 500s.
```bash
curl -i https://your-proxy.example.com/v1/chat/completions
```
If the proxy logs show upstream resets or TLS issues, fix that layer first. LangChain will only report the symptom.
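To isolate the layer, you can send the same request both through the proxy and directly to the provider; if only the proxy path fails intermittently, the gateway is the problem. A rough sketch, assuming an OpenAI-compatible endpoint and `OPENAI_API_KEY` set in the environment (the proxy URL is a placeholder):

```python
import os

import httpx

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}],
}
headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

for base in ("https://your-proxy.example.com/v1", "https://api.openai.com/v1"):
    r = httpx.post(f"{base}/chat/completions", json=payload,
                   headers=headers, timeout=30)
    # A 500 only on the proxy path points at the gateway, not LangChain.
    print(base, r.status_code)
```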
## How to Debug It
1. Capture the exact exception class
   - Don't stop at "500 error".
   - Look for `openai.RateLimitError`, `httpx.ReadTimeout`, `APIConnectionError`, or `OutputParserException`.
2. Disable concurrency
   - Run one request at a time.
   - If the problem disappears, you have a load/retry/state issue.
3. Log raw prompts and tool outputs
   - Print the final prompt sent to the model.
   - Inspect tool responses before they reach the agent.
   - Bad JSON or truncated text is a frequent trigger.
4. Turn on LangChain tracing
   - Use LangSmith or verbose logging to see where the failure starts.
   - Check whether the error happens in prompt formatting, the model call, tool execution, or output parsing.
Example:
```python
import logging

logging.basicConfig(level=logging.INFO)
chain.invoke({"text": "test"})
```
If you need deeper inspection, wrap each stage separately instead of calling a full agent end-to-end.
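A minimal sketch of that stage-by-stage approach, reusing the `prompt` and `llm` objects from the summarize chain above:

```python
# Run each stage by hand to pinpoint where the failure starts.
messages = prompt.invoke({"text": "test"})  # 1. prompt formatting
print(messages)

response = llm.invoke(messages)             # 2. model call
print(response.content)

# For an agent, also call each tool directly with the same arguments
# the agent would pass, and inspect the raw return value.
```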
## Prevention
- Set explicit timeouts and retries on every external dependency.
- Limit concurrency with semaphores or framework-level `max_concurrency` (see the sketch after this list).
- Make tools return structured data; don't pass free-form strings back into parsers.
- Keep memory and callback handlers request-scoped, not global.
- Test under load with realistic prompt sizes before shipping.
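If you'd rather not hand-roll a semaphore, LangChain's `batch`/`abatch` accept a `max_concurrency` value in the run config. A minimal sketch with the chain from earlier:

```python
# Caps in-flight model calls at 5 across the whole batch.
results = chain.batch(
    [{"text": t} for t in ["doc one", "doc two", "doc three"]],
    config={"max_concurrency": 5},
)
```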
If you’re seeing intermittent 500s in LangChain, assume it’s a systems problem first, not a LangChain bug. In most cases the fix is tightening concurrency, adding retries with backoff, and removing shared state from your request path.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.