How to Fix Intermittent 500 Errors in LangChain (Python)

By Cyprian Aarons
Updated 2026-04-21

Intermittent 500 errors in LangChain usually mean the failure is not deterministic. The same prompt sometimes works and sometimes blows up because of rate limits, bad retries, shared mutable state, timeouts, or an upstream model/provider issue.

In practice, this shows up when you batch requests, run chains concurrently, reuse clients across threads, or call a flaky tool/API from inside an agent.

The Most Common Cause

The #1 cause is uncontrolled concurrency against the LLM provider or a tool endpoint.

With LangChain, people often wrap LLMChain, RunnableParallel, or an agent in async fan-out and then hit provider limits. The error may surface as openai.RateLimitError, httpx.ReadTimeout, or a generic langchain_core.exceptions.OutputParserException if retries return partial junk.

Broken vs fixed pattern

Broken                                             Fixed
Fires too many requests at once                    Adds bounded concurrency + retries
Reuses one chain across many tasks without limits  Uses a semaphore or max_concurrency
Treats all failures as application bugs            Distinguishes transient provider errors
# BROKEN
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm

async def run_all(texts):
    # Unbounded fan-out
    tasks = [chain.ainvoke({"text": t}) for t in texts]
    return await asyncio.gather(*tasks)

# Under load this can surface as:
# openai.RateLimitError: Error code: 429
# httpx.ReadTimeout: timed out
# 500 Internal Server Error from upstream proxy

# FIXED
import asyncio

import httpx
import openai
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm

sem = asyncio.Semaphore(5)  # cap concurrent in-flight requests

# Retry only transient provider errors; let real bugs fail fast.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(1, 5),
    retry=retry_if_exception_type(
        (openai.RateLimitError, openai.APIConnectionError, httpx.ReadTimeout)
    ),
)
async def safe_invoke(text):
    async with sem:
        return await chain.ainvoke({"text": text})

async def run_all(texts):
    return await asyncio.gather(*(safe_invoke(t) for t in texts))

If your “500” only happens under load, this is the first place to look.

Other Possible Causes

1) Bad tool output causing parser failures

Agents often fail intermittently because a tool returns malformed text and the agent parser can’t recover.

# Example error:
# langchain_core.exceptions.OutputParserException:
# Could not parse LLM output: "Final Answer: ..."

agent_executor.invoke({"input": "Look up account status"})

Fix by constraining tool output or switching to structured outputs:

from pydantic import BaseModel

class ToolResult(BaseModel):
    status: str
    balance: float

# Return JSON from the tool and validate it before passing back.
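A minimal stdlib-only version of that validation step (the field names mirror the ToolResult model above; the helper name is mine, not a LangChain API):

```python
import json

def parse_tool_result(raw: str) -> dict:
    """Validate a tool's JSON output before it goes back to the agent."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"tool returned malformed JSON: {e}") from e
    if not isinstance(data.get("status"), str):
        raise ValueError("tool result missing string 'status'")
    if not isinstance(data.get("balance"), (int, float)):
        raise ValueError("tool result missing numeric 'balance'")
    return data
```

Rejecting bad output at the tool boundary turns an intermittent OutputParserException deep inside the agent into a clear, immediate error at the source.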

2) Timeout mismatch between LangChain and the provider

If your provider timeout is shorter than the actual request latency, you’ll get sporadic failures.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=5,   # too aggressive for long prompts/tools
    max_retries=0,
)

Increase timeout and allow retries:

llm = ChatOpenAI(
    model="gpt-4o-mini",
    timeout=30,
    max_retries=3,
)

3) Shared mutable state in memory or callbacks

A common bug is reusing one memory object or callback handler across concurrent requests.

# BAD: shared state across requests
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()  # one global memory shared by every request

def handle_request(user_input):
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.invoke({"input": user_input})

Use per-request state:

def handle_request(user_input):
    memory = ConversationBufferMemory()
    chain = ConversationChain(llm=llm, memory=memory)
    return chain.invoke({"input": user_input})

4) Upstream API instability hidden behind a proxy

Sometimes LangChain is fine and your gateway is returning intermittent 500s.

curl -i -X POST https://your-proxy.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}'

If the proxy logs show upstream resets or TLS issues, fix that layer first. LangChain will only report the symptom.

How to Debug It

  1. Capture the exact exception class

    • Don’t stop at “500 error”.
    • Look for openai.RateLimitError, httpx.ReadTimeout, APIConnectionError, or OutputParserException.
  2. Disable concurrency

    • Run one request at a time.
    • If the problem disappears, you have a load/retry/state issue.
  3. Log raw prompts and tool outputs

    • Print the final prompt sent to the model.
    • Inspect tool responses before they reach the agent.
    • Bad JSON or truncated text is a frequent trigger.
  4. Turn on LangChain tracing

    • Use LangSmith or verbose logging to see where failure starts.
    • Check whether the error happens in:
      • prompt formatting
      • model call
      • tool execution
      • output parsing

Example:

import logging
logging.basicConfig(level=logging.INFO)

chain.invoke({"text": "test"})

If you need deeper inspection, wrap each stage separately instead of calling a full agent end-to-end.
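Stage-by-stage isolation can be sketched like this. The three functions are illustrative stand-ins for prompt formatting, the model call, and output parsing; logging between stages tells you which one the failure follows.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Illustrative stand-ins for the three stages of a chain.
def format_prompt(text: str) -> str:
    return f"Summarize: {text}"

def call_model(prompt: str) -> str:
    return f"SUMMARY({prompt})"

def parse_output(raw: str) -> str:
    if not (raw.startswith("SUMMARY(") and raw.endswith(")")):
        raise ValueError(f"unparseable model output: {raw!r}")
    return raw[len("SUMMARY("):-1]

def run_staged(text: str) -> str:
    prompt = format_prompt(text)   # stage 1: prompt formatting
    log.info("prompt: %s", prompt)
    raw = call_model(prompt)       # stage 2: model call
    log.info("raw output: %s", raw)
    return parse_output(raw)       # stage 3: output parsing
```

Whichever log line is the last one printed before the failure tells you which stage to dig into.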

Prevention

  • Set explicit timeouts and retries on every external dependency.
  • Limit concurrency with semaphores or framework-level max_concurrency.
  • Make tools return structured data; don’t pass free-form strings back into parsers.
  • Keep memory and callback handlers request-scoped, not global.
  • Test under load with realistic prompt sizes before shipping.
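For the timeout bullet, a generic pattern worth knowing is asyncio.wait_for, which puts an explicit deadline on any awaitable dependency (the wrapper and stub names below are illustrative):

```python
import asyncio

async def with_deadline(coro, timeout: float = 30.0):
    """Fail fast with asyncio.TimeoutError instead of hanging on a slow call."""
    return await asyncio.wait_for(coro, timeout=timeout)

async def slow_dependency():
    # Stand-in for a slow external call (model, tool, proxy).
    await asyncio.sleep(0.2)
    return "done"
```

A deadline that trips is a clear, classifiable error; a hang that eventually dies upstream is an intermittent 500.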

If you’re seeing intermittent 500s in LangChain, assume it’s a systems problem first, not a LangChain bug. In most cases the fix is tightening concurrency, adding retries with backoff, and removing shared state from your request path.



By Cyprian Aarons, AI Consultant at Topiax.
