How to Fix 'intermittent 500 errors in production' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

Intermittent 500 errors in a LangChain app usually mean your chain is fine most of the time, but one of the upstream calls is failing under specific conditions: rate limits, timeouts, bad inputs, or shared-state bugs. In production, this shows up when traffic increases, prompts get longer, or your app starts handling concurrent requests.

The key thing: 500 is rarely “LangChain is broken.” It’s usually your code wrapping a model call without enough retries, validation, or isolation.

The Most Common Cause

The #1 cause I see is unhandled exceptions from the LLM or tool call bubbling up through your API. In LangChain Python, that often looks like openai.RateLimitError, openai.APITimeoutError, httpx.ReadTimeout, or a tool exception getting wrapped into a generic 500 Internal Server Error.

Here’s the wrong pattern:

from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

app = FastAPI()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Summarize this ticket: {text}")
chain = prompt | llm

@app.post("/summarize")
async def summarize(payload: dict):
    # Wrong: no timeout handling, no retries, no exception mapping
    result = await chain.ainvoke({"text": payload["text"]})
    return {"summary": result.content}

And here’s the fixed pattern:

from fastapi import FastAPI, HTTPException
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from openai import RateLimitError, APITimeoutError

app = FastAPI()

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_retries=3,
    timeout=20,
)
prompt = ChatPromptTemplate.from_template("Summarize this ticket: {text}")
chain = prompt | llm

@app.post("/summarize")
async def summarize(payload: dict):
    try:
        result = await chain.ainvoke({"text": payload["text"]})
        return {"summary": result.content}
    except RateLimitError as e:
        raise HTTPException(status_code=429, detail="LLM rate limited") from e
    except APITimeoutError as e:
        raise HTTPException(status_code=504, detail="LLM request timed out") from e
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Unexpected chain failure: {type(e).__name__}") from e

Why this matters:

  • Without retries and timeouts, transient provider issues become user-facing 500s.
  • Without exception mapping, every failure looks identical in logs.
  • Without boundaries around the chain call, one bad request can take down the whole handler.
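The max_retries setting on ChatOpenAI covers retries at the client level, but sometimes you want a retry boundary around an arbitrary chain step. A minimal exponential-backoff wrapper might look like the sketch below; the exception types and delays are assumptions you should tune for your provider (in real code you would list RateLimitError, APITimeoutError, and friends):

```python
import asyncio
import random

# Exception types treated as transient -- an assumption for this sketch;
# in a real app you would list your provider's rate-limit/timeout classes.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)

async def invoke_with_backoff(coro_factory, max_attempts=3, base_delay=0.5):
    """Call an async factory, retrying transient failures with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise  # out of retries: let the boundary handler map it
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus noise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

You would wrap the chain call as await invoke_with_backoff(lambda: chain.ainvoke(payload)), so each retry gets a fresh coroutine.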

Other Possible Causes

Cause                   | What it looks like                                                 | Typical fix
Bad input shape         | KeyError: 'text' or Pydantic validation errors                     | Validate the request body before calling the chain
Tool failure            | ToolException or a custom exception from your function/tool        | Catch tool errors and return a controlled response
Memory/state bleed      | Random failures under concurrency                                  | Avoid shared mutable memory per process
Provider payload limits | 400 BadRequestError or token overflow surfaced as 500 by your API  | Truncate input and enforce token budgets

1) Bad input shape

If you pass a dict that is missing a key the prompt template expects, LangChain raises a KeyError at invocation time, which your web framework then surfaces as a 500.

# Broken
await chain.ainvoke({"message": "hello"})  # template expects {text}

# Fixed
payload_text = payload.get("text")
if not payload_text:
    raise HTTPException(status_code=422, detail="Missing required field: text")
await chain.ainvoke({"text": payload_text})
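A slightly more general version of the same idea is a small validator you can reuse across handlers. This is a sketch; the field name and length limit are assumptions, so adjust them to your actual request schema:

```python
# Minimal payload validator -- a sketch; the field name and limit are
# assumptions, adjust to your actual request schema.
MAX_TEXT_LEN = 20_000

def validate_payload(payload: dict) -> str:
    """Return the validated text field, or raise ValueError with a clear reason."""
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Missing or empty required field: text")
    if len(text) > MAX_TEXT_LEN:
        raise ValueError(f"Field 'text' exceeds {MAX_TEXT_LEN} characters")
    return text
```

In the handler, call this first and map ValueError to a 422 HTTPException, so malformed requests never reach the chain.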

2) Tool exceptions

This happens when you use Tool, StructuredTool, or an agent that calls external systems.

from langchain_core.tools import tool

@tool
def lookup_policy(policy_id: str) -> str:
    # Broken: raw exception bubbles up to the API layer
    if policy_id == "bad":
        raise ValueError("Policy not found")
    return "active"

Fix it by converting failures into controlled tool output:

@tool
def lookup_policy(policy_id: str) -> str:
    try:
        if policy_id == "bad":
            raise ValueError("Policy not found")
        return "active"
    except ValueError as e:
        return f"ERROR: {str(e)}"

3) Shared mutable memory

If you reuse conversation memory across requests, one user’s state can corrupt another’s.

# Broken: global state shared across all requests
memory_store = []

def add_message(msg):
    memory_store.append(msg)

Use per-session storage instead:

session_cache: dict[str, list] = {}

def get_session_memory(session_id: str):
    return session_cache.setdefault(session_id, [])

For LangChain agents, keep memory scoped to the request or session ID.
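Under concurrency you also want the session store itself to be safe to mutate from multiple threads. A minimal sketch (the names here are illustrative, not a LangChain API):

```python
import threading
from collections import defaultdict

# Per-session message stores, guarded by a lock so concurrent requests
# never interleave writes to the same list. Illustrative names only --
# this is not a LangChain API.
_lock = threading.Lock()
_session_cache: dict[str, list] = defaultdict(list)

def get_session_memory(session_id: str) -> list:
    with _lock:
        return _session_cache[session_id]

def add_message(session_id: str, msg: str) -> None:
    with _lock:
        _session_cache[session_id].append(msg)
```

Each session gets its own list, so one user's history can never leak into another's, even under load.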

4) Token limit / oversized prompts

Long tickets, logs, or document chunks can push you over model limits.

# Broken: unbounded text passed straight into the prompt
result = await chain.ainvoke({"text": large_blob})

Trim before invocation:

MAX_CHARS = 12000
safe_text = large_blob[:MAX_CHARS]
result = await chain.ainvoke({"text": safe_text})

If you’re using retrieval chains, reduce chunk size and top-k before production rollout.
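If you want the budget in tokens rather than characters, a rough heuristic is ~4 characters per token for English text; for exact counts use your provider's tokenizer (e.g. tiktoken). A sketch that trims to an approximate token budget at a word boundary:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; not exact

def trim_to_token_budget(text: str, max_tokens: int = 3000) -> str:
    """Trim text to an approximate token budget, cutting at a word boundary."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Avoid slicing mid-word: back up to the last whitespace if there is one
    last_space = cut.rfind(" ")
    return cut[:last_space] if last_space > 0 else cut
```

For production you would swap the heuristic for a real tokenizer count, but even this coarse guard stops the worst oversized-prompt failures.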

How to Debug It

  1. Log the exact exception class

    • Don’t stop at 500 Internal Server Error.
    • Log type(e).__name__, message text, and request ID.
    • Look for classes like RateLimitError, APITimeoutError, ValidationError, ToolException, or KeyError.
  2. Reproduce with one request

    • Take the exact failing payload from production logs.
    • Run it locally against the same chain.
    • If it only fails on long inputs or certain users, you’ve narrowed it down fast.
  3. Disable concurrency and isolate dependencies

    • Run a single worker.
    • Remove tools one by one.
    • Replace live APIs with mocks to see whether the failure is in LangChain orchestration or an upstream service.
  4. Turn on LangChain tracing

    • Use LangSmith or structured logging around each runnable step.
    • You want to know whether failure happens at prompt formatting, model invocation, parsing, or tool execution.

Example diagnostic wrapper:

import logging

logger = logging.getLogger(__name__)

async def run_chain(payload):
    try:
        return await chain.ainvoke(payload)
    except Exception as e:
        logger.exception(
            "chain_failed",
            extra={
                "error_type": type(e).__name__,
                "payload_keys": list(payload.keys()),
            },
        )
        raise

Prevention

  • Set explicit timeout, max_retries, and input validation on every LLM call.
  • Never share mutable memory objects across requests; scope state by session ID.
  • Add tests for long inputs, missing fields, provider timeouts, and tool failures before shipping.
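That last bullet is cheap to automate. One pattern is to test the boundary mapping in isolation with fake stand-ins for the provider exceptions, so the test runs without network access (the class and function names here are hypothetical):

```python
# Fake stand-ins for provider exceptions so the test runs without network.
class FakeRateLimitError(Exception): ...
class FakeTimeoutError(Exception): ...

def map_to_status(exc: Exception) -> int:
    """Boundary mapping: translate chain failures into HTTP status codes."""
    if isinstance(exc, FakeRateLimitError):
        return 429
    if isinstance(exc, FakeTimeoutError):
        return 504
    return 500

def test_boundary_mapping():
    assert map_to_status(FakeRateLimitError()) == 429
    assert map_to_status(FakeTimeoutError()) == 504
    assert map_to_status(KeyError("text")) == 500
```

In a real suite you would raise the actual openai exception classes from a mocked chain and assert on the HTTP response, but the shape of the test is the same.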

If you’re seeing intermittent 500s in production with LangChain Python, treat it like an integration problem first. The fix is usually boring engineering: validate inputs, catch exceptions at the boundary, add retries where they make sense, and keep state isolated per request.


By Cyprian Aarons, AI Consultant at Topiax.
