How to Fix 'chain execution stuck in production' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

If your LangChain chain looks “stuck” in production, it usually means the request never reaches a terminal step, or it’s waiting on an external call that never returns. In practice, this shows up as hanging API requests, workers pinned on a call that never completes, or traces that stop at an LLM/tool call with no final output.

The root cause is usually not LangChain itself. It’s almost always one of three things: a blocking tool call, an async/sync mismatch, or a chain that never gets fully invoked.

The Most Common Cause

The #1 cause I see is mixing sync and async execution incorrectly, especially inside FastAPI, Celery, or any server already running an event loop.

Typical symptoms:

  • The request hangs forever
  • No exception is raised
  • You see RunnableSequence or LLMChain start in logs, but no completion
  • In some cases you’ll get: RuntimeError: This event loop is already running

Here’s the broken pattern next to its fix, at a glance:

Broken                                        Fixed
--------------------------------------------  ----------------------------------------
Calling .run() / .predict() from async code   Using .ainvoke() / .acall() consistently
Blocking the event loop with sync I/O         Keeping the whole path async
Returning before awaiting the result          Awaiting the final runnable

# BROKEN: sync LangChain call inside async endpoint
from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

app = FastAPI()

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm

@app.post("/summarize")
async def summarize(payload: dict):
    # Hangs or causes event-loop issues in production
    result = chain.invoke({"text": payload["text"]})
    return {"summary": result.content}

# FIXED: async all the way through
from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

app = FastAPI()

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm

@app.post("/summarize")
async def summarize(payload: dict):
    result = await chain.ainvoke({"text": payload["text"]})
    return {"summary": result.content}

If you are using older APIs like LLMChain, the same rule applies:

# BROKEN
from langchain.chains import LLMChain

result = chain.run("hello")  # can block in async servers

# FIXED
result = await chain.arun("hello")  # mirrors .run(), returns a string
# or, if you want the full output dict:
result = await chain.acall({"input": "hello"})

The important part is consistency. If your request handler is async def, don’t sneak in sync LangChain calls unless you explicitly want to block the worker.
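
If you genuinely must call a sync-only chain from an async handler, offload it to a worker thread instead of blocking the event loop. A minimal sketch using the standard library’s asyncio.to_thread (Python 3.9+; the route name here is just for illustration):

import asyncio

@app.post("/summarize-blocking")
async def summarize_blocking(payload: dict):
    # Runs the blocking .invoke() in a thread; the event loop stays free
    result = await asyncio.to_thread(chain.invoke, {"text": payload["text"]})
    return {"summary": result.content}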

Other Possible Causes

1) A tool call never returns

This happens with AgentExecutor, custom tools, and HTTP-based tools. If your tool has no timeout, your agent will wait forever.

# BROKEN: no timeout on external request
import requests
from langchain_core.tools import tool

@tool
def fetch_customer(customer_id: str) -> str:
    """Fetch a customer record by ID."""
    return requests.get(f"https://api.internal/customers/{customer_id}").text

# FIXED: explicit timeout and error handling
import requests
from langchain_core.tools import tool

@tool
def fetch_customer(customer_id: str) -> str:
    """Fetch a customer record by ID."""
    resp = requests.get(
        f"https://api.internal/customers/{customer_id}",
        timeout=5,
    )
    resp.raise_for_status()
    return resp.text
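
If the surrounding path is async, write the tool as a coroutine so the event loop never blocks on the network. A sketch assuming httpx is installed (langchain_core’s @tool accepts async functions):

# ASYNC VARIANT: non-blocking HTTP with an explicit timeout
import httpx
from langchain_core.tools import tool

@tool
async def fetch_customer(customer_id: str) -> str:
    """Fetch a customer record by ID."""
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.get(f"https://api.internal/customers/{customer_id}")
        resp.raise_for_status()
        return resp.text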

2) Recursive agent loops

Agents can keep calling tools if your prompt lets them. In traces this often looks like repeated AgentExecutor steps with no final answer.

# BROKEN: weak stopping instructions can loop forever
agent_executor.invoke({"input": "Check customer status"})

Fix it by constraining iterations (AgentExecutor also takes max_execution_time if you want a wall-clock cap):

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=3,
    early_stopping_method="generate",
)

3) Streaming callback consumer is blocked

If you stream tokens but your callback handler writes to a slow sink, the chain can appear stuck while waiting on I/O.

# BROKEN: slow synchronous logging inside token callback
from langchain_core.callbacks import BaseCallbackHandler

class SlowHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        save_to_db(token)  # blocks the stream on every token

# FIXED: buffer tokens and flush once when the run ends
class BufferedHandler(BaseCallbackHandler):
    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs):
        self.tokens.append(token)

    def on_llm_end(self, response, **kwargs):
        save_to_db("".join(self.tokens))  # single write instead of one per token
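
For reference, handlers are attached per call through the config mapping, and token callbacks only fire when the model is actually streaming (reusing the prompt | llm chain from the first example):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
chain = prompt | llm

handler = BufferedHandler()
result = chain.invoke({"text": "..."}, config={"callbacks": [handler]})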

4) Missing stop condition in a custom Runnable loop

If you built your own retry or orchestration logic around RunnableLambda, it’s easy to accidentally create an infinite loop.

# BROKEN
while True:
    output = chain.invoke(state)

Add a hard exit:

for _ in range(3):
    output = chain.invoke(state)
    if output["done"]:
        break
else:
    raise TimeoutError("Chain did not complete after 3 attempts")
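
If the loop only exists to retry a flaky step, LCEL’s built-in bounded retry is harder to get wrong than a hand-rolled loop. A sketch using Runnable.with_retry (note it retries on exceptions, not on your own done flag, so it covers a different failure mode than the loop above):

# Re-runs the chain on exception, up to 3 attempts total, then re-raises
safe_chain = chain.with_retry(stop_after_attempt=3)
output = safe_chain.invoke(state)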

How to Debug It

  1. Check whether you’re using sync calls inside async code

    • Search for .invoke(), .run(), .predict()
    • If the caller is async def, switch to .ainvoke() / .acall()
  2. Turn on LangChain tracing

    • Set LANGCHAIN_TRACING_V2=true
    • Inspect where execution stops: prompt formatting, model call, tool call, or post-processing
  3. Add timeouts around every external dependency

    • HTTP clients
    • Database calls in tools
    • File I/O in callbacks
      If one step has no timeout, that’s your stall point.
  4. Log before and after each runnable

    • Before prompt formatting
    • Before LLM invocation
    • Before each tool call
      If you never see the “after” log line, you found the blocking step.

Example:

print("before invoke")
result = await chain.ainvoke({"text": text})
print("after invoke")

If "before invoke" prints and "after invoke" doesn’t, the hang is inside LangChain execution or one of its dependencies.

Prevention

  • Use one execution style per service:
    • async web handlers → ainvoke(), acall()
    • sync workers → invoke(), run()
  • Put timeouts on every tool and network call.
  • Set guardrails on agents:
    • max_iterations
    • retry limits
    • explicit stop conditions
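
As a last line of defense, put a hard deadline around the whole chain call so a stuck request fails loudly instead of hanging. A minimal sketch with asyncio.wait_for:

import asyncio

try:
    result = await asyncio.wait_for(
        chain.ainvoke({"text": text}),
        timeout=30,  # whole-request budget in seconds
    )
except asyncio.TimeoutError:
    # Surface the stall instead of letting the caller hang
    raise RuntimeError("Chain did not finish within its 30s budget")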

If you’re building this for production systems like banking or insurance workflows, treat every tool as an unreliable dependency until proven otherwise. Most “LangChain stuck” incidents are really orchestration bugs with missing timeouts and mixed execution models.



By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap? Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
