How to Fix 'chain execution stuck' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

What “chain execution stuck” usually means

If you’re seeing chain execution stuck in LlamaIndex, the chain started but never completed. In practice, that usually means one of three things: a tool call never returned, an async call was never awaited correctly, or your agent loop is waiting on a condition that can’t be satisfied.

This shows up most often with QueryEngine, ReActAgent, custom tools, and notebook code where sync and async APIs get mixed.

The Most Common Cause

The #1 cause is a tool or callback that blocks forever, or returns a value in the wrong format so the agent keeps waiting for a valid response.

A common pattern is passing a function that looks fine locally but doesn’t match what FunctionTool or QueryEngineTool expects.

Broken pattern → Fixed pattern

  • Tool never returns a plain string or object the agent can use → make the tool return a deterministic value quickly
  • Async function called like sync code → use await or asyncio.run() correctly
  • Infinite loop inside tool logic → add a timeout or exit condition

Broken code

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent

def lookup_customer(customer_id: str):
    # BAD: blocking call with no timeout
    while True:
        result = fetch_from_legacy_system(customer_id)
        if result:
            return result

tool = FunctionTool.from_defaults(fn=lookup_customer)

agent = ReActAgent.from_tools([tool], verbose=True)
response = agent.chat("Get customer 12345")
print(response)

What happens here:

  • The tool can hang forever if fetch_from_legacy_system() never resolves.
  • The agent waits on the tool output, so you see behavior like:
    • chain execution stuck
    • repeated tool calls
    • no final response from ReActAgent

Fixed code

import time
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent

def lookup_customer(customer_id: str):
    deadline = time.time() + 5  # hard timeout

    while time.time() < deadline:
        result = fetch_from_legacy_system(customer_id)
        if result:
            return str(result)
        time.sleep(0.5)  # back off instead of hammering the backend

    return "Customer lookup timed out after 5 seconds"

tool = FunctionTool.from_defaults(fn=lookup_customer)

agent = ReActAgent.from_tools([tool], verbose=True)
response = agent.chat("Get customer 12345")
print(response)

The important part is that the tool always returns something finite and serializable. In production, wrap external calls with explicit timeouts and retries.
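One portable way to add that timeout is to run the blocking call in a worker thread and give up after a deadline. This is a standard-library sketch, not LlamaIndex-specific; slow_lookup is a hypothetical stand-in for your real backend call:

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, *args, timeout=5.0, fallback="lookup timed out"):
    """Run a blocking callable in a worker thread; return `fallback` after `timeout` seconds."""
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return fallback

def slow_lookup(customer_id):
    time.sleep(2)  # stands in for a hung legacy backend
    return {"id": customer_id}

print(call_with_timeout(slow_lookup, "12345", timeout=0.2))  # -> lookup timed out
```

Note that the worker thread keeps running after the timeout; the wrapper only guarantees the agent gets a finite answer, so the backend call itself should still have its own timeout where possible.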

Other Possible Causes

1) Mixing sync and async APIs

This is common when using achat, aquery, or async agents inside a normal script.

# Broken
response = index.as_query_engine().aquery("What is this document about?")
print(response)  # coroutine never awaited
# Fixed
import asyncio

async def main():
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is this document about?")
    print(response)

asyncio.run(main())

If you see warnings like:

  • RuntimeWarning: coroutine was never awaited
  • hanging behavior after calling aquery

this is usually the cause.
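You can reproduce the failure mode without LlamaIndex at all. Here aquery is a stand-in coroutine playing the role of an async API like query_engine.aquery():

```python
import asyncio
import inspect

async def aquery(text):
    # Stand-in for an async LlamaIndex API such as query_engine.aquery()
    return f"answer to: {text}"

# Broken: calling it like sync code returns a coroutine object; nothing runs
result = aquery("What is this document about?")
print(inspect.iscoroutine(result))  # True
result.close()  # avoid the "never awaited" RuntimeWarning in this demo

# Fixed: drive the coroutine to completion from sync code
print(asyncio.run(aquery("What is this document about?")))
```

One caveat: inside Jupyter, asyncio.run() raises RuntimeError because a loop is already running; there, use top-level `await query_engine.aquery(...)` directly instead.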

2) Recursive agent/tool loops

If your agent keeps calling the same tool without reaching an answer, it can look stuck.

# Broken: tool description encourages the model to keep querying itself
tool = FunctionTool.from_defaults(
    fn=lambda q: query_engine.query(q),
    name="search",
    description="Use this for everything"
)

Fix it by narrowing the tool’s purpose and adding guardrails:

tool = FunctionTool.from_defaults(
    fn=lambda q: str(query_engine.query(q)),  # return a plain string, not a Response object
    name="policy_search",
    description="Use only for policy document retrieval"
)

Also set sane iteration limits where supported by your agent setup.
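Many ReActAgent versions accept a max_iterations argument for exactly this; check your version's signature. Even without it, you can enforce a hard call budget inside the tool itself. A minimal sketch, where run_query is a hypothetical stand-in for query_engine.query:

```python
def make_guarded_search(run_query, budget=5):
    """Wrap a search callable so repeated agent calls hit a hard budget."""
    calls = {"n": 0}

    def guarded_search(query: str) -> str:
        calls["n"] += 1
        if calls["n"] > budget:
            # Returning a terminal message lets the agent stop gracefully
            return "Search budget exhausted; answer with the information you already have."
        return str(run_query(query))

    return guarded_search

search = make_guarded_search(lambda q: f"results for: {q}", budget=2)
print(search("refund policy"))  # results for: refund policy
print(search("refund policy"))  # results for: refund policy
print(search("refund policy"))  # Search budget exhausted; ...
```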

3) Callback handler deadlock or bad instrumentation

Custom callbacks can block execution if they do network I/O or raise inside event hooks.

from llama_index.core.callbacks import CallbackManager

# Broken: custom handler does slow logging synchronously inside every event
callback_manager = CallbackManager([MySlowCallbackHandler()])

If your handler sends logs to an external API, move that work off-thread or buffer it. A callback should be fast and non-blocking.
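A minimal pattern for that, assuming nothing about the LlamaIndex callback interface beyond "hooks must return fast": enqueue the event immediately and let a daemon thread do the slow shipping.

```python
import queue
import threading

class BufferedLogHandler:
    """Sketch: event hooks enqueue instantly; a daemon thread ships logs off the hot path."""

    def __init__(self):
        self._q = queue.Queue()
        self._shipped = []  # stands in for "sent to external logging API"
        threading.Thread(target=self._drain, daemon=True).start()

    def on_event(self, payload):
        # Called from the agent's event hooks: must never block
        self._q.put(payload)

    def _drain(self):
        while True:
            payload = self._q.get()
            # Slow network I/O would go here, safely off the agent's thread
            self._shipped.append(payload)
            self._q.task_done()

handler = BufferedLogHandler()
handler.on_event({"event": "tool_call", "name": "policy_search"})
handler._q.join()  # demo only: wait until the worker has drained the queue
print(handler._shipped)
```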

4) Empty retrieval causing endless retries

When retrieval returns nothing, some agent setups keep trying different paths until they hit internal limits.

# Example config issue: too strict filters => zero nodes returned
retriever = index.as_retriever(similarity_top_k=2)
query_engine = index.as_query_engine(retriever=retriever)

If your corpus is small or filters are too aggressive, increase recall:

retriever = index.as_retriever(similarity_top_k=10)
query_engine = index.as_query_engine(retriever=retriever)

Also inspect whether metadata filters are excluding every node.
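A quick way to catch this early is to probe the retriever directly with a query you know should match. FakeRetriever below is a stand-in for demonstration; in real code you would pass the object from index.as_retriever():

```python
def assert_nonempty_retrieval(retriever, probe_query):
    """Diagnostic: fail fast if the retriever returns zero nodes for a known-good query."""
    nodes = retriever.retrieve(probe_query)
    if not nodes:
        raise RuntimeError(
            f"Retriever returned 0 nodes for {probe_query!r}; "
            "loosen similarity_top_k or metadata filters."
        )
    return nodes

# Stand-in retriever to illustrate the check
class FakeRetriever:
    def __init__(self, nodes):
        self._nodes = nodes

    def retrieve(self, query):
        return list(self._nodes)

print(len(assert_nonempty_retrieval(FakeRetriever(["node-1", "node-2"]), "refund policy")))
```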

How to Debug It

  1. Turn on verbose tracing

    agent = ReActAgent.from_tools([tool], verbose=True)
    

    Watch whether it’s stuck on:

    • a tool call
    • retrieval
    • response synthesis
  2. Test each tool directly

    print(lookup_customer("12345"))
    

    If the function hangs outside LlamaIndex, the problem is not the agent.

  3. Check for async misuse

    • Search for .aquery(), .achat(), .astream()
    • Make sure every coroutine is awaited
    • Don’t call async methods from sync code without an event loop
  4. Reduce to one node, one tool. Strip your setup down to:

    • one document
    • one retriever
    • one function tool

    If it works there, add components back until it breaks. That isolates the failing layer fast.

Prevention

  • Put hard timeouts on every external dependency: databases, HTTP APIs, queues, legacy systems.
  • Keep tools deterministic: return strings or structured JSON, not open-ended objects.
  • Add unit tests for each FunctionTool, especially around empty responses and timeout paths.
  • Use verbose mode during development so you can see whether the failure is in retrieval, planning, or execution.
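As a concrete example of the last two points, the timeout tool from earlier becomes testable once the backend call is injected as a parameter. A sketch, where fetch is whatever function wraps your legacy system:

```python
import time

def make_lookup_tool(fetch, deadline_s=5.0):
    """Factory so the backend call can be swapped out in unit tests."""
    def lookup_customer(customer_id: str) -> str:
        deadline = time.time() + deadline_s
        while time.time() < deadline:
            result = fetch(customer_id)
            if result:
                return str(result)
            time.sleep(0.01)
        return f"Customer lookup timed out after {deadline_s} seconds"
    return lookup_customer

# Happy path: backend answers immediately
ok = make_lookup_tool(lambda cid: {"id": cid})
assert ok("12345") == "{'id': '12345'}"

# Timeout path: backend never answers
dead = make_lookup_tool(lambda cid: None, deadline_s=0.05)
assert "timed out" in dead("12345")
print("tool unit tests passed")
```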

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
