How to Fix 'chain execution stuck' in LlamaIndex (Python)
What “chain execution stuck” usually means
If you’re seeing “chain execution stuck” in LlamaIndex, the chain started but never completed. In practice, that usually means one of three things: a tool call never returned, an async call was never awaited correctly, or your agent loop is waiting on a condition that can’t be satisfied.
This shows up most often with QueryEngine, ReActAgent, custom tools, and notebook code where sync and async APIs get mixed.
The Most Common Cause
The #1 cause is a tool or callback that blocks forever, or returns a value in the wrong format so the agent keeps waiting for a valid response.
A common pattern is passing a function that looks fine locally but doesn’t match what FunctionTool or QueryEngineTool expects.
| Broken pattern | Fixed pattern |
|---|---|
| Tool never returns a plain string / object the agent can use | Tool returns a deterministic value quickly |
| Async function called like sync code | Use await or asyncio.run() correctly |
| Infinite loop inside tool logic | Add timeout / exit condition |
Broken code
```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool


def lookup_customer(customer_id: str):
    # BAD: blocking call with no timeout, and the loop has no exit
    # other than a truthy result
    while True:
        result = fetch_from_legacy_system(customer_id)  # your own backend client
        if result:
            return result


tool = FunctionTool.from_defaults(fn=lookup_customer)
agent = ReActAgent.from_tools([tool], verbose=True)

response = agent.chat("Get customer 12345")
print(response)
```
What happens here:
- The tool can hang forever if `fetch_from_legacy_system()` never returns. The `while True` loop also spins forever if the call keeps returning an empty result.
- The agent waits on the tool output, so you see behavior like:
  - chain execution stuck
  - repeated tool calls
  - no final response from `ReActAgent`
Fixed code
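```python
import time

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool


def lookup_customer(customer_id: str) -> str:
    """Look up a customer record by ID in the legacy system."""
    # The deadline is only checked between attempts, so give each
    # fetch_from_legacy_system() call its own timeout as well.
    deadline = time.monotonic() + 5  # hard cap on total retry time
    while time.monotonic() < deadline:
        result = fetch_from_legacy_system(customer_id)
        if result:
            return str(result)  # plain string the agent can read
        time.sleep(0.5)  # back off instead of hammering the backend
    return "Customer lookup timed out after 5 seconds"


tool = FunctionTool.from_defaults(fn=lookup_customer)
agent = ReActAgent.from_tools([tool], verbose=True)

response = agent.chat("Get customer 12345")
print(response)
```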
The important part is that the tool always returns something finite and serializable. In production, wrap external calls with explicit timeouts and retries.
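A retry deadline does not help if a single backend call blocks forever. One generic way to enforce a hard timeout around any blocking function is to run it in a worker thread and stop waiting after a fixed time. The sketch below assumes this approach. `fetch_from_legacy_system` is a hypothetical stub that simulates a hang for customer `999`, and `call_with_timeout` is an illustrative helper, not a LlamaIndex API.

```python
import concurrent.futures
import time

# One shared pool for outbound calls. A timed-out call keeps running in its
# worker thread (Python cannot kill threads); we only stop waiting for it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def fetch_from_legacy_system(customer_id: str) -> dict:
    """Hypothetical stand-in for a slow backend; customer '999' simulates a hang."""
    if customer_id == "999":
        time.sleep(2)
    return {"id": customer_id, "name": "Ada"}


def call_with_timeout(fn, *args, timeout: float):
    """Run fn(*args) in the pool; raise concurrent.futures.TimeoutError after `timeout` seconds."""
    future = _POOL.submit(fn, *args)
    return future.result(timeout=timeout)


def lookup_customer(customer_id: str) -> str:
    """Tool function: returns a string within about one second, whatever the backend does."""
    try:
        result = call_with_timeout(fetch_from_legacy_system, customer_id, timeout=1.0)
    except concurrent.futures.TimeoutError:
        return f"Customer lookup for {customer_id} timed out after 1 second"
    except Exception as exc:  # backend errors become text the agent can reason about
        return f"Customer lookup for {customer_id} failed: {exc}"
    return str(result)


if __name__ == "__main__":
    print(lookup_customer("12345"))
    print(lookup_customer("999"))
```

You would register this `lookup_customer` with `FunctionTool.from_defaults(fn=lookup_customer)`, as in the fixed code. The thread-based timeout works for any blocking client, but it leaks a busy thread per hang. When your client library has its own timeout option (for example, `requests.get(url, timeout=5)`), prefer that.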
Other Possible Causes
1) Mixing sync and async APIs
This is common when using achat, aquery, or async agents inside a normal script.
```python
# Broken
response = index.as_query_engine().aquery("What is this document about?")
print(response)  # coroutine never awaited
```

```python
# Fixed
import asyncio


async def main():
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is this document about?")
    print(response)


asyncio.run(main())
```
If you see symptoms like:
- a `RuntimeWarning` saying a coroutine was never awaited
- hanging behavior after calling `aquery`

this is usually the cause.
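Awaiting correctly removes the warning, but an awaited call can still hang. The standard library's `asyncio.wait_for` puts a ceiling on it. In this minimal sketch, `FakeQueryEngine` is a hypothetical stand-in so the code runs without an index. With LlamaIndex you would pass your real `query_engine` instead.

```python
import asyncio


class FakeQueryEngine:
    """Hypothetical stand-in for a query engine; aquery() sleeps to simulate latency."""

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(self.delay_s)
        return f"Answer to: {question}"


async def query_with_timeout(query_engine, question: str, timeout_s: float = 30.0) -> str:
    """Await aquery() but give up (and cancel it) after timeout_s seconds."""
    try:
        response = await asyncio.wait_for(query_engine.aquery(question), timeout=timeout_s)
    except asyncio.TimeoutError:
        return f"Query timed out after {timeout_s} seconds"
    return str(response)


async def main() -> None:
    print(await query_with_timeout(FakeQueryEngine(0.01), "What is this document about?"))
    print(await query_with_timeout(FakeQueryEngine(5), "What is this document about?", timeout_s=0.1))


if __name__ == "__main__":
    asyncio.run(main())
```

In a Jupyter notebook an event loop is already running, so `asyncio.run()` raises a `RuntimeError`. Use a top-level `await main()` in the cell instead.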
2) Recursive agent/tool loops
If your agent keeps calling the same tool without reaching an answer, it can look stuck.
```python
# Broken: tool description encourages the model to keep querying itself
tool = FunctionTool.from_defaults(
    fn=lambda q: query_engine.query(q),
    name="search",
    description="Use this for everything",
)
```
Fix it by narrowing the tool’s purpose so the model knows when not to call it, and by returning plain text:

```python
tool = FunctionTool.from_defaults(
    fn=lambda q: str(query_engine.query(q)),  # plain string, not a Response object
    name="policy_search",
    description="Use only for policy document retrieval",
)
```
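Also set sane iteration limits where your agent setup supports them. The legacy `ReActAgent.from_tools` takes a `max_iterations` argument; check the signature in your installed version.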
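If you drive tools from your own loop, or want an extra check on top of the agent's limit, you can cap both total steps and identical repeated calls. This is a minimal sketch under that assumption. `LoopGuard` is a hypothetical helper, not part of LlamaIndex, and the thresholds are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class LoopGuard:
    """Hypothetical guard for a hand-rolled agent loop; not part of LlamaIndex."""

    max_iterations: int = 10
    max_repeats: int = 2
    steps: int = field(default=0, init=False)
    seen: dict = field(default_factory=dict, init=False)

    def check(self, tool_name: str, tool_input: str) -> None:
        """Call before each tool invocation; raises RuntimeError when the loop looks stuck."""
        self.steps += 1
        if self.steps > self.max_iterations:
            raise RuntimeError(f"Stopped after {self.max_iterations} iterations without an answer")
        key = (tool_name, tool_input)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            raise RuntimeError(
                f"{tool_name!r} called {self.seen[key]} times with the same input {tool_input!r}"
            )


if __name__ == "__main__":
    guard = LoopGuard(max_iterations=5)
    for name, arg in [("policy_search", "refunds")] * 3:
        try:
            guard.check(name, arg)
        except RuntimeError as err:
            print(err)  # prints: 'policy_search' called 3 times with the same input 'refunds'
            break
```

Catching a repeated identical call usually surfaces a looping agent earlier than a plain step limit does.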
3) Callback handler deadlock or bad instrumentation
Custom callbacks can block execution if they do network I/O or raise inside event hooks.
```python
from llama_index.core.callbacks import CallbackManager

# Broken: custom handler does slow logging synchronously inside every event
# (MySlowCallbackHandler stands in for your own handler class)
callback_manager = CallbackManager([MySlowCallbackHandler()])
```
If your handler sends logs to an external API, move that work off-thread or buffer it. A callback should be fast and non-blocking.
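One way to move that work off-thread is a bounded queue drained by a background thread. The event hook then only enqueues. The sketch below assumes you call `emit()` from your handler's `on_event_start` / `on_event_end` methods (the hooks on LlamaIndex's `BaseCallbackHandler`). `BufferedEventSink` and the `send` function are hypothetical names.

```python
import queue
import threading
from typing import Callable


class BufferedEventSink:
    """Hands events to a background thread so the caller never waits on network I/O."""

    def __init__(self, send: Callable[[dict], None], maxsize: int = 1000) -> None:
        self._send = send
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=maxsize)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, event: dict) -> None:
        """Non-blocking: drops the event if the buffer is full rather than stalling the agent."""
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            pass

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # a failed log shipment must never crash or block the agent
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has been sent (call at shutdown)."""
        self._queue.join()
```

The design choice here is to drop events under back-pressure. For audit logs you may prefer a larger buffer or writing to local disk, but the event hook itself should still return immediately.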
4) Empty retrieval causing endless retries
When retrieval returns nothing, some agent setups keep trying different paths until they hit internal limits.
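```python
from llama_index.core.query_engine import RetrieverQueryEngine

# Example config issue: a low top-k (or overly strict metadata filters)
# can leave the synthesizer with zero useful nodes
retriever = index.as_retriever(similarity_top_k=2)
query_engine = RetrieverQueryEngine.from_args(retriever)
```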
If your corpus is small or filters are too aggressive, increase recall:
```python
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = index.as_retriever(similarity_top_k=10)
query_engine = RetrieverQueryEngine.from_args(retriever)
```
Also inspect whether metadata filters are excluding every node.
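Another guard is to check retrieval yourself and return a definite "nothing found" message. The agent then gets a final observation instead of an empty answer it might keep working around. This is a minimal sketch with injected callables so it runs without an index. In LlamaIndex you might pass `retriever.retrieve` for `retrieve` and wrap a response synthesizer's `synthesize(query, nodes)` in `str()` for `synthesize`. That wiring is an assumption; check it against your version.

```python
from typing import Any, Callable, Sequence


def guarded_answer(
    query: str,
    retrieve: Callable[[str], Sequence[Any]],
    synthesize: Callable[[str, Sequence[Any]], str],
) -> str:
    """Answer from retrieved nodes, or stop early with a clear message when nothing matches."""
    nodes = retrieve(query)
    if not nodes:
        # A definite "nothing found" string tells the agent this path is exhausted.
        return f"No indexed documents matched {query!r}; check metadata filters or rephrase."
    return synthesize(query, nodes)


if __name__ == "__main__":
    print(guarded_answer("refund policy", lambda q: [], lambda q, n: "unused"))
    print(guarded_answer("refund policy", lambda q: ["node-1"], lambda q, n: f"{len(n)} node(s) used"))
```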
How to Debug It
- Turn on verbose tracing:
  ```python
  agent = ReActAgent.from_tools([tool], verbose=True)
  ```
  Watch whether it’s stuck on:
  - a tool call
  - retrieval
  - response synthesis
- Test each tool directly:
  ```python
  print(lookup_customer("12345"))
  ```
  If the function hangs outside LlamaIndex, the problem is not the agent.
- Check for async misuse:
  - Search for `.aquery()`, `.achat()`, and `.astream_chat()`.
  - Make sure every coroutine is awaited.
  - Don’t call async methods from sync code without an event loop.
- Reduce to one node, one tool. Strip your setup down to:
  - one document
  - one retriever
  - one function tool

  If it works there, add components back until it breaks. That isolates the failing layer fast.
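When verbose output simply stops and you can’t tell where, Python’s standard-library `faulthandler` can print every thread’s stack once a call has run too long. This is a minimal sketch, assuming you wrap the call you suspect. `dump_stacks_if_stuck` is a hypothetical helper name, not a LlamaIndex API.

```python
import faulthandler
import sys
import time
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_stuck(seconds: float, file=sys.stderr):
    """Dump all thread stacks to `file` every `seconds` while the block is still running."""
    faulthandler.dump_traceback_later(seconds, repeat=True, file=file)
    try:
        yield
    finally:
        faulthandler.cancel_dump_traceback_later()


def slow_lookup() -> None:
    """Stand-in for a call that hangs (for example, agent.chat(...))."""
    time.sleep(1.0)


if __name__ == "__main__":
    with dump_stacks_if_stuck(0.3):
        slow_lookup()
```

In real use you would wrap something like `agent.chat("Get customer 12345")` with a generous limit such as 30 seconds. The dumped stack shows whether the thread is inside your tool, the retriever, or the LLM client. `faulthandler` writes to a real file descriptor, so in a notebook, where `sys.stderr` may not have one, pass an open file instead.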
Prevention
- Put hard timeouts on every external dependency: databases, HTTP APIs, queues, legacy systems.
- Keep tools deterministic: return strings or structured JSON, not open-ended objects.
- Add unit tests for each `FunctionTool`, especially around empty responses and timeout paths.
- Use verbose mode during development so you can see whether the failure is in retrieval, planning, or execution.
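Unit tests for the timeout and empty-response paths are easier when the backend is injected. This sketch assumes a hypothetical `make_lookup_customer` factory that mirrors the fixed tool’s retry logic. No LlamaIndex import is needed because the test exercises the plain function you would pass to `FunctionTool.from_defaults`.

```python
import time
import unittest
from typing import Any, Callable, Optional


def make_lookup_customer(
    fetch: Callable[[str], Optional[Any]], timeout_s: float = 5.0, poll_s: float = 0.05
) -> Callable[[str], str]:
    """Hypothetical factory: same retry logic as the fixed tool, with the backend injected."""

    def lookup_customer(customer_id: str) -> str:
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            result = fetch(customer_id)
            if result:
                return str(result)
            time.sleep(poll_s)
        return f"Customer lookup timed out after {timeout_s} seconds"

    return lookup_customer


class LookupCustomerToolTest(unittest.TestCase):
    def test_success_returns_plain_string(self) -> None:
        lookup = make_lookup_customer(lambda cid: {"id": cid})
        self.assertEqual(lookup("12345"), "{'id': '12345'}")

    def test_empty_backend_gives_up_on_time(self) -> None:
        lookup = make_lookup_customer(lambda cid: None, timeout_s=0.2)
        start = time.monotonic()
        message = lookup("12345")
        self.assertLess(time.monotonic() - start, 1.0)
        self.assertIn("timed out", message)
```

Run it with `python -m unittest` from the directory containing the test file.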
By Cyprian Aarons, AI Consultant at Topiax.