How to Fix 'chain execution stuck during development' in LlamaIndex (Python)
When you see "chain execution stuck during development" in a LlamaIndex Python app, it usually means your query pipeline never finishes because one step is waiting on something that never resolves. In practice, this shows up during local testing when you wire together an index, retriever, and query engine with a bad async pattern, a blocked callback, or a tool/LLM call that never returns.
The message is usually not the root cause. It’s the symptom you get when RetrieverQueryEngine, SubQuestionQueryEngine, or a custom QueryPipeline hangs before it can produce a response.
The Most Common Cause
The #1 cause I see is mixing sync and async execution incorrectly.
A common pattern is calling an async LlamaIndex method from synchronous code without awaiting it, or wrapping already-running event loop code in asyncio.run(). That can leave the chain half-started and looking “stuck.”
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Calling async methods without `await` | Awaiting the coroutine properly |
| Using `asyncio.run()` inside an environment that already has an event loop | Using `await` directly in async code |
| Returning coroutine objects into LlamaIndex components | Returning actual results |
```python
# BROKEN
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# aquery() is a coroutine function; calling it without await
# returns a coroutine object, not a response
response = query_engine.aquery("What is in these documents?")
print(response)  # <coroutine object ...>
```
```python
# FIXED
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

async def main():
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is in these documents?")
    print(response)

asyncio.run(main())
```
If you are inside Jupyter, FastAPI, or another running event loop, do not call asyncio.run(). Use await directly:
```python
response = await query_engine.aquery("What is in these documents?")
```
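To see the difference without any LlamaIndex dependencies, here is a minimal pure-asyncio sketch (the `aquery` stand-in and `request_handler` are hypothetical names): `asyncio.run()` owns the loop at the top level of a script, while code that is already running on a loop simply awaits.

```python
import asyncio

async def aquery(q: str) -> str:
    # stand-in for query_engine.aquery()
    await asyncio.sleep(0)
    return f"answer to {q!r}"

async def request_handler() -> str:
    # Inside a running loop (as in FastAPI or Jupyter): just await,
    # never asyncio.run()
    return await aquery("test")

# Top-level script entry point: asyncio.run() creates and owns the loop
print(asyncio.run(request_handler()))  # answer to 'test'
```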
Other Possible Causes
1. A blocking custom LLM wrapper
If your custom LLM class blocks on network I/O without timeouts, the chain appears stuck.
```python
# BAD: no timeout, no fail-fast behavior
import requests

class MyLLM:
    def complete(self, prompt: str):
        # hangs indefinitely if the server never responds
        return requests.post(
            "https://api.example.com/generate",
            json={"prompt": prompt},
        ).json()["text"]
```
Fix it with explicit timeouts:
```python
# GOOD: explicit timeout, fail fast on HTTP errors
import requests

class MyLLM:
    def complete(self, prompt: str):
        r = requests.post(
            "https://api.example.com/generate",
            json={"prompt": prompt},
            timeout=30,  # seconds; raises requests.Timeout instead of hanging
        )
        r.raise_for_status()
        return r.json()["text"]
```
2. Callback handlers that deadlock
A bad callback handler can block the chain if it does expensive work synchronously inside on_event_start / on_event_end.
```python
from llama_index.core.callbacks import CallbackManager, BaseCallbackHandler

class SlowHandler(BaseCallbackHandler):
    def on_event_start(self, *args, **kwargs):
        heavy_cpu_work()  # blocks query execution
```
Move heavy work off-thread or queue it:
```python
class FastHandler(BaseCallbackHandler):
    def on_event_start(self, *args, **kwargs):
        # enqueue and return immediately; a background worker does the heavy work
        log_queue.put_nowait({"event": "start"})
```
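One minimal, stdlib-only way to wire up that queue-and-worker pattern (the worker itself is a hypothetical sketch; `processed` stands in for whatever sink your logs go to):

```python
import queue
import threading

log_queue = queue.Queue()
processed = []

def log_worker():
    # Heavy work happens here, off the query path
    while True:
        event = log_queue.get()
        if event is None:  # sentinel tells the worker to exit
            break
        processed.append(event)

worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

# The handler only enqueues; put_nowait never blocks the query
log_queue.put_nowait({"event": "start"})
log_queue.put(None)
worker.join()
print(processed)  # [{'event': 'start'}]
```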
3. Recursive tool calls in agents
Agents like ReActAgent or tool-using query engines can recurse forever if a tool keeps calling back into the same agent path.
```python
# BAD: tool calls back into the same agent/query engine path
agent = ReActAgent.from_tools([recursive_tool], llm=llm)
```
Guard recursion depth or separate tool execution from agent orchestration:
```python
agent = ReActAgent.from_tools(
    [safe_tool],
    llm=llm,
    max_iterations=10,  # cap the reasoning loop so it cannot run forever
)
```
Also check for tools returning malformed outputs that force retries.
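If you need an explicit guard, a small stdlib decorator can cap recursion depth before a tool re-enters the agent path (the decorator, `risky_tool`, and the limit are illustrative, not LlamaIndex APIs):

```python
import functools

def depth_limited(max_depth=3):
    """Refuse to run a tool deeper than max_depth nested calls."""
    def decorator(fn):
        state = {"depth": 0}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if state["depth"] >= max_depth:
                return "Error: tool recursion limit reached"
            state["depth"] += 1
            try:
                return fn(*args, **kwargs)
            finally:
                state["depth"] -= 1
        return wrapper
    return decorator

@depth_limited(max_depth=2)
def risky_tool(n: int):
    # simulates a tool that keeps calling back into itself
    return risky_tool(n + 1) if n < 10 else n

print(risky_tool(0))  # Error: tool recursion limit reached
```

Returning an error string (instead of raising) gives the agent a chance to recover and answer with what it has.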
4. Empty or invalid retriever results
If your retriever returns unexpected objects instead of NodeWithScore, downstream components may keep retrying or fail silently depending on your setup.
```python
# BAD: returning raw strings instead of nodes
return ["doc1", "doc2"]
```
Return proper nodes:
```python
from llama_index.core.schema import NodeWithScore

return [
    NodeWithScore(node=node_1, score=0.92),
    NodeWithScore(node=node_2, score=0.81),
]
```
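A quick duck-typed pre-flight check (a hypothetical helper; it only looks for the `node` and `score` attributes that `NodeWithScore` carries) makes a misbehaving retriever fail loudly instead of silently:

```python
def check_retriever_output(results):
    # Fail fast if the retriever hands back bare strings or dicts
    for r in results:
        if not (hasattr(r, "node") and hasattr(r, "score")):
            raise TypeError(
                f"Expected NodeWithScore-like objects, got {type(r).__name__}"
            )
    return results
```

Call it on the output of `retriever.retrieve(...)` during development and drop it once the pipeline is stable.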
How to Debug It
1. Isolate the failing layer

Call each component directly. Test retrieval first:

```python
nodes = retriever.retrieve("test query")
print(nodes)
```

Then test generation:

```python
resp = await query_engine.aquery("test query")
print(resp)
```

2. Turn on verbose logging

LlamaIndex will often show where execution stops:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Also inspect callback traces if you use CallbackManager.

3. Check for event loop misuse

If you see `RuntimeError: asyncio.run() cannot be called from a running event loop`, or a `<coroutine object ...>` printed instead of a response, you are likely mixing sync and async APIs.

4. Add hard timeouts around external calls

Wrap model calls, vector DB requests, and HTTP tools. If the chain suddenly starts failing fast instead of hanging, you found the blocker.
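When a client library does not expose a timeout parameter, you can impose one from the outside. Here is a stdlib-only sketch using a worker thread (`call_with_timeout` is a hypothetical helper; note the blocked call keeps running in its background thread, so this un-sticks the caller rather than cancelling the work):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def call_with_timeout(fn, timeout, *args, **kwargs):
    # Run fn in a worker thread and give the caller a hard deadline
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"{fn.__name__} exceeded {timeout}s") from None
    finally:
        pool.shutdown(wait=False)
```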
Prevention
- Keep async all the way through: use `aquery()`, `achat()`, and proper `await` semantics consistently.
- Put timeouts on every external dependency: LLM API calls, vector store calls, and HTTP tools.
- Keep callback handlers lightweight; push heavy work to queues or background workers.
- Validate retriever outputs and tool outputs before passing them into agents or query engines.
If you still see the chain hanging after fixing async usage, look at your external service latency first. In most production LlamaIndex apps, “stuck during development” means one dependency has no timeout and no exit path.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.