How to Fix 'chain execution stuck during development' in LlamaIndex (Python)
When you see "chain execution stuck during development" in a LlamaIndex Python app, it usually means your query pipeline never finishes because one step is waiting on something that never resolves. In practice, this shows up during local testing when you wire together an index, retriever, and query engine with a bad async pattern, a blocked callback, or a tool/LLM call that never returns.
The message is usually not the root cause. It’s the symptom you get when RetrieverQueryEngine, SubQuestionQueryEngine, or a custom QueryPipeline hangs before it can produce a response.
The Most Common Cause
The #1 cause I see is mixing sync and async execution incorrectly.
A common pattern is calling an async LlamaIndex method from synchronous code without awaiting it, or wrapping already-running event loop code in asyncio.run(). That can leave the chain half-started and looking “stuck.”
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Calling async methods without `await` | Awaiting the coroutine properly |
| Using `asyncio.run()` inside an environment that already has an event loop | Using `await` directly in async code |
| Returning coroutine objects into LlamaIndex components | Returning actual results |
```python
# BROKEN
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# aquery() is a coroutine function; calling it without await
# returns a coroutine object, not a response
response = query_engine.aquery("What is in these documents?")
print(response)  # <coroutine object ...>
```
```python
# FIXED
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

async def main():
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is in these documents?")
    print(response)

asyncio.run(main())
```
If you are inside Jupyter, FastAPI, or another running event loop, do not call asyncio.run(). Use await directly:
```python
response = await query_engine.aquery("What is in these documents?")
```
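To see the difference without any LlamaIndex dependencies, here is a minimal pure-asyncio sketch (the `aquery` stand-in and `request_handler` are hypothetical names): `asyncio.run()` owns the loop at the top level of a script, while code that is already running on a loop simply awaits.

```python
import asyncio

async def aquery(q: str) -> str:
    # stand-in for query_engine.aquery()
    await asyncio.sleep(0)
    return f"answer to {q!r}"

async def request_handler() -> str:
    # Inside a running loop (as in FastAPI or Jupyter): just await,
    # never asyncio.run()
    return await aquery("test")

# Top-level script entry point: asyncio.run() creates and owns the loop
print(asyncio.run(request_handler()))  # answer to 'test'
```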
Other Possible Causes
1. A blocking custom LLM wrapper
If your custom LLM class blocks on network I/O without timeouts, the chain appears stuck.
```python
# BAD: no timeout, no fail-fast behavior
import requests

class MyLLM:
    def complete(self, prompt: str):
        # hangs indefinitely if the server never responds
        return requests.post(
            "https://api.example.com/generate",
            json={"prompt": prompt},
        ).json()["text"]
```
Fix it with explicit timeouts:
```python
# GOOD: explicit timeout, fail fast on HTTP errors
import requests

class MyLLM:
    def complete(self, prompt: str):
        r = requests.post(
            "https://api.example.com/generate",
            json={"prompt": prompt},
            timeout=30,  # seconds; raises requests.Timeout instead of hanging
        )
        r.raise_for_status()
        return r.json()["text"]
```
2. Callback handlers that deadlock
A bad callback handler can block the chain if it does expensive work synchronously inside on_event_start / on_event_end.
```python
from llama_index.core.callbacks import CallbackManager, BaseCallbackHandler

class SlowHandler(BaseCallbackHandler):
    def on_event_start(self, *args, **kwargs):
        heavy_cpu_work()  # blocks query execution
```
Move heavy work off-thread or queue it:
```python
class FastHandler(BaseCallbackHandler):
    def on_event_start(self, *args, **kwargs):
        # enqueue and return immediately; a background worker does the heavy work
        log_queue.put_nowait({"event": "start"})
```
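One minimal, stdlib-only way to wire up that queue-and-worker pattern (the worker itself is a hypothetical sketch; `processed` stands in for whatever sink your logs go to):

```python
import queue
import threading

log_queue = queue.Queue()
processed = []

def log_worker():
    # Heavy work happens here, off the query path
    while True:
        event = log_queue.get()
        if event is None:  # sentinel tells the worker to exit
            break
        processed.append(event)

worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

# The handler only enqueues; put_nowait never blocks the query
log_queue.put_nowait({"event": "start"})
log_queue.put(None)
worker.join()
print(processed)  # [{'event': 'start'}]
```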
3. Recursive tool calls in agents
Agents like ReActAgent or tool-using query engines can recurse forever if a tool keeps calling back into the same agent path.
```python
# BAD: tool calls back into the same agent/query engine path
agent = ReActAgent.from_tools([recursive_tool], llm=llm)
```
Guard recursion depth or separate tool execution from agent orchestration:
```python
agent = ReActAgent.from_tools(
    [safe_tool],
    llm=llm,
    max_iterations=10,  # cap the reasoning loop so it cannot run forever
)
```
Also check for tools returning malformed outputs that force retries.
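If you need an explicit guard, a small stdlib decorator can cap recursion depth before a tool re-enters the agent path (the decorator, `risky_tool`, and the limit are illustrative, not LlamaIndex APIs):

```python
import functools

def depth_limited(max_depth=3):
    """Refuse to run a tool deeper than max_depth nested calls."""
    def decorator(fn):
        state = {"depth": 0}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if state["depth"] >= max_depth:
                return "Error: tool recursion limit reached"
            state["depth"] += 1
            try:
                return fn(*args, **kwargs)
            finally:
                state["depth"] -= 1
        return wrapper
    return decorator

@depth_limited(max_depth=2)
def risky_tool(n: int):
    # simulates a tool that keeps calling back into itself
    return risky_tool(n + 1) if n < 10 else n

print(risky_tool(0))  # Error: tool recursion limit reached
```

Returning an error string (instead of raising) gives the agent a chance to recover and answer with what it has.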
4. Empty or invalid retriever results
If your retriever returns unexpected objects instead of NodeWithScore, downstream components may keep retrying or fail silently depending on your setup.
```python
# BAD: returning raw strings instead of nodes
return ["doc1", "doc2"]
```
Return proper nodes:
```python
from llama_index.core.schema import NodeWithScore

return [
    NodeWithScore(node=node_1, score=0.92),
    NodeWithScore(node=node_2, score=0.81),
]
```
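A quick duck-typed pre-flight check (a hypothetical helper; it only looks for the `node` and `score` attributes that `NodeWithScore` carries) makes a misbehaving retriever fail loudly instead of silently:

```python
def check_retriever_output(results):
    # Fail fast if the retriever hands back bare strings or dicts
    for r in results:
        if not (hasattr(r, "node") and hasattr(r, "score")):
            raise TypeError(
                f"Expected NodeWithScore-like objects, got {type(r).__name__}"
            )
    return results
```

Call it on the output of `retriever.retrieve(...)` during development and drop it once the pipeline is stable.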
How to Debug It
1. Isolate the failing layer

Call each component directly. Test retrieval first:

```python
nodes = retriever.retrieve("test query")
print(nodes)
```

Then test generation:

```python
resp = await query_engine.aquery("test query")
print(resp)
```

2. Turn on verbose logging

LlamaIndex will often show where execution stops:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Also inspect callback traces if you use CallbackManager.

3. Check for event loop misuse

If you see `RuntimeError: asyncio.run() cannot be called from a running event loop`, or a `<coroutine object ...>` printed instead of a response, you are likely mixing sync and async APIs.

4. Add hard timeouts around external calls

Wrap model calls, vector DB requests, and HTTP tools. If the chain suddenly starts failing fast instead of hanging, you found the blocker.
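When a client library does not expose a timeout parameter, you can impose one from the outside. Here is a stdlib-only sketch using a worker thread (`call_with_timeout` is a hypothetical helper; note the blocked call keeps running in its background thread, so this un-sticks the caller rather than cancelling the work):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def call_with_timeout(fn, timeout, *args, **kwargs):
    # Run fn in a worker thread and give the caller a hard deadline
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"{fn.__name__} exceeded {timeout}s") from None
    finally:
        pool.shutdown(wait=False)
```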
Prevention
- Keep async all the way through: use `aquery()`, `achat()`, and proper `await` semantics consistently.
- Put timeouts on every external dependency: LLM API calls, vector store calls, and HTTP tools.
- Keep callback handlers lightweight; push heavy work to queues or background workers.
- Validate retriever outputs and tool outputs before passing them into agents or query engines.
If you still see the chain hanging after fixing async usage, look at your external service latency first. In most production LlamaIndex apps, “stuck during development” means one dependency has no timeout and no exit path.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.