How to Fix 'async event loop error when scaling' in LlamaIndex (Python)
What this error means
If you’re seeing RuntimeError: This event loop is already running or asyncio.run() cannot be called from a running event loop while scaling LlamaIndex code, you’ve hit an async lifecycle problem, not a LlamaIndex bug. It usually shows up when you mix sync and async calls, especially inside FastAPI, Jupyter, Streamlit, Celery workers, or any app that already owns the event loop.
In LlamaIndex, this often happens when calling async methods like index.as_query_engine().aquery() or RetrieverQueryEngine.aquery() from code that is already inside an active loop.
The Most Common Cause
The #1 cause is calling asyncio.run() inside a context that already has an event loop running. This is common when developers wrap LlamaIndex async APIs in helper functions and then call those helpers from FastAPI endpoints, notebooks, or other async services.
Here’s the broken pattern and the fixed pattern side by side:
Broken:

```python
import asyncio
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)

def search(question: str):
    # BAD: blocks on asyncio.run() even if the caller already has a loop
    return asyncio.run(index.as_query_engine().aquery(question))

# Called from FastAPI / notebook / async worker
result = search("What is the policy?")
```

Fixed:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)

async def search(question: str):
    # GOOD: await directly inside async code
    query_engine = index.as_query_engine()
    return await query_engine.aquery(question)

# Called from an async context
result = await search("What is the policy?")
```
The key rule is simple:
- Use `await` inside async functions.
- Use sync methods only in sync code.
- Don’t wrap LlamaIndex async calls with `asyncio.run()` unless you are at the top level of a standalone script.
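If you are unsure which side of that boundary a given call site is on, you can probe for a running loop. This is a small stdlib-only helper, not a LlamaIndex API:

```python
import asyncio

def in_running_loop() -> bool:
    # True when called from code that is already inside an event loop
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

print(in_running_loop())  # False in a plain script

async def main():
    print(in_running_loop())  # True inside asyncio.run()

asyncio.run(main())
```

If this returns True, `asyncio.run()` will fail there; use `await` instead.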
A common real-world traceback looks like this:
```text
RuntimeError: asyncio.run() cannot be called from a running event loop
```

Or:

```text
RuntimeError: This event loop is already running
```
If you see either one, inspect where the call chain crosses from sync into async.
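You can reproduce the first message in a few lines with plain asyncio, no LlamaIndex required, which makes the mechanism easy to see in isolation:

```python
import asyncio

async def main():
    coro = asyncio.sleep(0)
    try:
        # nested asyncio.run() inside an already running loop
        asyncio.run(coro)
    except RuntimeError as exc:
        print(exc)  # asyncio.run() cannot be called from a running event loop
    finally:
        coro.close()  # avoid a "never awaited" warning

asyncio.run(main())
```

The outer `asyncio.run()` owns the loop; the inner one detects it and raises.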
Other Possible Causes
1) Calling .aquery() or .achat() from sync code without awaiting
LlamaIndex exposes async methods on classes like QueryEngine, RetrieverQueryEngine, and chat engines. If you call them like normal functions, you’ll get coroutine objects or runtime failures downstream.
```python
# Broken
response = index.as_query_engine().aquery("Summarize the claim")

# Fixed
response = await index.as_query_engine().aquery("Summarize the claim")
```
If your function is not async, switch to the sync API:
```python
# Fixed alternative
response = index.as_query_engine().query("Summarize the claim")
```
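The failure mode is easy to spot: calling an async method without `await` hands you a coroutine object, not a response. A sketch with a hypothetical stand-in for `.aquery()`:

```python
import asyncio

async def aquery(question: str) -> str:
    # hypothetical stand-in for a LlamaIndex async query method
    return f"answer to {question!r}"

result = aquery("Summarize the claim")  # no await: this is a coroutine
print(type(result).__name__)            # coroutine
result.close()                          # avoid a "never awaited" warning

answer = asyncio.run(aquery("Summarize the claim"))
print(answer)
```

If you ever see `<coroutine object ...>` in a log or template, this is the bug.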
2) Mixing sync and async handlers in FastAPI
FastAPI route handlers can be def or async def. If your handler is sync but calls async LlamaIndex methods through asyncio.run(), you’ll eventually hit loop errors under load.
```python
# Broken
from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.get("/search")
def search():
    return asyncio.run(query_engine.aquery("What happened?"))
```
Use an async route instead:
```python
# Fixed
@app.get("/search")
async def search():
    return await query_engine.aquery("What happened?")
```
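If you genuinely need to call a blocking sync API from an async route, offload it to a worker thread with `asyncio.to_thread` so the event loop stays free. A stdlib-only sketch, where `blocking_query` stands in for a synchronous call like `query_engine.query()`:

```python
import asyncio
import time

def blocking_query(question: str) -> str:
    # stand-in for a synchronous LlamaIndex call such as query_engine.query()
    time.sleep(0.01)
    return f"result for {question!r}"

async def handler() -> str:
    # the blocking call runs in a worker thread; the loop keeps serving requests
    return await asyncio.to_thread(blocking_query, "What happened?")

print(asyncio.run(handler()))
```

This keeps one loop owner (the framework) and never nests `asyncio.run()`.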
3) Notebook environment already owns the loop
Jupyter and IPython run an event loop for you. That means top-level asyncio.run() will fail even if your code works as a script.
```python
# Broken in Jupyter
import asyncio

result = asyncio.run(query_engine.aquery("Explain the document"))
```
Use top-level await instead:
```python
# Fixed in Jupyter
result = await query_engine.aquery("Explain the document")
```
4) Background workers spawning nested loops
Some worker frameworks and schedulers create their own execution model. If you start another event loop inside them, you get nested-loop errors.
```python
# Broken pattern in worker code
def process_job():
    return asyncio.run(run_llamaindex_job())
```
Fix it by making the job entrypoint async if the framework supports it, or by isolating all LlamaIndex async work behind one event-loop owner:
```python
# Fixed pattern
async def process_job():
    return await run_llamaindex_job()
```
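When the framework only offers a sync entrypoint, one "single loop owner" pattern is a long-lived loop on a background thread; sync worker code submits coroutines to it instead of spawning nested loops. A sketch, with `run_llamaindex_job` as a hypothetical placeholder for real async work:

```python
import asyncio
import threading

# one long-lived loop owned by a single daemon thread
_loop = asyncio.new_event_loop()
_thread = threading.Thread(target=_loop.run_forever, daemon=True)
_thread.start()

async def run_llamaindex_job() -> str:
    # hypothetical stand-in for real async LlamaIndex work
    await asyncio.sleep(0.01)
    return "done"

def process_job() -> str:
    # safe from sync worker code: submits to the owned loop, no nested run()
    future = asyncio.run_coroutine_threadsafe(run_llamaindex_job(), _loop)
    return future.result(timeout=5)

print(process_job())
```

`run_coroutine_threadsafe` returns a `concurrent.futures.Future`, so the sync caller can block on `.result()` without touching the loop itself.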
5) Using deprecated or mismatched integrations
Older integration code can hide loop management issues. For example, mixing old retriever wrappers with newer Settings, StorageContext, or OpenAI client configurations can create confusing failures that surface as event-loop problems.
Check for patterns like:
- old custom wrappers around `ServiceContext`
- third-party libraries that internally call `asyncio.get_event_loop().run_until_complete(...)`
- incompatible versions of `llama-index`, `openai`, or your web framework
How to Debug It
1. Find the first place where async crosses into sync.
   - Search for `asyncio.run`, `run_until_complete`, `.aquery(`, `.achat(`, and `.astream(`.
   - The bug is usually one layer above where the traceback ends.
2. Check whether your caller already has an event loop.
   - In FastAPI routes, notebooks, Streamlit callbacks, and some worker frameworks, assume there is already a running loop.
   - If so, remove `asyncio.run()` and use `await`.
3. Print the exact type of the object you are calling.
   - Make sure you are calling methods on classes like `QueryEngine`, `RetrieverQueryEngine`, or `ChatEngine`.
   - Many mistakes come from storing a coroutine instead of a response object.
4. Temporarily switch to sync APIs.
   - Replace `.aquery()` with `.query()` and `.achat()` with `.chat()`.
   - If the error disappears, your issue is almost certainly event-loop ownership rather than indexing logic.
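The search step can be scripted. A hypothetical grep over a source tree (adjust the directory to your layout):

```shell
# find every place the code crosses the sync/async boundary
grep -rnE "asyncio\.run\(|run_until_complete|\.aquery\(|\.achat\(|\.astream\(" app/
```

Each hit is a candidate loop-ownership bug; check whether its caller is already inside an event loop.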
Prevention
- Keep one rule per boundary:
  - async handler → use LlamaIndex async APIs with `await`
  - sync script → use LlamaIndex sync APIs only
- Avoid hiding event-loop management inside utility functions. If a function calls LlamaIndex async methods, make it explicitly `async def`.
- Add tests for both execution modes: one that runs in plain Python, and one that runs under your web framework's request lifecycle.
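A minimal sketch of the two-mode test idea, using a hypothetical async `search` wrapper and plain asserts (in practice you would use pytest with pytest-asyncio or your framework's test client):

```python
import asyncio

async def search(question: str) -> str:
    # hypothetical async entry point wrapping LlamaIndex calls
    await asyncio.sleep(0)
    return f"answer: {question}"

def test_plain_python():
    # mode 1: a standalone script owns the loop at the top level
    assert asyncio.run(search("policy")) == "answer: policy"

def test_inside_running_loop():
    # mode 2: simulate a framework that already owns a running loop
    async def harness():
        return await search("policy")
    assert asyncio.run(harness()) == "answer: policy"

test_plain_python()
test_inside_running_loop()
```

If only the second mode fails, the bug is loop ownership, not retrieval logic.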
If you want scaling-safe behavior with LlamaIndex, treat asyncio as infrastructure. Own it once at the edge of your app, then pass control through with plain awaits.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit