How to Fix 'async event loop error when scaling' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

What this error means

If you’re seeing RuntimeError: This event loop is already running or asyncio.run() cannot be called from a running event loop while scaling LlamaIndex code, you’ve hit an async lifecycle problem, not a LlamaIndex bug. It usually shows up when you mix sync and async calls, especially inside FastAPI, Jupyter, Streamlit, Celery workers, or any app that already owns the event loop.

In LlamaIndex, this often happens when code that is already running inside an active loop tries to drive async methods like index.as_query_engine().aquery() or RetrieverQueryEngine.aquery() through asyncio.run() or run_until_complete() instead of awaiting them.

The Most Common Cause

The #1 cause is calling asyncio.run() inside a context that already has an event loop running. This is common when developers wrap LlamaIndex async APIs in helper functions and then call those helpers from FastAPI endpoints, notebooks, or other async services.

Here’s the broken pattern and the fixed pattern:

Broken:

```python
import asyncio
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)

def search(question: str):
    # BAD: blocks on asyncio.run() even if the caller already has a loop
    return asyncio.run(index.as_query_engine().aquery(question))

# Called from FastAPI / notebook / async worker
result = search("What is the policy?")
```

Fixed:

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)

async def search(question: str):
    # GOOD: await directly inside async code
    query_engine = index.as_query_engine()
    return await query_engine.aquery(question)

# Called from an async context
result = await search("What is the policy?")
```


The key rule is simple:

- Use `await` inside async functions.
- Use sync methods only in sync code.
- Don’t wrap LlamaIndex async calls with `asyncio.run()` unless you are at the top level of a standalone script.

A common real-world traceback looks like this:

```text
RuntimeError: asyncio.run() cannot be called from a running event loop
```

Or:

```text
RuntimeError: This event loop is already running
```

If you see either one, inspect where the call chain crosses from sync into async.
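
For reference, here is a minimal, self-contained reproduction of the nested-loop mistake (no LlamaIndex involved) that you can run to see the exact error:

```python
import asyncio

async def handler():
    # asyncio.run() tries to start a NEW loop, but we are already inside one
    return asyncio.run(asyncio.sleep(0))

# The outer asyncio.run() owns the loop; the inner call then fails with
# "RuntimeError: asyncio.run() cannot be called from a running event loop"
asyncio.run(handler())
```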

Other Possible Causes

1) Calling .aquery() or .achat() from sync code without awaiting

LlamaIndex exposes async methods on classes like QueryEngine, RetrieverQueryEngine, and chat engines. If you call them like normal functions, you’ll get coroutine objects or runtime failures downstream.

```python
# Broken
response = index.as_query_engine().aquery("Summarize the claim")

# Fixed
response = await index.as_query_engine().aquery("Summarize the claim")
```

If your function is not async, switch to the sync API:

```python
# Fixed alternative
response = index.as_query_engine().query("Summarize the claim")
```
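
A quick way to confirm this specific mistake is to check whether the object you stored is a coroutine rather than a response. A minimal sketch, assuming `index` is the `VectorStoreIndex` built earlier:

```python
import asyncio

response = index.as_query_engine().aquery("Summarize the claim")  # missing await

# If this prints True, you stored a coroutine, not a Response object
print(asyncio.iscoroutine(response))
```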

2) Mixing sync and async handlers in FastAPI

FastAPI route handlers can be def or async def. If your handler is sync but calls async LlamaIndex methods through asyncio.run(), you’ll eventually hit loop errors under load.

```python
# Broken
from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.get("/search")
def search():
    return asyncio.run(query_engine.aquery("What happened?"))
```

Use an async route instead:

```python
# Fixed
@app.get("/search")
async def search():
    return await query_engine.aquery("What happened?")
```
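
If the handler genuinely needs to stay synchronous, the other consistent option is to keep it a plain `def` and use the sync API; FastAPI runs sync routes in a thread pool, so there is no running loop to fight with. A sketch, assuming `query_engine` is built elsewhere at startup:

```python
# Fixed alternative: sync route, sync API
from fastapi import FastAPI

app = FastAPI()

@app.get("/search")
def search():
    # .query() is the synchronous counterpart of .aquery()
    return {"answer": str(query_engine.query("What happened?"))}
```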

3) Notebook environment already owns the loop

Jupyter and IPython run an event loop for you. That means top-level asyncio.run() will fail even if your code works as a script.

```python
# Broken in Jupyter
import asyncio
result = asyncio.run(query_engine.aquery("Explain the document"))
```

Use top-level await instead:

```python
# Fixed in Jupyter
result = await query_engine.aquery("Explain the document")
```
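
If you cannot use top-level `await` (for example, the call is buried inside helper code that insists on `asyncio.run()`), `nest_asyncio` is a common notebook-only workaround that patches the running loop to allow nesting. Treat it as a band-aid for interactive work, not a scaling strategy:

```python
# Workaround for notebooks only
import asyncio
import nest_asyncio

nest_asyncio.apply()  # allow nested event loops in this process

result = asyncio.run(query_engine.aquery("Explain the document"))
```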

4) Background workers spawning nested loops

Some worker frameworks and schedulers create their own execution model. If you start another event loop inside them, you get nested-loop errors.

```python
# Broken pattern in worker code
def process_job():
    return asyncio.run(run_llamaindex_job())
```

Fix it by making the job entrypoint async if the framework supports it, or by isolating all LlamaIndex async work behind one event-loop owner:

```python
# Fixed pattern
async def process_job():
    return await run_llamaindex_job()
```
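
For the second option, the key property is that `asyncio.run()` appears exactly once, at the worker's own sync entrypoint, with plain `await` everywhere below it, and that the framework invokes that entrypoint from a thread with no running loop. A sketch; `fetch_answer` is a hypothetical helper and `run_llamaindex_job` is the job coroutine from above:

```python
import asyncio

async def fetch_answer(query_engine, question: str):
    # Plain await all the way down; no asyncio.run() below the entrypoint
    return await query_engine.aquery(question)

async def run_llamaindex_job():
    # ... build or load the index / query engine, then:
    # return await fetch_answer(query_engine, "What happened?")
    ...

def process_job():
    # The ONLY place this worker starts a loop: the top-level entrypoint.
    # Safe only when the framework calls this with no loop already running.
    return asyncio.run(run_llamaindex_job())
```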

5) Using deprecated or mismatched integrations

Older integration code can hide loop management issues. For example, mixing old retriever wrappers with newer Settings, StorageContext, or OpenAI client configurations can create confusing failures that surface as event-loop problems.

Check for patterns like:

- old custom wrappers around `ServiceContext`
- third-party libraries that internally call `asyncio.get_event_loop().run_until_complete(...)`
- incompatible versions of llama-index, openai, or your web framework
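
A quick way to rule out version mismatches is to print the installed versions before digging into loop internals. A minimal sketch using the standard library; adjust the package names to whatever you actually install:

```python
from importlib.metadata import version, PackageNotFoundError

for pkg in ("llama-index", "openai", "fastapi"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```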

How to Debug It

1. Find the first place where async crosses into sync.
   - Search for `asyncio.run`, `run_until_complete`, `.aquery(`, `.achat(`, and `.astream(`.
   - The bug is usually one layer above where the traceback ends.
2. Check whether your caller already has an event loop.
   - In FastAPI routes, notebooks, Streamlit callbacks, and some worker frameworks, assume there is already a running loop (see the sketch after this list).
   - If yes, remove `asyncio.run()` and use `await`.
3. Print the exact object type you are calling.
   - Make sure you’re calling methods on classes like `QueryEngine`, `RetrieverQueryEngine`, and `ChatEngine`.
   - A lot of mistakes come from storing a coroutine instead of a response object.
4. Temporarily switch to sync APIs.
   - Replace `.aquery()` with `.query()` and `.achat()` with `.chat()`.
   - If the error disappears, your issue is almost certainly event-loop ownership rather than indexing logic.
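
For step 2, the standard-library check below tells you whether the current code path already sits inside a running loop; drop it in temporarily right before the line that fails:

```python
import asyncio

def loop_is_running() -> bool:
    """Return True if this code is executing inside a running event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

# If this prints True, asyncio.run() here will fail; use `await` instead.
print(loop_is_running())
```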

Prevention

- Keep one rule per boundary:
  - async handler → use LlamaIndex async APIs with `await`
  - sync script → use LlamaIndex sync APIs only
- Avoid hiding event-loop management inside utility functions. If a function calls LlamaIndex async methods, make it explicitly `async def`.
- Add tests for both execution modes: one that runs in plain Python, and one that runs under your web framework’s request lifecycle.

If you want scaling-safe behavior with LlamaIndex, treat asyncio as infrastructure. Own it once at the edge of your app, then pass control through with plain awaits.
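
Concretely, for a standalone service that means one `asyncio.run()` at the very edge and plain `await` everywhere else. A sketch, assuming a recent llama-index where these imports live in `llama_index.core` (as in the examples above); `answer` and the `"data"` directory are illustrative:

```python
import asyncio
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

async def answer(index: VectorStoreIndex, question: str) -> str:
    # Plain await; this function never touches the loop directly
    response = await index.as_query_engine().aquery(question)
    return str(response)

async def main() -> None:
    docs = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(docs)
    print(await answer(index, "What is the policy?"))

if __name__ == "__main__":
    # The single place this program owns the event loop
    asyncio.run(main())
```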


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

