How to Fix 'async event loop error in production' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: async-event-loop-error-in-production, llamaindex, python

If you’re seeing RuntimeError: This event loop is already running or RuntimeError: asyncio.run() cannot be called from a running event loop in a LlamaIndex app, you’re hitting an async boundary problem, not a LlamaIndex bug. It usually shows up in FastAPI, Jupyter, Streamlit, Celery workers, or any service where you mix sync and async code incorrectly.

In production, this tends to happen when a synchronous wrapper calls asyncio.run() inside an environment that already owns the loop. LlamaIndex uses async heavily in classes like VectorStoreIndex, QueryEngine, and Retriever, so the fix is usually to stop forcing async work through sync entrypoints.

The Most Common Cause

The #1 cause is calling asyncio.run() inside code that is already executing in an event loop.

Typical examples:

  • FastAPI request handlers
  • Jupyter notebooks
  • Streamlit callbacks
  • Any framework that already manages the loop
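You can check at runtime which side of the boundary you are on: asyncio.get_running_loop() raises RuntimeError when no loop is running. A minimal sketch:

```python
import asyncio

def loop_is_running() -> bool:
    """True when called from code already executing inside an event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

async def inside() -> bool:
    return loop_is_running()

print(loop_is_running())      # False in a plain script
print(asyncio.run(inside()))  # True once a loop owns the call
```

If this prints True in your environment (notebook, FastAPI handler, Streamlit callback), calling asyncio.run() there will raise.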

Broken vs fixed pattern

| Broken pattern | Fixed pattern |
| --- | --- |
| Sync function wraps the async call with asyncio.run() | Make the caller async and await directly |
| Works locally in a script, fails in production under an existing loop | Works in both scripts and loop-owning environments |
# BROKEN
import asyncio
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

def build_and_query():
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(docs)

    # Raises:
    # RuntimeError: asyncio.run() cannot be called from a running event loop
    response = asyncio.run(index.as_query_engine().aquery("What is in these docs?"))
    return response
# FIXED
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

async def build_and_query():
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(docs)

    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is in these docs?")
    return response

If you’re inside FastAPI, keep the whole path async:

from fastapi import FastAPI
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

app = FastAPI()

@app.get("/search")
async def search():
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(docs)
    response = await index.as_query_engine().aquery("Summarize the documents")
    return {"response": str(response)}  # return plain JSON rather than the raw Response object

The rule is simple: if you call .aquery(), .aretrieve(), .achat(), or any async LlamaIndex method, your caller must also be async.
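End to end, the rule looks like this. fetch_answer here is a hypothetical stand-in for an awaited LlamaIndex call such as aquery:

```python
import asyncio

async def fetch_answer(question: str) -> str:
    # Stand-in for: return await query_engine.aquery(question)
    await asyncio.sleep(0)
    return f"answer to {question!r}"

async def handler() -> str:
    # Async caller awaits the async callee; no new loop is started here
    return await fetch_answer("status?")

if __name__ == "__main__":
    # asyncio.run() appears exactly once, at the top of a plain script,
    # where no loop exists yet
    print(asyncio.run(handler()))
```

The whole chain is async; the single asyncio.run() lives at the entrypoint, never inside request code.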

Other Possible Causes

1) Mixing sync and async LlamaIndex APIs

A common mistake is calling an async method from sync code through a helper that hides the problem.

# BROKEN
def ask_index(index):
    return index.as_query_engine().aquery("status?")  # returns coroutine, not result
# FIXED
async def ask_index(index):
    return await index.as_query_engine().aquery("status?")

If your function returns a coroutine object instead of text or a response object, you’re still on the wrong side of the boundary.
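You can catch this at runtime: asyncio.iscoroutine() tells you whether you are holding an unawaited coroutine instead of a result. A sketch using a hypothetical aquery stand-in:

```python
import asyncio

async def aquery(q: str) -> str:
    # Hypothetical stand-in for QueryEngine.aquery
    return f"result for {q}"

def ask_broken() -> object:
    return aquery("status?")  # returns a coroutine object, not a string

obj = ask_broken()
print(asyncio.iscoroutine(obj))  # True: still on the wrong side of the boundary
obj.close()  # silence the "coroutine was never awaited" warning
```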

2) Running notebook-style code inside production workers

Jupyter and IPython often already run an event loop. Production systems like Celery can do similar things depending on worker setup.

# BROKEN in notebook / some workers
import asyncio
response = asyncio.run(query_engine.aquery("hello"))
# FIXED
response = await query_engine.aquery("hello")

If you must support both scripts and notebooks, split your API:

def main_sync():
    import asyncio
    return asyncio.run(main_async())

async def main_async():
    return await query_engine.aquery("hello")
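A self-contained version of that split, with a stand-in await in place of the real aquery call:

```python
import asyncio

async def main_async() -> str:
    # Stand-in for: return await query_engine.aquery("hello")
    await asyncio.sleep(0)
    return "hello-result"

def main_sync() -> str:
    # Safe only at a true entrypoint (script, CLI) where no loop runs yet
    return asyncio.run(main_async())

print(main_sync())  # hello-result
```

Scripts call main_sync(); notebooks and async frameworks call await main_async() directly.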

3) Calling blocking code inside async handlers

Sometimes the error message is just the symptom. The real issue is blocking the loop with sync I/O while also trying to schedule async work.

# BAD: blocking file load + async query mixed carelessly
@app.get("/search")
async def search():
    docs = SimpleDirectoryReader("./data").load_data()  # blocking I/O
    index = VectorStoreIndex.from_documents(docs)
    return await index.as_query_engine().aquery("find invoices")

This won't always throw immediately, but the blocking load stalls the event loop for every other request, and sync helpers that spin up their own loop internally are where nested-loop errors come from.

Use thread offloading for heavy sync work:

import anyio

@app.get("/search")
async def search():
    docs = await anyio.to_thread.run_sync(lambda: SimpleDirectoryReader("./data").load_data())
    index = VectorStoreIndex.from_documents(docs)
    return await index.as_query_engine().aquery("find invoices")
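If you'd rather not depend on anyio, the standard library's asyncio.to_thread() (Python 3.9+) does the same offloading. slow_load below is a hypothetical stand-in for the blocking load_data() call:

```python
import asyncio
import time

def slow_load() -> list[str]:
    # Stand-in for SimpleDirectoryReader("./data").load_data()
    time.sleep(0.01)
    return ["doc-1", "doc-2"]

async def handler() -> int:
    docs = await asyncio.to_thread(slow_load)  # runs in a worker thread
    return len(docs)

print(asyncio.run(handler()))  # 2
```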

4) Reusing a closed or foreign event loop

This shows up when code stores an event loop globally and reuses it across requests or threads.

# BROKEN
loop = asyncio.get_event_loop()

def run_query(coro):
    return loop.run_until_complete(coro)

In production servers, loops are per-thread/per-process lifecycle objects. Don’t cache them globally.

# FIXED
async def run_query(coro):
    return await coro
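A small sketch of why the cached-loop pattern fails: once the owner (a framework, or asyncio.run() itself) closes its loop, run_until_complete on the cached reference raises:

```python
import asyncio

loop = asyncio.new_event_loop()
loop.run_until_complete(asyncio.sleep(0))
loop.close()  # frameworks close their loop when the request/worker ends

async def work() -> int:
    return 42

coro = work()
try:
    loop.run_until_complete(coro)  # cached, now-closed loop
except RuntimeError as exc:
    coro.close()  # avoid a "never awaited" warning
    print(f"RuntimeError: {exc}")  # Event loop is closed
```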

How to Debug It

  1. Find the first place asyncio.run() appears

    • Search your codebase for:
      • asyncio.run(
      • run_until_complete(
      • .aquery(
      • .achat(
    • If asyncio.run() wraps a LlamaIndex coroutine inside request code, that’s usually the bug.
  2. Check whether your caller is already async

    • In FastAPI, Starlette, Jupyter, and many worker frameworks, you should not start a new loop.
    • If the function signature is def, but it calls .aquery(), that’s suspicious.
    • Convert it to async def and use await.
  3. Print the actual exception chain

    • Real errors often look like:
      • RuntimeError: This event loop is already running
      • RuntimeError: asyncio.run() cannot be called from a running event loop
      • RuntimeWarning: coroutine 'BaseQueryEngine.aquery' was never awaited
    • That last one means you returned a coroutine instead of awaiting it.
  4. Isolate LlamaIndex from your framework

    • Run the same logic in a plain Python script.
    • If it works there but fails in FastAPI/Streamlit/Jupyter, the framework owns the loop.
    • If it fails everywhere, your call chain is wrong.
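Step 4's isolation check can be a ten-line script: the same async call chain with the framework removed. ask below is a hypothetical stand-in for your real query path; if this script works but the FastAPI/Streamlit version fails, the framework owns the loop:

```python
import asyncio

async def ask(question: str) -> str:
    # Replace with: return str(await index.as_query_engine().aquery(question))
    await asyncio.sleep(0)
    return f"echo: {question}"

if __name__ == "__main__":
    # Plain script: no pre-existing loop, so asyncio.run() is legal here
    print(asyncio.run(ask("What is in these docs?")))
```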

Prevention

  • Keep one rule across the codebase: sync functions call sync APIs; async functions call async APIs.
  • For LlamaIndex query paths, prefer end-to-end async in web services:
    • await QueryEngine.aquery(...)
    • await Retriever.aretrieve(...)
  • Never hide event-loop management inside utility functions. If a helper needs to run coroutines, make that helper async too.
  • Add a small integration test that exercises your real deployment style:
    • FastAPI endpoint test for web apps
    • Notebook-style test if data scientists will run it interactively
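A minimal integration test in the spirit of that last bullet, using only the standard library (a real web app would use FastAPI's TestClient instead). search_handler is a hypothetical stand-in for your endpoint body:

```python
import asyncio

async def search_handler() -> dict:
    # Stand-in for the real endpoint, ending in an awaited aquery call
    await asyncio.sleep(0)
    return {"response": "ok"}

def test_handler_runs_under_a_loop() -> None:
    # Mirrors how the server runs the handler: inside an event loop
    result = asyncio.run(search_handler())
    assert result == {"response": "ok"}

test_handler_runs_under_a_loop()
```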


By Cyprian Aarons, AI Consultant at Topiax.
