How to Fix 'callback not firing when scaling' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

When "callback not firing when scaling" shows up in a LlamaIndex Python app, it usually means your callback handler is attached in one execution path but not in the one actually doing the work. In practice, this happens when you move from a single local query to async, multi-worker, or distributed execution and assume the same callback chain is still intact.

The symptom is usually one of these:

  • CallbackManager events never reach your handler
  • on_event_start / on_event_end never fire
  • traces work locally, then disappear once you add concurrency or scale out

The Most Common Cause

The #1 cause is creating the CallbackManager or handler in one place, but running the index/query in another process, thread, or instance that never received it.

This shows up a lot with:

  • asyncio.gather(...)
  • FastAPI background tasks
  • Celery / RQ workers
  • multiple Uvicorn/Gunicorn workers
  • object serialization across process boundaries

Broken pattern vs fixed pattern

| Broken | Fixed |
| --- | --- |
| callback handler created in the main process only | callback handler passed into every worker/task |
| global state assumed to survive scaling | explicit dependency injection |
| index built with one manager, queried with another | same manager instance used consistently |

# BROKEN: callback exists only in the parent process
from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
Settings.callback_manager = CallbackManager([debug_handler])

def build_index(docs):
    # In a worker/process boundary, this may not inherit Settings as expected
    return VectorStoreIndex.from_documents(docs)

def run_query(index):
    # You expect callbacks here, but they may never hit debug_handler
    return index.as_query_engine().query("What is in the docs?")

# FIXED: pass callback manager explicitly through the code path
from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])

def build_index(docs):
    return VectorStoreIndex.from_documents(
        docs,
        callback_manager=callback_manager,
    )

def run_query(index):
    query_engine = index.as_query_engine(callback_manager=callback_manager)
    return query_engine.query("What is in the docs?")

If you are using workers, do not rely on module-level globals. Instantiate the handler inside the worker startup path or pass config into each task.
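A minimal, framework-agnostic sketch of that wiring. The `RecordingHandler` and `Manager` classes here are stand-ins (not LlamaIndex imports) so the shape of the fix is visible on its own: each execution path receives the manager it should emit events to, instead of reading a module-level global.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingHandler:
    """Stand-in for a callback handler that records event names."""
    events: list = field(default_factory=list)

    def on_event(self, name):
        self.events.append(name)


@dataclass
class Manager:
    """Stand-in for a CallbackManager that fans events out to handlers."""
    handlers: list

    def emit(self, name):
        for handler in self.handlers:
            handler.on_event(name)


def make_manager():
    """Called once per worker at startup, not at import time."""
    return Manager(handlers=[RecordingHandler()])


def run_query(manager, question):
    # The manager is an explicit argument, so this works the same in a
    # Celery task, a background job, or a local call.
    manager.emit("query_start")
    answer = f"answer to: {question}"
    manager.emit("query_end")
    return answer


manager = make_manager()
run_query(manager, "What is in the docs?")
print(manager.handlers[0].events)  # ['query_start', 'query_end']
```

The point is the signature of `run_query`: because the manager travels with the call, there is no ambient state for a process boundary to strip away.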

Other Possible Causes

1) You attached the handler to the wrong object

Some LlamaIndex components accept callbacks at construction time, but others need them on the query engine or retriever.

# BROKEN
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
query_engine.query("foo")  # no callback manager attached here

# FIXED
query_engine = index.as_query_engine(callback_manager=callback_manager)
query_engine.query("foo")

2) Async code is swallowing exceptions before callbacks complete

If an async task fails early, you may see partial events or none at all.

# BROKEN
tasks = [engine.aquery(q) for q in queries]
results = await asyncio.gather(*tasks)  # one exception can short-circuit visibility

# FIXED
results = await asyncio.gather(*tasks, return_exceptions=True)
for result in results:
    if isinstance(result, Exception):
        # logger.exception() is meant for except blocks; pass the
        # captured exception explicitly via exc_info instead
        logger.error("Query failed", exc_info=result)
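Here is the same fix as a self-contained, runnable sketch with a dummy coroutine standing in for `engine.aquery`, so you can see that `return_exceptions=True` keeps the successful results visible alongside the failure:

```python
import asyncio


async def fake_query(q):
    # Stand-in for engine.aquery(q): one query fails, one succeeds
    if q == "bad":
        raise ValueError("boom")
    return f"ok:{q}"


async def main():
    queries = ("good", "bad")
    tasks = [fake_query(q) for q in queries]
    # Without return_exceptions=True, the ValueError would propagate and
    # the "good" result would be lost along with its callback trail.
    return await asyncio.gather(*tasks, return_exceptions=True)


results = asyncio.run(main())
for q, r in zip(("good", "bad"), results):
    print(q, "->", r)
```

The successful result and the captured `ValueError` both come back in order, so per-query logging (and callback completion for the queries that did finish) is preserved.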

3) Your custom callback handler does not implement the right methods

In LlamaIndex, BaseCallbackHandler is an abstract class: subclasses must implement on_event_start, on_event_end, start_trace, and end_trace, and pass the two ignore lists to super().__init__(). If your class misses those required hooks, instantiation fails or nothing useful gets recorded.

# BROKEN
from llama_index.core.callbacks.base import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def __init__(self):
        pass  # missing event methods / required init behavior

# FIXED
from llama_index.core.callbacks import BaseCallbackHandler, CBEventType

class MyHandler(BaseCallbackHandler):
    def __init__(self):
        # BaseCallbackHandler requires both ignore lists
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type: CBEventType, payload=None, event_id="", parent_id="", **kwargs):
        print(f"start: {event_type}")
        return event_id

    def on_event_end(self, event_type: CBEventType, payload=None, event_id="", **kwargs):
        print(f"end: {event_type}")

    # Both trace hooks are abstract, so they must exist even if empty
    def start_trace(self, trace_id=None):
        pass

    def end_trace(self, trace_id=None, trace_map=None):
        pass

4) Worker processes don’t share memory with your main app

This is common with Gunicorn/Uvicorn workers. Each worker has its own Python process and its own Settings.

# Example: multiple workers mean multiple independent callback contexts
gunicorn app:app --workers 4

If your debug handler lives only in worker 1 during startup, worker 2/3/4 will never use it unless they initialize their own copy.
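One way to sketch the fix: give every worker its own initialization step instead of wiring the handler once in the parent. The classes below are stand-ins (not LlamaIndex imports), and the idea is that `init_worker()` runs from each worker's startup hook (for example, a Gunicorn `post_fork`-style hook), so each process builds its own live handler.

```python
_worker_manager = None  # per-process, populated at worker startup


class DebugHandler:
    """Stand-in for a debug/trace handler."""
    def __init__(self):
        self.events = []


class Manager:
    """Stand-in for a CallbackManager."""
    def __init__(self, handlers):
        self.handlers = handlers


def init_worker():
    """Run once per worker process, not once per application."""
    global _worker_manager
    _worker_manager = Manager([DebugHandler()])


def get_manager():
    # Fail loudly instead of silently dropping events when a worker
    # skipped its own initialization
    if _worker_manager is None:
        raise RuntimeError("worker has no callback wiring; call init_worker()")
    return _worker_manager


init_worker()  # in production, invoked by the worker startup hook
print(len(get_manager().handlers))  # 1
```

Because `_worker_manager` lives inside each process, every worker ends up with a handler that actually exists in its own memory, rather than a reference to an object that only the parent ever built.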

How to Debug It

  1. Confirm where the callback manager is attached

    • Print it right before query execution.
    • Verify both index creation and query execution use the same instance.
    # _callback_manager is a private attribute; the name may vary by version
    print("index cb:", getattr(index, "_callback_manager", None))
    print("settings cb:", Settings.callback_manager)
    
  2. Use LlamaDebugHandler first

    • Strip out custom handlers.
    • If LlamaDebugHandler works but yours doesn’t, your implementation is wrong.
    • If neither works, your problem is attachment or process isolation.
  3. Check for process boundaries

    • Look for Celery tasks, background jobs, async queues, or multiple web workers.
    • If code runs outside the request process that set Settings.callback_manager, global state will not follow.
  4. Force a minimal synchronous repro

    • Run one document.
    • One query.
    • One process.
    • No background tasks.

    If callbacks fire there but fail under load, you have a scaling/context propagation issue rather than a LlamaIndex bug.

Prevention

  • Attach callbacks explicitly to each component that emits events:
    • index creation
    • retriever construction
    • query engine creation
  • Avoid relying on global Settings.callback_manager across worker boundaries.
  • Add a startup check that logs whether your active query engine has the expected callback manager before serving traffic.
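A hedged sketch of that startup check: verify, by object identity, that the engine you are about to serve uses the manager you built. `Engine` is a stand-in here, and the `callback_manager` attribute name is an assumption; adapt it to however your actual query-engine object exposes its manager.

```python
class Engine:
    """Stand-in for a query engine that exposes its callback manager."""
    def __init__(self, callback_manager):
        self.callback_manager = callback_manager


def assert_callback_wiring(engine, expected_manager):
    # Identity check ("is", not "=="): a look-alike manager built in the
    # wrong process is exactly the failure mode we want to catch
    actual = getattr(engine, "callback_manager", None)
    if actual is not expected_manager:
        raise RuntimeError("query engine is not using the expected callback manager")


expected = object()  # stands in for your CallbackManager instance
engine = Engine(expected)

assert_callback_wiring(engine, expected)  # passes silently
print("callback wiring OK")
```

Running this once at worker startup, before serving traffic, turns a silent missing-trace problem into an immediate, loggable failure.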

If you want this to survive production load, treat callback wiring like database config: explicit per execution path, not ambient state. That’s the difference between “works on my laptop” and observability that still works at scale.


By Cyprian Aarons, AI Consultant at Topiax.