How to Fix 'callback not firing during development' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

Tags: callback-not-firing-during-development, llamaindex, python

If you’re seeing “callback not firing during development” in LlamaIndex, it usually means your callback handler is registered correctly in one place, but the code path you’re running never actually emits callback events. In practice, this shows up during local development when you switch between sync and async code, rebuild objects inside a request, or wire the handler to the wrong LlamaIndex component.

The error is rarely about the callback system itself. It’s usually about object lifetime, event propagation, or using a code path that bypasses the callback manager entirely.

The Most Common Cause

The #1 cause is creating the CallbackManager or handler in one place, then constructing the QueryEngine, Retriever, or Agent without passing that same manager through.

A common symptom is that your custom handler never sees events like CBEventType.RETRIEVE, CBEventType.SYNTHESIZE, or CBEventType.LLM. You’ll see no logs, no spans, and no errors — just silence.

Broken pattern vs fixed pattern

| Broken | Fixed |
| --- | --- |
| Handler created, but not attached to the engine/query pipeline | Same handler passed into Settings or directly into the relevant constructor |
| Object rebuilt per request with default callback state | Shared callback manager used consistently |
# BROKEN
from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.base_handler import BaseCallbackHandler

class DebugHandler(BaseCallbackHandler):
    def __init__(self):
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print(f"start: {event_type}")

    def on_event_end(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print(f"end: {event_type}")

    # start_trace/end_trace are abstract too; no-ops are fine for debugging
    def start_trace(self, trace_id=None): ...
    def end_trace(self, trace_id=None, trace_map=None): ...

handler = DebugHandler()
callback_manager = CallbackManager([handler])

# BUG: callback_manager is never attached to anything below
index = VectorStoreIndex.from_documents(docs)  # docs: your loaded Document list
query_engine = index.as_query_engine()

response = query_engine.query("What is this document about?")

# FIXED
from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks.base_handler import BaseCallbackHandler

class DebugHandler(BaseCallbackHandler):
    def __init__(self):
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print(f"start: {event_type}")

    def on_event_end(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print(f"end: {event_type}")

    # start_trace/end_trace are abstract too; no-ops are fine for debugging
    def start_trace(self, trace_id=None): ...
    def end_trace(self, trace_id=None, trace_map=None): ...

handler = DebugHandler()
callback_manager = CallbackManager([handler])

# Attach the manager globally so every component built afterwards inherits it
Settings.callback_manager = callback_manager

index = VectorStoreIndex.from_documents(docs)  # docs: your loaded Document list
query_engine = index.as_query_engine()

response = query_engine.query("What is this document about?")

If you want tighter control per component, pass the manager explicitly:

query_engine = index.as_query_engine(callback_manager=callback_manager)

That avoids relying on global state and makes debugging much easier.

Other Possible Causes

1) You’re using async code but calling sync entrypoints

This happens when you use async retrievers or tools but invoke .query() instead of .aquery(), or run inside an event loop incorrectly. The callback chain can be incomplete depending on how your app is structured.

# Wrong
response = query_engine.query("Summarize this")

# Right
response = await query_engine.aquery("Summarize this")

If you’re inside FastAPI or another async framework, keep the whole path async.
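The sync-vs-async distinction can be sketched without LlamaIndex at all. In this toy, `FakeQueryEngine` is purely illustrative; only the `.query()`/`.aquery()` method names mirror the real engine API:

```python
import asyncio

class FakeQueryEngine:
    """Illustrative stand-in for a query engine with sync/async entrypoints."""

    def query(self, q: str) -> str:
        # Sync path: in a real async app this would block the event loop.
        return f"sync:{q}"

    async def aquery(self, q: str) -> str:
        await asyncio.sleep(0)  # yields control, like real async I/O
        return f"async:{q}"

async def handle_request(engine: FakeQueryEngine, question: str) -> str:
    # Inside an async framework (e.g. a FastAPI route), stay on the async path.
    return await engine.aquery(question)

result = asyncio.run(handle_request(FakeQueryEngine(), "Summarize this"))
print(result)  # async:Summarize this
```

If `handle_request` called `engine.query(...)` instead, the work would run synchronously inside the event loop — which is exactly the shape of bug that leaves an async callback chain incomplete.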


2) Your custom handler does not implement the right methods

In LlamaIndex 0.10+, handlers usually extend BaseCallbackHandler and need the expected methods for start/end/event capture. If you only define a random method like on_llm_start, it won’t be called unless it matches the interface used by your version.

# Wrong: method names don't match what LlamaIndex calls
class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, *args, **kwargs):
        print("LLM started")

# Right: implement the actual callback hooks used by your version
class MyHandler(BaseCallbackHandler):
    def __init__(self):
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print("start", event_type)

    def on_event_end(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print("end", event_type)

    # start_trace/end_trace are also abstract on the base class in 0.10+
    def start_trace(self, trace_id=None): ...
    def end_trace(self, trace_id=None, trace_map=None): ...

Also confirm whether your installed version expects newer dispatcher-style callbacks versus older legacy hooks.


3) You’re using a code path that bypasses callbacks

Some direct low-level calls don’t emit the same events as higher-level engines. For example, calling an LLM client directly will not hit your LlamaIndex callback handler unless that client is wrapped by a component that forwards events.

# This bypasses LlamaIndex callbacks
result = llm.complete("Write a summary")

Use LlamaIndex components that are wired to the callback manager:

query_engine = index.as_query_engine(callback_manager=callback_manager)
result = query_engine.query("Write a summary")

If you need observability for raw model calls too, wrap them in a custom instrumented component.
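The “instrumented component” idea can be sketched in plain Python. `EventLog` and `InstrumentedLLM` below are made-up names for illustration, not LlamaIndex classes:

```python
class EventLog:
    """Toy event sink standing in for a callback handler."""

    def __init__(self):
        self.events = []

    def emit(self, name, payload):
        self.events.append((name, payload))

class InstrumentedLLM:
    """Wraps a raw completion function so every call emits start/end events."""

    def __init__(self, complete_fn, log):
        self.complete_fn = complete_fn
        self.log = log

    def complete(self, prompt):
        self.log.emit("llm_start", {"prompt": prompt})
        result = self.complete_fn(prompt)
        self.log.emit("llm_end", {"response": result})
        return result

log = EventLog()
llm = InstrumentedLLM(lambda p: p.upper(), log)  # fake model for the sketch
result = llm.complete("write a summary")
print(result, len(log.events))  # WRITE A SUMMARY 2
```

Depending on your llama-index-core version, `CallbackManager.event(...)` can also be used as a context manager to emit real start/end events around arbitrary code, which gets you the same effect with proper `CBEventType` values instead of the toy strings above.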


4) Version mismatch between LlamaIndex packages

This one bites people constantly during development. If llama-index-core, integrations like OpenAI/Ollama/HuggingFace packages, and your callback code are from different release lines, handlers can silently stop firing or fail in odd ways.

Check your versions:

pip show llama-index-core llama-index-llms-openai llama-index-embeddings-openai

Then align them:

pip install -U llama-index-core llama-index-llms-openai llama-index-embeddings-openai

If you pinned one package months ago and upgraded another yesterday, assume mismatch first.
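If you’d rather check versions from inside the same interpreter your app runs in, the standard library can do it; the package names below match the pip commands above:

```python
from importlib.metadata import PackageNotFoundError, version

def pkg_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"

for pkg in ("llama-index-core", "llama-index-llms-openai", "llama-index-embeddings-openai"):
    print(pkg, pkg_version(pkg))
```

Running this at app startup (and logging the result) makes “which versions was this actually running?” a question you never have to reconstruct after the fact.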

How to Debug It

  1. Confirm the handler is actually attached

    • Print the active manager before querying.
    • Check whether you set Settings.callback_manager or passed callback_manager= into the engine constructor.
  2. Add a minimal handler with explicit prints

    • Don’t start with tracing backends.
    • Use a tiny BaseCallbackHandler that prints every start/end event so you know whether anything fires at all.
  3. Reduce to one component

    • Test with just VectorStoreIndex -> as_query_engine -> query.
    • Remove agents, tools, rerankers, and custom retrievers until callbacks start working.
  4. Verify sync vs async path

    • If your app uses async def, call .aquery().
    • If callbacks work in sync mode but not async mode, your issue is likely around coroutine usage or framework integration.

Prevention

  • Set up a single shared CallbackManager early in app startup and inject it consistently.
  • Pin compatible LlamaIndex package versions together instead of upgrading them independently.
  • Write one smoke test that asserts at least one callback fires during a basic query:
    assert handler.events_seen > 0
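Fleshed out, that smoke test is only a few lines. `CountingHandler` and `fake_query` below are illustrative stand-ins; in your test suite, swap in a real BaseCallbackHandler subclass and a real query engine:

```python
class CountingHandler:
    """Stand-in for a callback handler that only counts events."""

    def __init__(self):
        self.events_seen = 0

    def on_event_start(self, event_type, **kwargs):
        self.events_seen += 1

    def on_event_end(self, event_type, **kwargs):
        self.events_seen += 1

def fake_query(handler, question):
    # A real engine would emit RETRIEVE/SYNTHESIZE/LLM events here.
    handler.on_event_start("query")
    answer = f"answer to {question!r}"
    handler.on_event_end("query")
    return answer

handler = CountingHandler()
fake_query(handler, "smoke test")
assert handler.events_seen > 0  # the actual smoke-test assertion
print(handler.events_seen)  # 2
```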
    

If you treat callbacks as part of object wiring rather than logging sugar, this class of bug disappears fast. In production systems — especially bank and insurance workflows — silent observability failures are worse than loud ones.

