How to Fix 'callback not firing in production' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

What the error means

If you’re seeing “callback not firing in production” with LlamaIndex, the usual symptom is that your callback handler works locally but goes silent once the app is deployed. In practice, this means your CallbackManager is not attached to the object that actually executes retrieval, indexing, or LLM calls.

The failure usually shows up as missing traces, empty telemetry, or a handler hook like on_event_start() never being called even though the query completes successfully.

The Most Common Cause

The #1 cause is configuring Settings (or the legacy ServiceContext) with a callback manager in one place, then creating the index or query engine somewhere else without carrying that manager through.

In LlamaIndex Python, callbacks are not global magic. If the object doing the work does not receive the CallbackManager, your handler will never fire.

Wrong pattern vs right pattern

Broken: Callback manager created, but never passed into the active index/query engine.
Fixed: Callback manager passed into Settings, so objects built after that inherit it.
# BROKEN
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)  # callback_manager never used

query_engine = index.as_query_engine()
response = query_engine.query("What is in these documents?")
print(response)
# FIXED
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])

Settings.callback_manager = callback_manager

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine()
response = query_engine.query("What is in these documents?")
print(response)

If you’re on an older LlamaIndex version using ServiceContext, the same rule applies:

# OLD STYLE
from llama_index.core import ServiceContext  # deprecated; replaced by Settings in current releases
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
index = VectorStoreIndex.from_documents(docs, service_context=service_context)

The key point: attach callbacks before building the components that emit events.

Other Possible Causes

1) You are using a different event path in production

A callback registered for query events won’t fire if production uses a different API path than local testing.

# Local test path
query_engine.query("hello")

# Production path might be different
chat_engine.chat("hello")

If your handler only listens for one flow, you’ll miss events from the other.
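One way to rule this out is to exercise the production path locally with the same handler attached. A quick sketch, reusing the index and debug_handler from the earlier examples:

# Sketch: confirm the chat path also reaches your handler, reusing
# `index` and `debug_handler` from the Settings example above.
chat_engine = index.as_chat_engine()
chat_engine.chat("hello")

# LlamaDebugHandler records every event it saw; an empty list means
# the chat path never received your callback manager.
print(debug_handler.get_events())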

2) A nested component overrides your callback manager

Some integrations create their own internal CallbackManager. That can replace yours if you pass it only at the top level.

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

llm = OpenAI(
    model="gpt-4o-mini",
    callback_manager=callback_manager,
)

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    callback_manager=callback_manager,
)

If one of those objects is created elsewhere without your manager, events may stop there. Check for hidden defaults in wrappers around VectorStoreIndex, retrievers, or custom tools.
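To reduce that risk, one option is to register the models on Settings alongside the callback manager, so anything constructed later pulls both from the same place. A sketch continuing the snippet above:

from llama_index.core import Settings

# Register the models and the shared manager globally so components
# built later inherit them instead of creating hidden defaults.
Settings.callback_manager = callback_manager
Settings.llm = llm
Settings.embed_model = embed_model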

3) Async code is swallowing exceptions before callbacks complete

If you use async query flows and forget to await them correctly, callbacks may never flush.

# Broken
result = query_engine.aquery("status")  # coroutine created but never awaited

# Fixed (await is only valid inside an async function)
result = await query_engine.aquery("status")

This often surfaces as no callback output rather than a clean stack trace. In production workers, this is common when mixing sync Flask handlers with async LlamaIndex calls.
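If your entry point is synchronous (a plain Flask view, for example), one workable pattern is to drive the coroutine to completion explicitly so callbacks fire and flush before the response returns. A minimal sketch, assuming a query_engine built at startup:

import asyncio

def handle_request(question: str) -> str:
    # asyncio.run() drives the coroutine to completion, so callbacks
    # fire and flush before we return. This assumes no event loop is
    # already running in this thread (true for plain sync workers).
    response = asyncio.run(query_engine.aquery(question))
    return str(response)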

4) Your production environment disables logging or stdout capture

Some handlers like LlamaDebugHandler print to stdout. In Docker, Gunicorn, or serverless runtimes, those logs may be buffered or dropped.

import logging

logging.basicConfig(level=logging.INFO)

# Prefer structured logging inside your custom handler instead of print-only debugging.

If your “missing callback” is really “callback fired but output not visible,” this is where to look.
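Two quick checks before rewriting anything: force-flush any prints, and route handler output through the logging module, which Gunicorn and Docker capture reliably. A sketch:

import logging
import sys

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("app.callbacks")

# If you must print, flush so buffered runtimes actually emit it:
print("callback fired", flush=True)

# Preferred: log from inside your handler instead of printing.
logger.info("callback fired")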

How to Debug It

  1. Start with LlamaDebugHandler

    • Add it directly to CallbackManager.
    • Confirm whether any events appear during indexing and querying.
  2. Print the active manager before building objects

    • Verify Settings.callback_manager is set before VectorStoreIndex.from_documents(...).
    • If you use multiple modules, make sure nothing resets it later (see the verification sketch below).
  3. Check which execution path runs in prod

    • Compare local vs production: query(), aquery(), chat(), agents, tools.
    • The class names matter: QueryEngine, ChatEngine, ReActAgent, and tool wrappers do not all emit identical callback events.
  4. Instrument a custom handler

    • Add explicit prints or structured logs in on_event_start() and on_event_end().
    • If start fires but end does not, you likely have an exception or cancellation mid-flight.
from llama_index.core.callbacks.base_handler import BaseCallbackHandler
from llama_index.core.callbacks.schema import CBEventType

class TraceHandler(BaseCallbackHandler):
    def __init__(self):
        # Ignore no event types on start or end
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type: CBEventType, payload=None, event_id="", parent_id="", **kwargs) -> str:
        print(f"START {event_type} {event_id}")
        return event_id

    def on_event_end(self, event_type: CBEventType, payload=None, event_id="", **kwargs) -> None:
        print(f"END {event_type} {event_id}")

    # start_trace/end_trace are also abstract and must be implemented:
    def start_trace(self, trace_id=None) -> None: ...

    def end_trace(self, trace_id=None, trace_map=None) -> None: ...

Prevention

  • Set Settings.callback_manager once at app startup before constructing indexes, retrievers, or engines.
  • Use one explicit tracing path in production and test that exact path locally.
  • Add a smoke test that asserts at least one callback event fires for a sample query (sketch below).
  • Prefer custom handlers with structured logging over stdout-only debug handlers in containerized deployments.
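For the smoke test, a minimal pytest-style sketch; build_query_engine() is a hypothetical stand-in for your app's own factory:

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

def test_a_callback_event_fires():
    debug_handler = LlamaDebugHandler()
    Settings.callback_manager = CallbackManager([debug_handler])

    # build_query_engine() is hypothetical: substitute the factory your
    # app actually uses. It must run *after* the manager is attached.
    query_engine = build_query_engine()
    query_engine.query("smoke test question")

    assert len(debug_handler.get_events()) > 0, "no callback events recorded"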

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

