# How to Fix "callback not firing in production" in LlamaIndex (Python)
## What the error means
If you’re seeing "callback not firing in production" with LlamaIndex, the usual symptom is that your callback handler works locally but goes silent once the app is deployed. In practice, this means your `CallbackManager` is not attached to the object that actually executes retrieval, indexing, or LLM calls.

The failure usually shows up as missing traces, empty telemetry, or a handler method like `on_event_start()` never being called even though the query completes successfully.
## The Most Common Cause
The #1 cause is building the `Settings` (or legacy `ServiceContext`) with a callback manager in one place, then creating the index or query engine in another place without carrying that manager through.

In LlamaIndex, callbacks are not global magic: if the object doing the work never receives the `CallbackManager`, your handler will never fire.
### Wrong pattern vs right pattern
| Broken | Fixed |
|---|---|
| Callback manager created, but not passed into the active index/query engine | Callback manager passed into Settings, and objects built after that inherit it |
```python
# BROKEN: the manager exists, but nothing ever receives it
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)  # callback_manager never used
query_engine = index.as_query_engine()

response = query_engine.query("What is in these documents?")
print(response)  # works, but no callbacks fire
```
```python
# FIXED: attach the manager to Settings before building anything
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])
Settings.callback_manager = callback_manager  # objects built below inherit it

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

response = query_engine.query("What is in these documents?")
print(response)
```
If you’re on an older LlamaIndex version that still uses `ServiceContext` (since deprecated in favor of `Settings`), the same rule applies:
```python
# OLD STYLE (pre-Settings versions)
from llama_index.core import ServiceContext

service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
index = VectorStoreIndex.from_documents(docs, service_context=service_context)
```
The key point: attach callbacks before building the components that emit events.
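Why ordering matters can be modeled in a few lines of plain Python. This is an illustrative model, not LlamaIndex internals; the `Settings` and `Engine` classes below are stand-ins that mimic the fallback behavior:

```python
# Illustrative model of Settings-based inheritance, not LlamaIndex internals.
class Settings:
    callback_manager = None

class Engine:
    def __init__(self):
        # Components capture whatever manager is configured at build time.
        self.callback_manager = Settings.callback_manager

early = Engine()                          # built BEFORE the manager is set
Settings.callback_manager = "my-manager"  # attach the manager
late = Engine()                           # built AFTER

print(early.callback_manager)  # None -> its events go nowhere
print(late.callback_manager)   # my-manager
```

The `early` engine never sees your handler, which is exactly the local-works/production-silent split: the deployed code path constructs its objects before (or without) the manager being attached.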
## Other Possible Causes
### 1) You are using a different event path in production
A callback registered for query events won’t fire if production uses a different API path than local testing.
```python
# Local test path
query_engine.query("hello")

# Production path might be different
chat_engine.chat("hello")
```
If your handler only listens for one flow, you’ll miss events from the other.
### 2) A nested component overrides your callback manager

Some integrations create their own internal `CallbackManager`. That can replace yours if you pass it only at the top level.
```python
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Pass the same manager to every component you construct yourself
llm = OpenAI(
    model="gpt-4o-mini",
    callback_manager=callback_manager,
)
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    callback_manager=callback_manager,
)
```
If one of those objects is created elsewhere without your manager, events may stop there. Check for hidden defaults in wrappers around VectorStoreIndex, retrievers, or custom tools.
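The silent-fallback problem can be reproduced in plain Python. The `Manager` and `Component` classes below are illustrative stand-ins that mirror how integrations default to an internal manager when none is passed:

```python
class Manager:
    def __init__(self, name):
        self.name = name

DEFAULT = Manager("library-default")

class Component:
    # Mirrors integrations that fall back to an internal manager when none is given.
    def __init__(self, callback_manager=None):
        self.callback_manager = callback_manager or DEFAULT

mine = Manager("mine")
wired = Component(callback_manager=mine)  # your handlers see its events
orphan = Component()                      # built elsewhere, without your manager

print(wired.callback_manager.name)   # mine
print(orphan.callback_manager.name)  # library-default -> events stop here
```

The `orphan` component raises no error; it just routes events to a manager you never registered handlers on, which is why the failure is silent.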
### 3) Async code is swallowing exceptions before callbacks complete
If you use async query flows and forget to await them correctly, callbacks may never flush.
```python
# Broken: coroutine never awaited, so nothing runs and no callbacks fire
result = query_engine.aquery("status")

# Fixed
result = await query_engine.aquery("status")
```
This often surfaces as no callback output rather than a clean stack trace. In production workers, this is common when mixing sync Flask handlers with async LlamaIndex calls.
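The difference is easy to reproduce with plain `asyncio`; the `aquery` function below is a stand-in for `query_engine.aquery`, not a LlamaIndex API:

```python
import asyncio

async def aquery(text: str) -> str:
    # Stand-in for an async LlamaIndex call; a real one would fire callbacks here.
    await asyncio.sleep(0)
    return f"answer to {text!r}"

# Broken: this returns a coroutine object; the body never runs.
broken = aquery("status")
print(type(broken).__name__)  # coroutine

# Fixed: drive it to completion. From a sync handler (e.g. Flask),
# asyncio.run() is the usual bridge.
result = asyncio.run(aquery("status"))
print(result)  # answer to 'status'

broken.close()  # suppress the "coroutine was never awaited" warning
```

Python does emit a `RuntimeWarning: coroutine ... was never awaited`, but in production that warning is often lost in the same buffered logs as your callback output.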
### 4) Your production environment disables logging or stdout capture

Some handlers like `LlamaDebugHandler` print to stdout. In Docker, Gunicorn, or serverless runtimes, those logs may be buffered or dropped.
```python
import logging

logging.basicConfig(level=logging.INFO)

# Prefer structured logging inside your custom handler
# instead of print-only debugging.
```
If your “missing callback” is really “callback fired but output not visible,” this is where to look.
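One way to make callback output survive buffered runtimes is to route it through the `logging` module and flush any remaining `print()` calls. A minimal sketch; the `emit` helper is hypothetical, not part of LlamaIndex:

```python
import logging
import sys

# Log to stdout explicitly; some runtimes only capture stderr or buffer stdout.
logging.basicConfig(level=logging.INFO, stream=sys.stdout,
                    format="%(levelname)s %(name)s %(message)s")
log = logging.getLogger("callbacks")

def emit(event_type: str, event_id: str) -> str:
    """Structured line a custom handler could log instead of bare print()."""
    line = f"START {event_type} {event_id}"
    log.info(line)
    return line

line = emit("llm", "abc123")

# If you must keep print(), force a flush so buffered runtimes show it:
print(line, flush=True)
```

Setting the `PYTHONUNBUFFERED=1` environment variable in the container is another common fix for swallowed stdout.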
## How to Debug It
- **Start with `LlamaDebugHandler`.**
  - Add it directly to `CallbackManager`.
  - Confirm whether any events appear during indexing and querying.
- **Print the active manager before building objects.**
  - Verify `Settings.callback_manager` is set before `VectorStoreIndex.from_documents(...)`.
  - If you use multiple modules, make sure nothing resets it later.
- **Check which execution path runs in prod.**
  - Compare local vs production: `query()`, `aquery()`, `chat()`, agents, tools.
  - The class names matter: `QueryEngine`, `ChatEngine`, `ReActAgent`, and tool wrappers do not all emit identical callback events.
- **Instrument a custom handler.**
  - Add explicit prints or structured logs in `on_event_start()` and `on_event_end()`.
  - If start fires but end does not, you likely have an exception or cancellation mid-flight.
```python
from llama_index.core.callbacks.base_handler import BaseCallbackHandler
from llama_index.core.callbacks.schema import CBEventType

class TraceHandler(BaseCallbackHandler):
    def __init__(self):
        # BaseCallbackHandler requires the ignore lists.
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type: CBEventType, payload=None, event_id="", parent_id="", **kwargs):
        print(f"START {event_type} {event_id}", flush=True)
        return event_id

    def on_event_end(self, event_type: CBEventType, payload=None, event_id="", **kwargs):
        print(f"END {event_type} {event_id}", flush=True)

    # Abstract trace hooks; required even if unused.
    def start_trace(self, trace_id=None): pass
    def end_trace(self, trace_id=None, trace_map=None): pass
```
## Prevention
- Set `Settings.callback_manager` once at app startup, before constructing indexes, retrievers, or engines.
- Use one explicit tracing path in production and test that exact path locally.
- Add a smoke test that asserts at least one callback event fires for a sample query.
- Prefer custom handlers with structured logging over stdout-only debug handlers in containerized deployments.
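Such a smoke test can be sketched without a live index. `CountingHandler` and `fake_query` below are illustrative stand-ins; a real test would subclass `BaseCallbackHandler`, attach it via `Settings.callback_manager`, and run an actual query:

```python
class CountingHandler:
    """Minimal stand-in for a BaseCallbackHandler subclass that records events."""
    def __init__(self):
        self.events = []

    def on_event_start(self, event_type, event_id="", **kwargs):
        self.events.append(("start", event_type, event_id))

    def on_event_end(self, event_type, event_id="", **kwargs):
        self.events.append(("end", event_type, event_id))

def fake_query(handler, text):
    # Stand-in for the query path; a real engine invokes the handler for you.
    handler.on_event_start("query", event_id="q1")
    handler.on_event_end("query", event_id="q1")
    return f"answer to {text!r}"

handler = CountingHandler()
fake_query(handler, "smoke test")
assert handler.events, "no callback events fired for the sample query"
print(len(handler.events))  # 2
```

Run the real version of this in CI against your production entry point (the same `query()`, `chat()`, or agent path you deploy), so a dropped callback manager fails the build instead of silently dropping telemetry.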
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.