How to Fix 'callback not firing' in LlamaIndex (Python)
What “callback not firing” usually means
In LlamaIndex, this phrase usually means you registered a callback handler, but events never reached it. In practice, it shows up when you expect CallbackManager events such as on_event_start / on_event_end, but your handler stays silent during indexing, retrieval, or LLM calls.
Most of the time, the problem is not LlamaIndex itself. It’s one of these: the callback manager was attached to the wrong object, the code path bypassed the instrumented component, or async execution prevented your handler from running where you expected.
The Most Common Cause
The #1 cause is attaching CallbackManager to a component that is not actually used by the query/index pipeline.
A common mistake is creating a custom CallbackManager, then passing it to one object while the real work happens in another object with its own default manager.
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Callback manager attached only to `Settings`, while the index/query engine uses separate defaults | Pass the same `CallbackManager` into the exact objects doing retrieval and synthesis |
```python
# Broken
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def __init__(self):
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print("start:", event_type)
        return event_id

    def on_event_end(self, event_type, payload=None, event_id="", **kwargs):
        print("end:", event_type)

    # start_trace/end_trace are abstract on BaseCallbackHandler too
    def start_trace(self, trace_id=None):
        pass

    def end_trace(self, trace_id=None, trace_map=None):
        pass

handler = MyHandler()
Settings.callback_manager = CallbackManager([handler])

index = VectorStoreIndex.from_documents(docs)  # docs: your list of Document objects
query_engine = index.as_query_engine()  # may not use your expected manager
response = query_engine.query("What is in the docs?")
```
```python
# Fixed
from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import CallbackManager

handler = MyHandler()  # same handler class as in the broken example
cb_manager = CallbackManager([handler])

index = VectorStoreIndex.from_documents(docs)  # docs: your list of Document objects
query_engine = index.as_query_engine(callback_manager=cb_manager)
response = query_engine.query("What is in the docs?")
```
If you are building more than one layer — for example StorageContext, retriever, query engine — pass the same callback manager through all of them. Don’t assume Settings.callback_manager will always cover every code path.
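The wiring rule can be sketched without LlamaIndex at all. In this toy model (`EventBus`, `Retriever`, and `QueryEngine` are hypothetical stand-ins, not LlamaIndex classes), any component built without the shared bus silently falls back to its own default and its events vanish:

```python
class EventBus:
    """Toy stand-in for a shared CallbackManager."""
    def __init__(self):
        self.events = []

    def emit(self, name):
        self.events.append(name)

class Retriever:
    def __init__(self, bus=None):
        self.bus = bus or EventBus()  # no bus passed: gets its own default

    def retrieve(self, q):
        self.bus.emit("retrieve")
        return ["doc1"]

class QueryEngine:
    def __init__(self, retriever, bus=None):
        self.retriever = retriever
        self.bus = bus or EventBus()

    def query(self, q):
        self.bus.emit("query_start")
        docs = self.retriever.retrieve(q)
        self.bus.emit("query_end")
        return docs

# Broken wiring: the retriever falls back to its own default bus
shared = EventBus()
engine = QueryEngine(Retriever(), bus=shared)
engine.query("hi")
print(shared.events)  # ['query_start', 'query_end'] -- no 'retrieve' event

# Fixed wiring: one bus threaded through every layer
shared2 = EventBus()
engine2 = QueryEngine(Retriever(bus=shared2), bus=shared2)
engine2.query("hi")
print(shared2.events)  # ['query_start', 'retrieve', 'query_end']
```

The retriever's events are not lost because anything errored; they simply went to a bus nobody is watching, which is exactly how a partially instrumented pipeline looks.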
Other Possible Causes
1) Your handler does not implement the right callback interface
LlamaIndex expects a proper subclass of BaseCallbackHandler. If you define similarly named methods on a plain class, they won't be called.
```python
# Broken: a plain class that only looks like a handler
class MyHandler:
    def on_event_start(self, *args, **kwargs):
        print("start")
```

```python
# Fixed: subclass the real base class and match its signatures
from llama_index.core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print("start", event_type)
        return event_id

    # on_event_end, start_trace, and end_trace are also abstract;
    # implement them too before instantiating this class
```
If you’re using an older version of LlamaIndex code from a blog post or gist, check method signatures carefully. Callback APIs have changed across releases.
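One way to see which signatures your installed version actually expects is to inspect them at runtime with the standard library. The sketch below runs against a local example class so it is self-contained; the same `inspect.signature` call works on the real `BaseCallbackHandler` from your install:

```python
import inspect

class ExampleHandler:
    """Local stand-in with the signature shape discussed above."""
    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        return event_id

sig = inspect.signature(ExampleHandler.on_event_start)
print(list(sig.parameters))
# ['self', 'event_type', 'payload', 'event_id', 'parent_id', 'kwargs']

# Against your actual install:
#   from llama_index.core.callbacks import BaseCallbackHandler
#   print(inspect.signature(BaseCallbackHandler.on_event_start))
```

If the parameters printed from your installed base class differ from what a tutorial shows, the tutorial targets a different release.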
2) You are running async code but only implemented sync hooks
Some flows use async internals. If your handler only prints in sync methods but your pipeline runs through async paths like aquery() or async ingestion flows, you may think callbacks are broken when they are not.
```python
# Broken: the pipeline runs through the async path...
response = await query_engine.aquery("hello")
# ...but the handler only implements sync methods
```

```python
# Fixed (illustrative): add async hooks if your installed version dispatches
# them; the exact async method names vary by release, so check your version
class MyHandler(BaseCallbackHandler):
    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        print("sync start", event_type)
        return event_id

    async def aon_event_start(self, event_type, payload=None):  # name illustrative
        print("async start", event_type)
```
Not every custom handler needs both sync and async methods in every version, but if your workflow is async and nothing fires, inspect whether your handler supports the async callbacks used by your installed LlamaIndex version.
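The failure mode is easier to see in a toy model. The dispatcher below (hypothetical, not LlamaIndex internals) only calls an async hook on the async path; a handler that lacks one is skipped silently, which from the outside looks exactly like "callback not firing":

```python
import asyncio

class SyncOnlyHandler:
    def __init__(self):
        self.fired = []

    def on_event_start(self, event_type):
        self.fired.append(event_type)

class BothHandler(SyncOnlyHandler):
    async def aon_event_start(self, event_type):
        self.fired.append("async:" + event_type)

async def dispatch_async(handler, event_type):
    # Async path: calls the async hook only if the handler defines one;
    # otherwise the event is dropped silently
    hook = getattr(handler, "aon_event_start", None)
    if hook is not None:
        await hook(event_type)

sync_only = SyncOnlyHandler()
both = BothHandler()
asyncio.run(dispatch_async(sync_only, "query"))
asyncio.run(dispatch_async(both, "query"))
print(sync_only.fired)  # [] -- looks "broken", but was never called
print(both.fired)       # ['async:query']
```

The fix is never to guess: read the dispatch code of your installed version and implement whatever hooks it actually calls.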
3) The code path you are testing does not emit that callback
Some operations do not trigger every callback type. For example:
- A cached response may skip parts of retrieval/LLM execution.
- A direct document lookup may bypass vector retrieval.
- A custom retriever may not emit standard events unless instrumented properly.
```python
# Example: a cached path may short-circuit the work that emits events
response = query_engine.query("same question as before")
```
If you expect token-level or LLM events but you are hitting cached responses or trivial retrieval paths instead, there may be nothing to fire.
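A toy illustration of the short-circuit (`CachingEngine` is a hypothetical stand-in, not a LlamaIndex class): LLM events only fire on a cache miss, so the second identical query produces no events at all even though the handler is wired correctly:

```python
class CachingEngine:
    def __init__(self):
        self.cache = {}
        self.events = []

    def query(self, q):
        if q in self.cache:
            # Cache hit: no retrieval, no LLM call, so no LLM events fire
            return self.cache[q]
        self.events.append("llm_start")
        answer = f"answer to {q!r}"
        self.events.append("llm_end")
        self.cache[q] = answer
        return answer

engine = CachingEngine()
engine.query("same question")  # miss: llm_start/llm_end fire
engine.query("same question")  # hit: nothing fires
print(engine.events)  # ['llm_start', 'llm_end'] -- one pair, not two
```

When debugging, vary the query text so you are sure you are exercising the uncached path.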
4) Version mismatch between LlamaIndex packages
LlamaIndex split into multiple packages over time. Mixing versions can produce behavior where imports work but callbacks behave oddly.
Check for mismatched installs:
```shell
pip show llama-index llama-index-core llama-index-llms-openai llama-index-embeddings-openai
```
Typical fix:
```shell
pip install -U llama-index-core llama-index-llms-openai llama-index-embeddings-openai
```
If your project has old imports like:
```python
from gpt_index import ...
```
you are on legacy code and should update it. Old examples often reference outdated callback APIs and class names.
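You can also report installed versions from Python with the standard library. The package names below are the usual LlamaIndex distributions; adjust the list for your stack:

```python
from importlib.metadata import PackageNotFoundError, version

packages = [
    "llama-index-core",
    "llama-index-llms-openai",
    "llama-index-embeddings-openai",
]

def report(pkgs):
    """Map each distribution name to its installed version, if any."""
    out = {}
    for name in pkgs:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = "not installed"
    return out

for name, ver in report(packages).items():
    print(f"{name}: {ver}")
```

If one package lags far behind the others, align them before debugging callbacks any further.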
How to Debug It
- Verify the handler is actually attached
  - Print the manager at runtime.
  - Confirm the exact object doing the work (VectorStoreIndex, retriever, or query engine) has your callback manager.
- Add a minimal handler that logs every start/end
  - Keep it simple.
  - If this doesn't fire either, your handler isn't wired in, or that code path does not emit callbacks.
- Test both sync and async paths
  - Run `query()` and then `aquery()`, and compare behavior.
  - If only one path works after wiring handlers/methods correctly, you found an async mismatch.
- Check package versions and imports
  - Run `pip freeze | grep llama-index`.
  - Make sure all relevant packages are aligned.
  - Remove stale imports from older tutorials.
Prevention
- Pass one shared `CallbackManager` through the actual runtime objects:
  - index creation,
  - retriever,
  - query engine,
  - agent/tool wrappers.
- Write a tiny smoke test for callbacks in CI:
  - create a dummy handler,
  - run one query,
  - assert at least one start/end event fired.
- Pin compatible LlamaIndex versions in production:

```text
llama-index-core==0.x.y
llama-index-llms-openai==0.x.y
llama-index-embeddings-openai==0.x.y
```
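The CI smoke test can be sketched like this. `CountingHandler` and `FakeEngine` are stand-ins so the example runs on its own; in a real test you would build your actual index, attach a real `CallbackManager`, and keep the same assertions:

```python
class CountingHandler:
    """Dummy handler that just counts start/end events."""
    def __init__(self):
        self.starts = 0
        self.ends = 0

    def on_event_start(self, event_type):
        self.starts += 1

    def on_event_end(self, event_type):
        self.ends += 1

class FakeEngine:
    """Stand-in for your real query engine; emits one start/end pair."""
    def __init__(self, handler):
        self.handler = handler

    def query(self, q):
        self.handler.on_event_start("query")
        result = "ok"
        self.handler.on_event_end("query")
        return result

def test_callbacks_fire():
    handler = CountingHandler()
    engine = FakeEngine(handler)
    engine.query("smoke test")
    assert handler.starts >= 1 and handler.ends >= 1

test_callbacks_fire()
print("callback smoke test passed")
```

A test this small catches the most common regression: a refactor that rebuilds the engine without re-attaching the manager.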
If you hit "callback not firing" in LlamaIndex again after wiring everything correctly, assume one of two things first: the wrong object got instrumented, or the wrong execution path got exercised. That covers most real-world cases I see in production systems.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.