# How to Fix 'callback not firing' in LangChain (Python)
## What “callback not firing” usually means
In LangChain, this usually means your callback handler is registered, but the event you expect never reaches it. Most of the time, the chain runs fine and your handler stays silent because of async/sync mismatch, wrong callback attachment, or a runnable that doesn’t emit the event you’re listening for.
You’ll typically see this when using CallbackManager, BaseCallbackHandler, AsyncCallbackHandler, ChatOpenAI, RunnableSequence, or LLMChain and expecting on_llm_start, on_llm_new_token, or on_chain_end to fire.
## The Most Common Cause
The #1 cause is mixing sync and async callbacks incorrectly.
If you pass an AsyncCallbackHandler into a sync .invoke() path, or define async def on_llm_new_token(...) but call a synchronous model method, LangChain won’t await your handler. The result is exactly what people describe as “callback not firing.”
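You can see the failure mechanism in plain Python, with no LangChain involved: calling an `async def` method from synchronous code returns a coroutine object and runs nothing, which is exactly what happens when a sync execution path hits an async handler. The class below is a toy stand-in, not a LangChain type:

```python
import gc
import warnings

class AsyncHandler:
    # Toy stand-in for an AsyncCallbackHandler; not a LangChain class.
    async def on_llm_new_token(self, token):
        print(f"TOKEN: {token}")

handler = AsyncHandler()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A sync caller that doesn't await: the method body never executes.
    coro = handler.on_llm_new_token("hello")
    del coro      # discarded un-awaited, as a sync invoke path would do
    gc.collect()

# CPython flags the dropped coroutine instead of printing the token.
print(any("never awaited" in str(w.message) for w in caught))  # True
```

No `TOKEN: hello` is ever printed; the only trace is the `RuntimeWarning: coroutine ... was never awaited` that the interpreter emits when the coroutine is garbage-collected.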
### Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Async handler used with sync invoke | Async handler used with async ainvoke |
| Callback passed in the wrong place | Callback passed via config={"callbacks": [...]} or constructor |
| Expecting token events from a non-streaming call | Enabling streaming and using the right method |
```python
# BROKEN: async callback + sync invoke
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler

class MyHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"TOKEN: {token}")

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

# This often results in no callback output because invoke() is synchronous.
result = llm.invoke("Write one sentence.", config={"callbacks": [MyHandler()]})
print(result)
```
```python
# FIXED: async callback + async ainvoke
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler

class MyHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"TOKEN: {token}")

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
    result = await llm.ainvoke(
        "Write one sentence.",
        config={"callbacks": [MyHandler()]},
    )
    print(result)

asyncio.run(main())
```
If you want sync code, use BaseCallbackHandler instead:
```python
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler

class MySyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"TOKEN: {token}")

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
result = llm.invoke("Write one sentence.", config={"callbacks": [MySyncHandler()]})
```
## Other Possible Causes

### 1) You attached callbacks to the wrong object
In LangChain 0.2+, callbacks can be attached at two levels with different scopes. Constructor callbacks (e.g. `LLMChain(..., callbacks=[...])`) apply only to the object they are defined on and are not inherited by child runnables. Request callbacks, passed via `config` at invocation time, propagate to every child. If you attach a handler to a wrapper but the actual LLM call happens deeper down, your handler never sees the LLM events.

```python
# Wrong: constructor callbacks are scoped to the chain object itself,
# so the nested LLM's on_llm_* events never reach the handler
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[MyHandler()])
result = chain.invoke({"topic": "payments"})

# Better: request callbacks in config are inherited by every child runnable
chain = prompt | llm
result = chain.invoke(
    {"topic": "payments"},
    config={"callbacks": [MyHandler()]},
)
```
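The scoping rule can be sketched with a toy runnable tree. This illustrates the propagation behavior only; it is not LangChain's actual internals:

```python
class Recorder:
    """Toy handler that records which runnables fired for it."""
    def __init__(self):
        self.fired = []

class ToyRunnable:
    """Toy model of callback scoping, not LangChain internals."""
    def __init__(self, name, children=(), callbacks=()):
        self.name = name
        self.children = list(children)
        self.callbacks = list(callbacks)   # constructor callbacks

    def invoke(self, value, config=None):
        inherited = list((config or {}).get("callbacks", []))
        # This object's own events reach constructor + request callbacks...
        for cb in self.callbacks + inherited:
            cb.fired.append(self.name)
        # ...but only request callbacks are handed down to children.
        for child in self.children:
            child.invoke(value, {"callbacks": inherited})
        return value

ctor_cb, request_cb = Recorder(), Recorder()
llm = ToyRunnable("llm")
chain = ToyRunnable("chain", children=[llm], callbacks=[ctor_cb])
chain.invoke("hi", {"callbacks": [request_cb]})

print(ctor_cb.fired)     # ['chain'] -- never sees the nested llm
print(request_cb.fired)  # ['chain', 'llm'] -- propagated to the child
```

The constructor-attached recorder only ever sees the object it was attached to, which is exactly the "registered but silent" symptom.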
### 2) Streaming is off, so token callbacks never fire
on_llm_new_token only fires when tokens are streamed. If you call a normal completion method without streaming enabled, there are no intermediate tokens to report.
```python
llm = ChatOpenAI(model="gpt-4o-mini", streaming=False)  # no token stream
```

Fix:

```python
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
```
If you only need final output, use on_llm_end or on_chain_end instead of token-level hooks.
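The difference is easy to demonstrate with a toy driver (a stand-in, not LangChain): token hooks only fire when something actually streams tokens, while the end hook fires either way:

```python
class RecordingHandler:
    # Toy stand-in for a callback handler; records which hooks fired.
    def __init__(self):
        self.events = []

    def on_llm_new_token(self, token, **kwargs):
        self.events.append("token")

    def on_llm_end(self, response, **kwargs):
        self.events.append("end")

def fake_llm_call(handler, streaming):
    # Toy model call: per-token events only exist in streaming mode.
    tokens = ["One", "sentence."]
    if streaming:
        for t in tokens:
            handler.on_llm_new_token(t)
    handler.on_llm_end(" ".join(tokens))

plain, streamed = RecordingHandler(), RecordingHandler()
fake_llm_call(plain, streaming=False)
fake_llm_call(streamed, streaming=True)

print(plain.events)     # ['end'] -- no token hooks without streaming
print(streamed.events)  # ['token', 'token', 'end']
```

If your handler only implements `on_llm_new_token`, the non-streaming run above produces no output at all, even though everything is wired up correctly.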
### 3) You implemented the wrong callback method name
LangChain will not call arbitrary methods. If you define on_token() instead of on_llm_new_token(), nothing happens.
```python
# Wrong: on_token is not part of the callback interface, so it is never called
class MyHandler(BaseCallbackHandler):
    def on_token(self, token: str, **kwargs):
        print(token)

# Right
class MyHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        print(token)
```
Common methods that actually fire:
- on_chain_start
- on_chain_end
- on_llm_start
- on_llm_new_token
- on_tool_start
- on_tool_end
- on_retriever_start
- on_retriever_end
### 4) Your custom handler is being garbage-collected or never referenced
This shows up in notebooks and quick scripts where people instantiate inline and assume it stays alive. Keep a strong reference while debugging.
```python
handler = MyHandler()
result = llm.invoke("hello", config={"callbacks": [handler]})
```
Also watch for environment-specific issues like Jupyter event loops or background threads swallowing exceptions inside handlers. If your handler throws inside on_llm_new_token, LangChain may log something like:
```
Error in CallbackManager.on_llm_new_token callback
```
That can look like “not firing” when it’s actually failing silently downstream.
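A toy dispatcher shows why this reads as "not firing": callback managers typically catch handler exceptions, log them, and keep the run going. This is a sketch of the general pattern, not LangChain's actual code:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("callbacks")

class ExplodingHandler:
    # A handler whose hook raises; in a real run you'd never see its output.
    def on_llm_new_token(self, token, **kwargs):
        raise ValueError("boom")

def dispatch(handlers, event, *args):
    # Sketch of the usual manager behavior: log the error, keep going.
    for h in handlers:
        try:
            getattr(h, event)(*args)
        except Exception as exc:
            log.warning("Error in %s callback: %s", event, exc)

dispatch([ExplodingHandler()], "on_llm_new_token", "hi")
print("run continued")  # the chain keeps running; the handler just went quiet
```

The chain completes normally and the only evidence is a log line, so always check stderr/logs before concluding the handler was never invoked.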
## How to Debug It
1. **Confirm the exact execution path**
   - Are you calling `.invoke()` or `.ainvoke()`?
   - Are you using a streaming model?
   - Are you calling an LLM directly or through a chain/tool/retriever?

2. **Add all three core hooks**
   - Implement `on_chain_start`, `on_llm_start`, and `on_llm_new_token`.
   - If only some fire, you've narrowed down where propagation stops.

   ```python
   from langchain_core.callbacks import BaseCallbackHandler

   class DebugHandler(BaseCallbackHandler):
       def on_chain_start(self, serialized, inputs, **kwargs):
           print("CHAIN START")

       def on_llm_start(self, serialized, prompts, **kwargs):
           print("LLM START")

       def on_llm_new_token(self, token: str, **kwargs):
           print("TOKEN:", token)
   ```

3. **Remove wrappers**
   - Test the model directly before testing chains.
   - Then add prompt templates.
   - Then add tools/retrievers.
   - The layer that breaks propagation is usually where the bug lives.

4. **Check logs for callback exceptions**
   - Search for `Error in CallbackManager` and `Exception in callback`.
   - Watch for async warnings like `RuntimeWarning: coroutine was never awaited` and `There is no current event loop in thread`.
## Prevention
- Use sync handlers with sync calls and async handlers with async calls.
- Turn on streaming only when you actually need token-level events.
- Keep handlers simple and fail-fast; don't do network I/O inside callbacks unless you queue it off-thread.
- In production code, test both direct model invocation and full chain invocation so callback propagation gets covered before deployment.
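For the off-thread point above, here is a minimal sketch (the `QueueingHandler` name is hypothetical, not a LangChain class) that hands each token to a worker thread so the callback itself stays cheap:

```python
import queue
import threading

class QueueingHandler:
    # Sketch: the hook only enqueues; a worker thread does the slow work.
    def __init__(self):
        self.q = queue.Queue()
        self.processed = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            token = self.q.get()
            if token is None:              # sentinel: shut down
                break
            self.processed.append(token)   # slow I/O would go here instead

    def on_llm_new_token(self, token, **kwargs):
        self.q.put(token)                  # fast: no blocking work in the hot path

    def close(self):
        self.q.put(None)
        self._worker.join()

h = QueueingHandler()
for t in ["a", "b", "c"]:
    h.on_llm_new_token(t)
h.close()
print(h.processed)  # ['a', 'b', 'c']
```

If the worker raises, the callback path is unaffected, and a failure in your logging or metrics sink can't stall token streaming.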
If your callback isn’t firing in LangChain Python, start by checking whether you’re mixing .invoke() with an async handler. That’s the most common failure mode by far.
## Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.