How to Fix 'callback not firing during development' in LangChain (Python)
If your LangChain callback works in production but not during local development, the usual problem is not the callback handler itself. It’s almost always that the async/sync execution path does not match the callback API you implemented, so LangChain never reaches your handler.
This shows up a lot with AsyncCallbackHandler, Runnable chains, and streaming setups where you expect on_llm_new_token() or on_chain_end() to fire, but nothing prints and no exception is raised.
The Most Common Cause
The #1 cause is mixing synchronous invocation with async callbacks, or registering an async handler but calling the chain with a sync method like .invoke().
LangChain will happily run the chain, but your async callback methods such as on_llm_new_token or on_chain_end won’t fire the way you expect unless you use the async execution path.
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Uses AsyncCallbackHandler but calls .invoke() | Uses AsyncCallbackHandler with .ainvoke() |
| Expects token streaming without enabling streaming | Enables streaming on the model and uses async run path |
```python
# BROKEN
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

class DebugHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        print(f"TOKEN: {token}")

prompt = ChatPromptTemplate.from_template("Write a short haiku about {topic}")
llm = ChatOpenAI(
    model="gpt-4o-mini",  # streaming not enabled
    callbacks=[DebugHandler()],
)
chain = prompt | llm

# Sync entrypoint: this will NOT trigger async token callbacks as expected
result = chain.invoke({"topic": "rain"})
print(result)
```
```python
# FIXED
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

class DebugHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        print(f"TOKEN: {token}")

prompt = ChatPromptTemplate.from_template("Write a short haiku about {topic}")
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callbacks=[DebugHandler()],
)
chain = prompt | llm

async def main():
    result = await chain.ainvoke({"topic": "rain"})
    print(result)

asyncio.run(main())
```
If you’re using ConversationChain, LLMChain, or any older chain class, the same rule applies: sync entrypoints call sync callbacks; async handlers need async entrypoints.
Other Possible Causes
1) You attached callbacks in the wrong place
In LangChain v0.2+, callbacks can be attached at different levels. If you attach them to the chain but the model is actually doing the work elsewhere, your handler may never see events.
```python
# BROKEN: callbacks on the chain wrapper only; the model emits the LLM events
llm = ChatOpenAI(model="gpt-4o-mini")
chain = (prompt | llm).with_config({"callbacks": [DebugHandler()]})

# FIXED: attach callbacks directly to the runnable/model that emits events
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[DebugHandler()])
chain = prompt | llm
```
2) Streaming is off, so token callbacks never fire
If you’re waiting for on_llm_new_token, you need a streaming-capable model configuration. Without streaming, LangChain returns only the final message.
```python
# BROKEN
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[DebugHandler()])

# FIXED
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callbacks=[DebugHandler()],
)
```
3) Your callback method name or signature is wrong
LangChain won’t call a method just because it looks close. If you implement on_chain_finish instead of on_chain_end, nothing happens.
```python
# BROKEN
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_chain_finish(self, outputs, **kwargs):  # wrong name
        print(outputs)

# FIXED
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_chain_end(self, outputs, **kwargs):  # correct hook
        print(outputs)
```
4) You are swallowing exceptions in development tooling
Some dev wrappers, notebook cells, or background task runners hide callback failures. The chain completes, but your handler crashed silently.
```python
from langchain_core.callbacks import AsyncCallbackHandler

class DebugHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        # This can fail if token is None or if you're assuming kwargs keys exist
        print(token.upper())
```
If this raises inside an async task and you don’t inspect logs carefully, it can look like “callback not firing”.
How to Debug It
- Check whether you are using sync or async execution
  - If your handler subclasses AsyncCallbackHandler, use .ainvoke(), .astream(), or .abatch().
  - If you want .invoke(), use a synchronous handler derived from BaseCallbackHandler.
- Verify which hook should fire
  - For final outputs: on_chain_end
  - For LLM tokens: on_llm_new_token
  - For retriever events: retriever-specific hooks
  If you expect token events but did not enable streaming, stop there.
- Print at multiple levels. Add temporary logging in both chain-level and model-level handlers.

```python
from langchain_core.callbacks import BaseCallbackHandler

class DebugHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("CHAIN START")

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM START")

    def on_llm_end(self, response, **kwargs):
        print("LLM END")
```
- Run a minimal repro. Strip out tools, memory, retrievers, and wrappers. Use one prompt plus one chat model. If it works there but not in your app codebase, the bug is in how config/callbacks are passed through layers.
Prevention
- Match execution style to handler type: BaseCallbackHandler for sync paths, AsyncCallbackHandler for async paths.
- Enable streaming explicitly when you need token-level events. No streaming means no reliable on_llm_new_token debugging signal.
- Keep callback wiring close to the runnable that emits events. Don't assume chain-level config always reaches nested models/tools.
If you’re still stuck after checking those four areas, the issue is usually not LangChain itself. It’s a mismatch between where events are emitted and where your callback was attached.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.