How to Fix 'callback not firing during development' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

If your LangChain callback works in production but not during local development, the usual problem is not the callback handler itself. It’s almost always that the async/sync execution path does not match the callback API you implemented, so LangChain never reaches your handler.

This shows up a lot with AsyncCallbackHandler, Runnable chains, and streaming setups where you expect on_llm_new_token() or on_chain_end() to fire, but nothing prints and no exception is raised.

The Most Common Cause

The #1 cause is a handler/entrypoint mismatch: registering an async handler (an AsyncCallbackHandler subclass) but calling the chain with a sync method like .invoke().

LangChain will happily run the chain, but your async callback methods such as on_llm_new_token or on_chain_end won’t fire the way you expect unless you use the async execution path.

Broken vs fixed

  • Broken: uses AsyncCallbackHandler but calls .invoke() → Fixed: uses AsyncCallbackHandler with .ainvoke()
  • Broken: expects token streaming without enabling streaming → Fixed: enables streaming on the model and uses the async run path
# BROKEN
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

class DebugHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        print(f"TOKEN: {token}")

prompt = ChatPromptTemplate.from_template("Write a short haiku about {topic}")
llm = ChatOpenAI(
    model="gpt-4o-mini",         # streaming not enabled
    callbacks=[DebugHandler()],  # async handler registered...
)

chain = prompt | llm

# ...but invoked synchronously: async token callbacks will NOT fire as expected
result = chain.invoke({"topic": "rain"})
print(result)
# FIXED
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

class DebugHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        print(f"TOKEN: {token}")

prompt = ChatPromptTemplate.from_template("Write a short haiku about {topic}")
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,              # token-level events require streaming
    callbacks=[DebugHandler()],  # handler attached to the model that emits events
)

chain = prompt | llm

async def main():
    # Async entrypoint matches the async handler, so token callbacks fire
    result = await chain.ainvoke({"topic": "rain"})
    print(result)

asyncio.run(main())

If you’re using ConversationChain, LLMChain, or any older chain class, the same rule applies: sync entrypoints call sync callbacks; async handlers need async entrypoints.
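
For contrast, here is a minimal sync sketch (SyncDebugHandler is an illustrative name): a handler derived from BaseCallbackHandler fires on the plain .invoke() path, including when you pass it per call through config.

# SYNC COUNTERPART: BaseCallbackHandler + .invoke()
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

class SyncDebugHandler(BaseCallbackHandler):
    def on_chain_end(self, outputs, **kwargs):
        print(f"CHAIN END: {outputs}")

prompt = ChatPromptTemplate.from_template("Write a short haiku about {topic}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# Sync handler + sync entrypoint: on_chain_end fires here
result = chain.invoke({"topic": "rain"}, config={"callbacks": [SyncDebugHandler()]})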

Other Possible Causes

1) You attached callbacks in the wrong place

In LangChain v0.2+, callbacks can be attached at different levels. If you attach them to the chain but the model is actually doing the work elsewhere, your handler may never see events.

# BROKEN: callbacks attached at the chain level; the nested model may never see events
llm = ChatOpenAI(model="gpt-4o-mini")
chain = (prompt | llm).with_config({"callbacks": [DebugHandler()]})
# FIXED: attach callbacks directly to the runnable/model that emits events
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[DebugHandler()])
chain = prompt | llm
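
Per-call wiring is a third option worth knowing: callbacks passed through config at invocation time are propagated to nested runnables for that run. A sketch, using a sync handler (an illustrative ChainDebugHandler) so it matches the sync .invoke() path:

# ALTERNATIVE: pass callbacks per call via config; they propagate to child runnables
from langchain_core.callbacks import BaseCallbackHandler

class ChainDebugHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM START")

result = chain.invoke({"topic": "rain"}, config={"callbacks": [ChainDebugHandler()]})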

2) Streaming is off, so token callbacks never fire

If you’re waiting for on_llm_new_token, you need a streaming-capable model configuration. Without streaming, LangChain returns only the final message.

# BROKEN: streaming off, so on_llm_new_token never fires
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[DebugHandler()])
# FIXED
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,  # emit token events as they arrive
    callbacks=[DebugHandler()],
)
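
If you want to confirm tokens are flowing at all, independent of callback wiring, you can consume the stream directly. A quick sanity check:

# SANITY CHECK: read the stream directly, bypassing callbacks entirely
for chunk in llm.stream("Write a short haiku about rain"):
    print(chunk.content, end="", flush=True)

If tokens print here but your handler stays silent, the problem is the wiring, not the model.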

3) Your callback method name or signature is wrong

LangChain won’t call a method just because it looks close. If you implement on_chain_finish instead of on_chain_end, nothing happens.

# BROKEN
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_chain_finish(self, outputs, **kwargs):  # wrong name
        print(outputs)
# FIXED
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_chain_end(self, outputs, **kwargs):  # correct hook
        print(outputs)
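
If you are unsure of a hook's exact name, introspect the base class instead of guessing; every supported hook is an on_* method inherited by BaseCallbackHandler (AsyncCallbackHandler mirrors them as async methods):

# List the hook names LangChain actually dispatches to
from langchain_core.callbacks import BaseCallbackHandler

print([name for name in dir(BaseCallbackHandler) if name.startswith("on_")])
# ['on_agent_action', ..., 'on_chain_end', 'on_chain_start', ..., 'on_llm_new_token', ...]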

4) You are swallowing exceptions in development tooling

Some dev wrappers, notebook cells, or background task runners hide callback failures. The chain completes, but your handler crashed silently.

class DebugHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        # This can fail if token is None or if you're assuming kwargs keys exist
        print(token.upper())

If this raises inside an async task and you don’t inspect logs carefully, it can look like “callback not firing”.
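
One defensive pattern is to catch and log inside the handler so failures surface in your logs instead of vanishing into the event loop; you can also set the handler's raise_error attribute to True, which tells LangChain to re-raise handler exceptions rather than suppress them. A sketch:

import logging

logger = logging.getLogger(__name__)

class SafeDebugHandler(AsyncCallbackHandler):
    raise_error = True  # re-raise handler exceptions instead of suppressing them

    async def on_llm_new_token(self, token: str, **kwargs):
        try:
            print(token.upper())
        except Exception:
            logger.exception("on_llm_new_token failed")  # surface the bug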

How to Debug It

  1. Check whether you are using sync or async execution

    • If your handler subclasses AsyncCallbackHandler, use .ainvoke(), .astream(), or .abatch().
    • If you want .invoke(), use a synchronous handler derived from BaseCallbackHandler.
  2. Verify which hook should fire

    • For final outputs: on_chain_end
    • For LLM tokens: on_llm_new_token
    • For retriever events: retriever-specific hooks
      If you expect token events but did not enable streaming, stop there.
  3. Print at multiple levels. Add temporary logging in both chain-level and model-level handlers:

class DebugHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("CHAIN START")

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM START")

    def on_llm_end(self, response, **kwargs):
        print("LLM END")
  4. Run a minimal repro. Strip out tools, memory, retrievers, and wrappers; use one prompt plus one chat model, as in the sketch below. If it works there but not in your app codebase, the bug is in how config/callbacks are being passed through the layers.
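
A minimal repro can be as small as this, reusing the sync DebugHandler from step 3:

# MINIMAL REPRO: one prompt, one model, one handler, nothing else
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

chain = ChatPromptTemplate.from_template("Say hi to {name}") | ChatOpenAI(model="gpt-4o-mini")
print(chain.invoke({"name": "dev"}, config={"callbacks": [DebugHandler()]}))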

Prevention

  • Match execution style to handler type:

    • BaseCallbackHandler for sync paths
    • AsyncCallbackHandler for async paths
  • Enable streaming explicitly when you need token-level events.

    • No streaming means no reliable on_llm_new_token debugging signal.
  • Keep callback wiring close to the runnable that emits events.

    • Don’t assume chain-level config always reaches nested models/tools.

If you’re still stuck after checking those four areas, the issue is usually not LangChain itself. It’s a mismatch between where events are emitted and where your callback was attached.

