# How to Fix 'callback not firing' in LangChain (Python)
## What “callback not firing” usually means
In LangChain, this usually means your callback handler is registered, but the event you expect never reaches it. Most of the time, the chain runs fine and your handler stays silent because of async/sync mismatch, wrong callback attachment, or a runnable that doesn’t emit the event you’re listening for.
You’ll typically see this when using CallbackManager, BaseCallbackHandler, AsyncCallbackHandler, ChatOpenAI, RunnableSequence, or LLMChain and expecting on_llm_start, on_llm_new_token, or on_chain_end to fire.
## The Most Common Cause
The #1 cause is mixing sync and async callbacks incorrectly.
If you pass an AsyncCallbackHandler into a sync .invoke() path, or define async def on_llm_new_token(...) but call a synchronous model method, LangChain won’t await your handler. The result is exactly what people describe as “callback not firing.”
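You can see the failure mechanism in plain Python, with no LangChain involved: calling an `async def` method from synchronous code returns a coroutine object and runs nothing, which is exactly what happens when a sync execution path hits an async handler. The class below is a toy stand-in, not a LangChain type:

```python
import gc
import warnings

class AsyncHandler:
    # Toy stand-in for an AsyncCallbackHandler; not a LangChain class.
    async def on_llm_new_token(self, token):
        print(f"TOKEN: {token}")

handler = AsyncHandler()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A sync caller that doesn't await: the method body never executes.
    coro = handler.on_llm_new_token("hello")
    del coro      # discarded un-awaited, as a sync invoke path would do
    gc.collect()

# CPython flags the dropped coroutine instead of printing the token.
print(any("never awaited" in str(w.message) for w in caught))  # True
```

No `TOKEN: hello` is ever printed; the only trace is the `RuntimeWarning: coroutine ... was never awaited` that the interpreter emits when the coroutine is garbage-collected.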
### Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Async handler used with sync invoke | Async handler used with async ainvoke |
| Callback passed in the wrong place | Callback passed via config={"callbacks": [...]} or constructor |
| Expecting token events from a non-streaming call | Enabling streaming and using the right method |
```python
# BROKEN: async callback + sync invoke
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler

class MyHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"TOKEN: {token}")

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

# This often results in no callback output because invoke() is synchronous.
result = llm.invoke("Write one sentence.", config={"callbacks": [MyHandler()]})
print(result)
```
```python
# FIXED: async callback + async ainvoke
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import AsyncCallbackHandler

class MyHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"TOKEN: {token}")

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
    result = await llm.ainvoke(
        "Write one sentence.",
        config={"callbacks": [MyHandler()]},
    )
    print(result)

asyncio.run(main())
```
If you want sync code, use BaseCallbackHandler instead:
```python
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler

class MySyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"TOKEN: {token}")

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
result = llm.invoke("Write one sentence.", config={"callbacks": [MySyncHandler()]})
```
## Other Possible Causes

### 1) You attached callbacks to the wrong object
In LangChain 0.2+, callbacks can be attached at two levels with different scopes. Constructor callbacks (e.g. `LLMChain(..., callbacks=[...])`) apply only to the object they are defined on and are not inherited by child runnables. Request callbacks, passed via `config` at invocation time, propagate to every child. If you attach a handler to a wrapper but the actual LLM call happens deeper down, your handler never sees the LLM events.

```python
# Wrong: constructor callbacks are scoped to the chain object itself,
# so the nested LLM's on_llm_* events never reach the handler
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[MyHandler()])
result = chain.invoke({"topic": "payments"})

# Better: request callbacks in config are inherited by every child runnable
chain = prompt | llm
result = chain.invoke(
    {"topic": "payments"},
    config={"callbacks": [MyHandler()]},
)
```
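The scoping rule can be sketched with a toy runnable tree. This illustrates the propagation behavior only; it is not LangChain's actual internals:

```python
class Recorder:
    """Toy handler that records which runnables fired for it."""
    def __init__(self):
        self.fired = []

class ToyRunnable:
    """Toy model of callback scoping, not LangChain internals."""
    def __init__(self, name, children=(), callbacks=()):
        self.name = name
        self.children = list(children)
        self.callbacks = list(callbacks)   # constructor callbacks

    def invoke(self, value, config=None):
        inherited = list((config or {}).get("callbacks", []))
        # This object's own events reach constructor + request callbacks...
        for cb in self.callbacks + inherited:
            cb.fired.append(self.name)
        # ...but only request callbacks are handed down to children.
        for child in self.children:
            child.invoke(value, {"callbacks": inherited})
        return value

ctor_cb, request_cb = Recorder(), Recorder()
llm = ToyRunnable("llm")
chain = ToyRunnable("chain", children=[llm], callbacks=[ctor_cb])
chain.invoke("hi", {"callbacks": [request_cb]})

print(ctor_cb.fired)     # ['chain'] -- never sees the nested llm
print(request_cb.fired)  # ['chain', 'llm'] -- propagated to the child
```

The constructor-attached recorder only ever sees the object it was attached to, which is exactly the "registered but silent" symptom.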
### 2) Streaming is off, so token callbacks never fire
on_llm_new_token only fires when tokens are streamed. If you call a normal completion method without streaming enabled, there are no intermediate tokens to report.
```python
llm = ChatOpenAI(model="gpt-4o-mini", streaming=False)  # no token stream
```

Fix:

```python
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
```
If you only need final output, use on_llm_end or on_chain_end instead of token-level hooks.
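The difference is easy to demonstrate with a toy driver (a stand-in, not LangChain): token hooks only fire when something actually streams tokens, while the end hook fires either way:

```python
class RecordingHandler:
    # Toy stand-in for a callback handler; records which hooks fired.
    def __init__(self):
        self.events = []

    def on_llm_new_token(self, token, **kwargs):
        self.events.append("token")

    def on_llm_end(self, response, **kwargs):
        self.events.append("end")

def fake_llm_call(handler, streaming):
    # Toy model call: per-token events only exist in streaming mode.
    tokens = ["One", "sentence."]
    if streaming:
        for t in tokens:
            handler.on_llm_new_token(t)
    handler.on_llm_end(" ".join(tokens))

plain, streamed = RecordingHandler(), RecordingHandler()
fake_llm_call(plain, streaming=False)
fake_llm_call(streamed, streaming=True)

print(plain.events)     # ['end'] -- no token hooks without streaming
print(streamed.events)  # ['token', 'token', 'end']
```

If your handler only implements `on_llm_new_token`, the non-streaming run above produces no output at all, even though everything is wired up correctly.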
### 3) You implemented the wrong callback method name
LangChain will not call arbitrary methods. If you define on_token() instead of on_llm_new_token(), nothing happens.
```python
# Wrong: on_token is not part of the callback interface, so it is never called
class MyHandler(BaseCallbackHandler):
    def on_token(self, token: str, **kwargs):
        print(token)

# Right
class MyHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        print(token)
```
Common methods that actually fire:
- on_chain_start
- on_chain_end
- on_llm_start
- on_llm_new_token
- on_tool_start
- on_tool_end
- on_retriever_start
- on_retriever_end
### 4) Your custom handler is being garbage-collected or never referenced
This shows up in notebooks and quick scripts where people instantiate inline and assume it stays alive. Keep a strong reference while debugging.
```python
handler = MyHandler()
result = llm.invoke("hello", config={"callbacks": [handler]})
```
Also watch for environment-specific issues like Jupyter event loops or background threads swallowing exceptions inside handlers. If your handler throws inside on_llm_new_token, LangChain may log something like:
```
Error in CallbackManager.on_llm_new_token callback
```
That can look like “not firing” when it’s actually failing silently downstream.
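A toy dispatcher shows why this reads as "not firing": callback managers typically catch handler exceptions, log them, and keep the run going. This is a sketch of the general pattern, not LangChain's actual code:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("callbacks")

class ExplodingHandler:
    # A handler whose hook raises; in a real run you'd never see its output.
    def on_llm_new_token(self, token, **kwargs):
        raise ValueError("boom")

def dispatch(handlers, event, *args):
    # Sketch of the usual manager behavior: log the error, keep going.
    for h in handlers:
        try:
            getattr(h, event)(*args)
        except Exception as exc:
            log.warning("Error in %s callback: %s", event, exc)

dispatch([ExplodingHandler()], "on_llm_new_token", "hi")
print("run continued")  # the chain keeps running; the handler just went quiet
```

The chain completes normally and the only evidence is a log line, so always check stderr/logs before concluding the handler was never invoked.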
## How to Debug It
1. **Confirm the exact execution path**
   - Are you calling `.invoke()` or `.ainvoke()`?
   - Are you using a streaming model?
   - Are you calling an LLM directly or through a chain/tool/retriever?

2. **Add all three core hooks**
   - Implement `on_chain_start`, `on_llm_start`, and `on_llm_new_token`.
   - If only some fire, you've narrowed down where propagation stops.

   ```python
   from langchain_core.callbacks import BaseCallbackHandler

   class DebugHandler(BaseCallbackHandler):
       def on_chain_start(self, serialized, inputs, **kwargs):
           print("CHAIN START")

       def on_llm_start(self, serialized, prompts, **kwargs):
           print("LLM START")

       def on_llm_new_token(self, token: str, **kwargs):
           print("TOKEN:", token)
   ```

3. **Remove wrappers**
   - Test the model directly before testing chains.
   - Then add prompt templates.
   - Then add tools/retrievers.
   - The layer that breaks propagation is usually where the bug lives.

4. **Check logs for callback exceptions**
   - Search for `Error in CallbackManager` and `Exception in callback`.
   - Watch for async warnings like `RuntimeWarning: coroutine was never awaited` and `There is no current event loop in thread`.
## Prevention
- Use sync handlers with sync calls and async handlers with async calls.
- Turn on streaming only when you actually need token-level events.
- Keep handlers simple and fail-fast; don't do network I/O inside callbacks unless you queue it off-thread.
- In production code, test both direct model invocation and full chain invocation so callback propagation gets covered before deployment.
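For the off-thread point above, here is a minimal sketch (the `QueueingHandler` name is hypothetical, not a LangChain class) that hands each token to a worker thread so the callback itself stays cheap:

```python
import queue
import threading

class QueueingHandler:
    # Sketch: the hook only enqueues; a worker thread does the slow work.
    def __init__(self):
        self.q = queue.Queue()
        self.processed = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            token = self.q.get()
            if token is None:              # sentinel: shut down
                break
            self.processed.append(token)   # slow I/O would go here instead

    def on_llm_new_token(self, token, **kwargs):
        self.q.put(token)                  # fast: no blocking work in the hot path

    def close(self):
        self.q.put(None)
        self._worker.join()

h = QueueingHandler()
for t in ["a", "b", "c"]:
    h.on_llm_new_token(t)
h.close()
print(h.processed)  # ['a', 'b', 'c']
```

If the worker raises, the callback path is unaffected, and a failure in your logging or metrics sink can't stall token streaming.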
If your callback isn’t firing in LangChain Python, start by checking whether you’re mixing .invoke() with an async handler. That’s the most common failure mode by far.
## Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.