How to Fix 'callback not firing in production' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: callback-not-firing-in-production · langchain · python

If your LangChain callback works locally but disappears in production, the problem is usually not LangChain itself: your callback handler is either not attached to the runnable that actually executes, or it is attached in a way that gets lost once the chain crosses async boundaries, threads, or framework wrappers.

This shows up a lot with AsyncCallbackHandler, StreamingStdOutCallbackHandler, custom tracing handlers, and FastAPI/Celery deployments. The common symptom is: no errors, no callback output, and your on_llm_start, on_llm_new_token, or on_chain_end methods never fire.

The Most Common Cause

The #1 cause is attaching callbacks at the wrong level.

In LangChain Python, callbacks must be passed to the runnable that actually executes. If you attach them to a wrapper object, a helper function, or an outer chain that never directly owns the model call, they can be ignored.

Broken vs fixed

Broken pattern: callback attached to an outer chain, but the inner LLM call never receives it.
Fixed pattern: callback passed directly into invoke(), ainvoke(), or the constructor where supported.
# BROKEN
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Write a short summary of {topic}")
chain = prompt | llm | StrOutputParser()

handler = MyHandler()

# The handler exists, but it is never attached to the call, so no events fire.
result = chain.invoke({"topic": "payments"})
# FIXED
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Write a short summary of {topic}")
chain = prompt | llm | StrOutputParser()

handler = MyHandler()

# Pass callbacks at invocation time to the runnable that actually executes.
result = chain.invoke(
    {"topic": "payments"},
    config={"callbacks": [handler]},
)

The difference is easy to miss: the broken version never hands the handler to the runnable that executes, while the fixed version passes it through config at invocation time. In production the handler usually gets lost less obviously: inside a wrapper function, a framework adapter, or an async task boundary.
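The comparison above also mentions constructor support. Where the integration allows it, you can scope a handler to the model instance so it fires on every call that instance makes. A minimal sketch, reusing MyHandler from above:

# Sketch: constructor-scoped callbacks (most chat model integrations accept this)
llm_with_callbacks = ChatOpenAI(model="gpt-4o-mini", callbacks=[MyHandler()])
chain_with_callbacks = prompt | llm_with_callbacks | StrOutputParser()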

A more realistic broken case is this:

# BROKEN: callback attached to helper, but helper doesn't forward config
def run_summary(topic: str):
    return chain.invoke({"topic": topic})

run_summary("payments")
# FIXED: forward config all the way down
def run_summary(topic: str, config=None):
    return chain.invoke({"topic": topic}, config=config)

run_summary("payments", config={"callbacks": [handler]})

If you use LangChain’s Runnable API, this forwarding matters: callbacks travel in the per-call config, not in global state.
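For nested layers, here is a sketch of the same idea using RunnableLambda (the function and pipeline names are illustrative). RunnableLambda passes the active config to functions that declare a config parameter, so the inner invoke sees the same callbacks:

from langchain_core.runnables import RunnableConfig, RunnableLambda

def summarize_step(inputs: dict, config: RunnableConfig) -> dict:
    # Forward the same config so callbacks fire inside the nested chain too.
    summary = chain.invoke({"topic": inputs["topic"]}, config=config)
    return {"summary": summary}

pipeline = RunnableLambda(summarize_step)
pipeline.invoke({"topic": "payments"}, config={"callbacks": [handler]})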

Other Possible Causes

1. You used async code but implemented sync-only handlers

If you’re calling ainvoke() or streaming in an async app, implement async methods too.

# BROKEN
from langchain_core.callbacks import AsyncCallbackHandler

class MyAsyncHandler(AsyncCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        print(token)  # sync method won't be awaited properly in async flow
# FIXED
from langchain_core.callbacks import AsyncCallbackHandler

class MyAsyncHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        print(token)
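To confirm the async path end to end, drive the chain from an async entrypoint. A minimal sketch, assuming the chain from earlier with a streaming-enabled model:

import asyncio

async def main():
    handler = MyAsyncHandler()
    # astream/ainvoke take the async callback path, so async methods are awaited.
    async for chunk in chain.astream({"topic": "payments"}, config={"callbacks": [handler]}):
        pass

asyncio.run(main())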

2. Your framework creates a new event loop or worker process

This happens in FastAPI background tasks, Celery workers, Gunicorn workers, and serverless runtimes. A handler instance created in one process won’t magically fire in another.

# Example: Celery task must construct its own handler inside the task
@app.task
def generate_report():
    handler = MyHandler()
    chain.invoke({"topic": "claims"}, config={"callbacks": [handler]})

If you instantiate callbacks at module load time and expect them to survive worker forks, don’t.
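The same rule applies to web workers: build the handler inside the request path, not at import time. A minimal FastAPI sketch (the app and endpoint names are illustrative):

from fastapi import FastAPI

api = FastAPI()

@api.post("/summary")
async def summarize(topic: str):
    # Construct the handler in the worker and event loop that run the call.
    handler = MyAsyncHandler()
    return await chain.ainvoke({"topic": topic}, config={"callbacks": [handler]})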

3. Streaming is disabled or unsupported by the model call

on_llm_new_token only fires when token streaming is actually enabled.

llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
)

Without streaming=True, your StreamingStdOutCallbackHandler may never emit anything even though the request succeeds.
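A quick way to confirm tokens actually stream, reusing the prompt and parser from earlier (StreamingStdOutCallbackHandler prints each token as it arrives):

from langchain_core.callbacks import StreamingStdOutCallbackHandler

streaming_llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
streaming_chain = prompt | streaming_llm | StrOutputParser()
streaming_chain.invoke(
    {"topic": "payments"},
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)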

4. You’re using deprecated callback plumbing

Older examples use callback_manager= or attach handlers in ways that don’t match current LangChain versions. In newer codebases this often results in silent no-ops.

# Avoid relying on old patterns like this:
llm = ChatOpenAI(callback_manager=manager)

Prefer passing callbacks through runnable config:

result = chain.invoke(
    input_data,
    config={"callbacks": [handler]},
)

How to Debug It

  1. Verify the handler method matches the event you expect

    • If you want token-level output, implement on_llm_new_token.
    • If you want request start/end visibility, implement on_llm_start and on_chain_end.
  2. Add logging inside every callback method

    • Don’t assume one method will fire.
    • Print both args and kwargs so you can see whether LangChain is calling your handler at all.
class DebugHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        # serialized can be None for some runnables, so guard the lookup
        print("CHAIN START", (serialized or {}).get("name"), inputs)

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM START", (serialized or {}).get("name"), prompts)

    def on_chain_end(self, outputs, **kwargs):
        print("CHAIN END", outputs)
  3. Test with a minimal direct invoke (see the minimal script after this list)

    • Remove FastAPI/Celery/Streamlit wrappers.
    • Call chain.invoke() in a plain script.
    • If it works there but fails in production, the bug is in your integration layer.
  4. Check whether callbacks are being forwarded

    • If you have helper functions like run_chain(), inspect whether they accept and pass config.
    • In nested chains/tools/retrievers, confirm each layer forwards execution context.
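
Here is the kind of minimal script step 3 refers to, assuming the chain and DebugHandler defined above. If this prints callback output but your deployed service stays silent, the integration layer is dropping the config:

if __name__ == "__main__":
    result = chain.invoke(
        {"topic": "payments"},
        config={"callbacks": [DebugHandler()]},
    )
    print(result)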

Prevention

  • Pass callbacks through config={"callbacks": [...]} at invocation time unless you have a strong reason not to.
  • Use async handlers for async execution paths; don’t mix sync handlers with ainvoke() and streaming.
  • Add one debug handler to every new project before shipping it to production.
  • Keep LangChain versions pinned and read migration notes before upgrading callback-related code.

The pattern here is consistent: if the callback doesn’t fire in production but works locally, something between your entrypoint and the actual LLM call stripped it away. Trace that path first.



By Cyprian Aarons, AI Consultant at Topiax.
