How to Fix 'callback not firing in production' in LangChain (Python)
If your LangChain callback works locally but disappears in production, the problem is usually not LangChain itself: your callback handler is either not attached to the runnable that actually executes, or it is attached in a way that gets lost once the chain crosses async boundaries, threads, or framework wrappers.
This shows up a lot with AsyncCallbackHandler, StreamingStdOutCallbackHandler, custom tracing handlers, and FastAPI/Celery deployments. The common symptom is: no errors, no callback output, and your on_llm_start, on_llm_new_token, or on_chain_end methods never fire.
The Most Common Cause
The #1 cause is attaching callbacks at the wrong level.
In LangChain Python, callbacks must be passed to the runnable that actually executes. If you attach them to a wrapper object, a helper function, or an outer chain that never directly owns the model call, they can be ignored.
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Callback attached to an outer chain, but the inner LLM call never receives it | Callback passed directly into invoke(), ainvoke(), or constructor where supported |
```python
# BROKEN
from langchain_openai import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Write a short summary of {topic}")
chain = prompt | llm | StrOutputParser()

handler = MyHandler()

# This often looks correct but can fail depending on how the chain is wrapped.
result = chain.invoke(
    {"topic": "payments"},
    config={"callbacks": [handler]},
)
```
```python
# FIXED
from langchain_openai import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Write a short summary of {topic}")
chain = prompt | llm | StrOutputParser()

handler = MyHandler()

# Pass callbacks at invocation time to the runnable that actually executes.
result = chain.invoke(
    {"topic": "payments"},
    config={"callbacks": [handler]},
)
```
That looks identical because the real difference is usually not syntax. The issue is where the callback gets lost: inside a wrapper function, a framework adapter, or an async task boundary.
A more realistic broken case is this:
```python
# BROKEN: callback attached by the caller, but the helper doesn't forward config
def run_summary(topic: str):
    return chain.invoke({"topic": topic})

run_summary("payments")
```

```python
# FIXED: forward config all the way down
def run_summary(topic: str, config=None):
    return chain.invoke({"topic": topic}, config=config)

run_summary("payments", config={"callbacks": [handler]})
```
If you use LangChain’s newer Runnable API, this forwarding matters. Callbacks are part of execution context, not global state.
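To see why a dropped `config` silences callbacks, here is a plain-Python analogy. No LangChain is involved; every name here is made up for the demo:

```python
# Illustrative analogy only: how execution config threads through a chain.
# These functions are stand-ins, not LangChain APIs.

def inner_step(data, config=None):
    # Callbacks fire only if config actually reached this level.
    for cb in (config or {}).get("callbacks", []):
        cb(data)
    return data.upper()

def bad_wrapper(data):
    # Drops config entirely: callbacks silently never fire.
    return inner_step(data)

def good_wrapper(data, config=None):
    # Forwards config, so callbacks fire as expected.
    return inner_step(data, config=config)

events = []
good_wrapper("hi", config={"callbacks": [events.append]})
bad_wrapper("hi")
print(events)  # ['hi'] -- only the forwarded call recorded anything
```

Both wrappers return the same result, which is exactly why the bug is silent: the output looks fine while the observability is gone.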
Other Possible Causes
1. You used async code but implemented sync-only handlers
If you’re calling ainvoke() or streaming in an async app, implement async methods too.
```python
# BROKEN
from langchain.callbacks.base import AsyncCallbackHandler

class MyAsyncHandler(AsyncCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        print(token)  # sync method won't be awaited properly in async flow
```

```python
# FIXED
from langchain.callbacks.base import AsyncCallbackHandler

class MyAsyncHandler(AsyncCallbackHandler):
    async def on_llm_new_token(self, token: str, **kwargs):
        print(token)
```
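The dispatch difference can be sketched in plain asyncio. This is a simplified analogy, not LangChain's actual internals; the handler and method names are invented for the demo:

```python
import asyncio

# Simplified analogy: an async dispatcher that only awaits coroutine methods.
# A plain "def" handler method is not a coroutine function, so it is skipped.

class BrokenHandler:
    def __init__(self):
        self.tokens = []
    def on_token(self, token):            # plain "def": skipped by the dispatcher
        self.tokens.append(token)

class FixedHandler:
    def __init__(self):
        self.tokens = []
    async def on_token(self, token):      # "async def": awaited as expected
        self.tokens.append(token)

async def dispatch(handlers, token):
    for h in handlers:
        method = getattr(h, "on_token", None)
        if asyncio.iscoroutinefunction(method):
            await method(token)           # sync methods never reach this await

broken, fixed = BrokenHandler(), FixedHandler()
asyncio.run(dispatch([broken, fixed], "hello"))
print(broken.tokens, fixed.tokens)  # [] ['hello']
```

The symptom matches production behavior: no exception, no output, the event just never arrives at the sync handler.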
2. Your framework creates a new event loop or worker process
This happens in FastAPI background tasks, Celery workers, Gunicorn workers, and serverless runtimes. A handler instance created in one process won’t magically fire in another.
```python
# Example: a Celery task must construct its own handler inside the task
@app.task
def generate_report():
    handler = MyHandler()
    chain.invoke({"topic": "claims"}, config={"callbacks": [handler]})
```
If you instantiate callbacks at module load time and expect them to survive worker forks, don’t.
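The safe pattern is a per-task factory: build the handler inside the task body so every worker process gets its own live instance. Here is a minimal plain-Python sketch with a fake `invoke` standing in for the chain; the names are illustrative, not a real Celery setup:

```python
class RecordingHandler:
    """Minimal stand-in for a callback handler (illustrative, not LangChain)."""
    def __init__(self):
        self.events = []

def fake_invoke(inputs, config=None):
    # Stand-in for chain.invoke: notifies any attached callbacks, then returns.
    for cb in (config or {}).get("callbacks", []):
        cb.events.append(("chain_end", inputs))
    return "ok"

def generate_report(topic):
    # Handler constructed inside the task body, not at module import time,
    # so each worker process builds its own instance after any fork/spawn.
    handler = RecordingHandler()
    result = fake_invoke({"topic": topic}, config={"callbacks": [handler]})
    return result, handler.events

result, events = generate_report("claims")
print(result, events)  # ok [('chain_end', {'topic': 'claims'})]
```

The factory-per-task shape costs almost nothing and removes an entire class of "works on my machine" bugs.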
3. Streaming is disabled or unsupported by the model call
on_llm_new_token only fires when token streaming is actually enabled.
```python
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
)
```
Without streaming=True, your StreamingStdOutCallbackHandler may never emit anything even though the request succeeds.
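The dependency between streaming and token callbacks can be sketched without LangChain. In this toy model (all names invented for the demo), token events simply do not exist unless the response is streamed:

```python
class TokenHandler:
    """Illustrative token callback, not a LangChain class."""
    def __init__(self):
        self.tokens = []
    def on_new_token(self, token):
        self.tokens.append(token)

def call_model(prompt, handler, streaming=False):
    response_tokens = ["Hello", " ", "world"]
    if streaming:
        for tok in response_tokens:   # token-level events exist only here
            handler.on_new_token(tok)
    return "".join(response_tokens)   # the final answer arrives either way

h_off, h_on = TokenHandler(), TokenHandler()
call_model("hi", h_off, streaming=False)
call_model("hi", h_on, streaming=True)
print(h_off.tokens, h_on.tokens)  # [] ['Hello', ' ', 'world']
```

Note that the non-streaming call still succeeds, which is why this failure mode produces no errors at all.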
4. You’re using deprecated callback plumbing
Older examples use callback_manager= or attach handlers in ways that don’t match current LangChain versions. In newer codebases this often results in silent no-ops.
```python
# Avoid relying on old patterns like this:
llm = ChatOpenAI(callback_manager=manager)
```
Prefer passing callbacks through runnable config:
```python
result = chain.invoke(
    input_data,
    config={"callbacks": [handler]},
)
```
How to Debug It
- Verify the handler method matches the event you expect
  - If you want token-level output, implement `on_llm_new_token`.
  - If you want request start/end visibility, implement `on_llm_start` and `on_chain_end`.
- Add logging inside every callback method
  - Don't assume one method will fire.
  - Print both args and kwargs so you can see whether LangChain is calling your handler at all.

```python
class DebugHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("CHAIN START", serialized.get("name"), inputs)

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM START", serialized.get("name"), prompts)

    def on_chain_end(self, outputs, **kwargs):
        print("CHAIN END", outputs)
```

- Test with a minimal direct invoke
  - Remove FastAPI/Celery/Streamlit wrappers.
  - Call `chain.invoke()` in a plain script.
  - If it works there but fails in production, the bug is in your integration layer.
- Check whether callbacks are being forwarded
  - If you have helper functions like `run_chain()`, inspect whether they accept and pass `config`.
  - In nested chains/tools/retrievers, confirm each layer forwards execution context.
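One quick way to audit forwarding is to check helper signatures. This sketch uses two hypothetical helpers and flags any function that has no way to accept a `config` to pass along:

```python
import inspect

# Audit sketch: flag helpers that cannot forward execution config.
# run_summary and run_report are hypothetical examples.

def run_summary(topic):                      # no config parameter: can't forward
    ...

def run_report(topic, config=None):          # can forward callbacks downstream
    ...

def accepts_config(fn):
    params = inspect.signature(fn).parameters
    return "config" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

print(accepts_config(run_summary), accepts_config(run_report))  # False True
```

A helper that fails this check is a dead end for callbacks no matter what the caller passes in.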
Prevention
- Pass callbacks through `config={"callbacks": [...]}` at invocation time unless you have a strong reason not to.
- Use async handlers for async execution paths; don't mix sync handlers with `ainvoke()` and streaming.
- Add one debug handler to every new project before shipping it to production.
- Keep LangChain versions pinned and read migration notes before upgrading callback-related code.
The pattern here is consistent: if the callback doesn’t fire in production but works locally, something between your entrypoint and the actual LLM call stripped it away. Trace that path first.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.