How to Fix 'callback not firing in production' in LangGraph (Python)

By Cyprian Aarons. Updated 2026-04-21.

What this error usually means

If your LangGraph callback works locally but never fires in production, the graph is usually executing, but the callback is attached to the wrong object, wrong lifecycle, or wrong runtime. In practice, this shows up when you move from a notebook or local invoke() test to an API server, worker, or async runtime.

The most common symptom is: no exception, no callback output, and your tracing/logging hook stays silent even though the graph returns a result.

The Most Common Cause

The #1 cause is attaching callbacks to the wrong layer of LangChain/LangGraph execution.

In production, people often pass callbacks to the graph node function, or forget that CompiledStateGraph.invoke() needs config-level callbacks. The graph runs, but your handler never receives on_chain_start, on_chain_end, or on_tool_end.

Broken vs fixed pattern

  • Broken: callback passed into the task function directly → Fixed: callback passed through config={"callbacks": [...]}
  • Broken: works in ad hoc tests only → Fixed: works consistently in invoke() / ainvoke()
  • Broken: no BaseCallbackHandler events fired → Fixed: events fire as expected
from langchain.callbacks.base import BaseCallbackHandler
from langgraph.graph import StateGraph, END
from typing import TypedDict

class MyHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("chain started")

    def on_chain_end(self, outputs, **kwargs):
        print("chain ended")

class State(TypedDict):
    text: str

def node(state: State):
    return {"text": state["text"].upper()}

graph = StateGraph(State)
graph.add_node("upper", node)
graph.set_entry_point("upper")
graph.add_edge("upper", END)
app = graph.compile()

# BROKEN: MyHandler is defined but never passed to invoke(),
# so no callback events fire
result = app.invoke({"text": "hello"})

from langchain.callbacks.base import BaseCallbackHandler
from langgraph.graph import StateGraph, END
from typing import TypedDict

class MyHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("chain started")

    def on_chain_end(self, outputs, **kwargs):
        print("chain ended")

class State(TypedDict):
    text: str

def node(state: State):
    return {"text": state["text"].upper()}

graph = StateGraph(State)
graph.add_node("upper", node)
graph.set_entry_point("upper")
graph.add_edge("upper", END)
app = graph.compile()

# FIXED: callbacks passed in config
result = app.invoke(
    {"text": "hello"},
    config={"callbacks": [MyHandler()]},
)

If you are using LangSmith or OpenTelemetry wrappers, the same rule applies: pass them through runtime config or your app’s execution context. Don’t assume a callback attached at object construction will survive compilation and deployment.

Other Possible Causes

1) You are using async code but calling sync entrypoints

If your nodes are async, or your server runs inside an event loop (a FastAPI endpoint, an asyncio worker), calling the sync .invoke() can block the loop or produce confusing, silently-degraded behavior. Use .ainvoke() and await it.

# Wrong
result = app.invoke({"text": "hello"})

# Right
result = await app.ainvoke({"text": "hello"}, config={"callbacks": [MyHandler()]})

This matters especially in FastAPI endpoints and background workers.

2) Your callback class is not a real LangChain handler

A plain Python class with methods named on_chain_start is not enough if it does not inherit from BaseCallbackHandler. LangChain dispatch looks for the handler interface.

# Wrong
class MyHandler:
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("start")

# Right
from langchain.callbacks.base import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("start")

If you want tool-level events too, implement the specific hooks you need:

  • on_chain_start
  • on_chain_end
  • on_tool_start
  • on_tool_end
  • on_llm_start
  • on_llm_end

3) You are swallowing exceptions inside the node

A callback may not fire if your code catches exceptions too early and returns fallback output before LangGraph can emit normal lifecycle events. This often happens with broad except Exception: blocks.

def node(state):
    try:
        return {"text": risky_call(state["text"])}
    except Exception:
        return {"text": "fallback"}  # hides real failure path

Fix it by logging and re-raising during diagnosis:

def node(state):
    try:
        return {"text": risky_call(state["text"])}
    except Exception as e:
        print(f"node failed: {e}")
        raise

4) Your production worker strips callback context

Some queues and worker setups serialize only input payloads and drop execution metadata. If you enqueue raw state but not config/context, the worker runs without callbacks.

# Bad: only payload gets queued
job = {"text": "hello"}

# Better: carry JSON-safe execution metadata, and rebuild
# the handler inside the worker (live handler instances
# generally cannot be serialized onto a queue)
job = {
    "input": {"text": "hello"},
    "metadata": {"request_id": "req_123"},
}

# Worker side:
# config = {"callbacks": [MyHandler()], "metadata": job["metadata"]}
# result = app.invoke(job["input"], config=config)

This is common with Celery, RQ, SQS consumers, and custom job runners.

How to Debug It

  1. Confirm whether the graph executes at all

    • Add a plain print() inside the first node.
    • If that prints but callbacks do not fire, this is a handler/config issue.
    • If neither prints, your graph entrypoint or worker routing is broken.
  2. Check whether you are using the right entrypoint

    • For sync code use .invoke().
    • For async code use .ainvoke().
    • For streaming use .stream() or .astream() depending on runtime.
    • A mismatch here often explains “works locally” failures.
  3. Verify handler inheritance and config placement

    • Confirm your class extends BaseCallbackHandler.
    • Confirm callbacks are passed in config={"callbacks": [...]}.
    • Confirm you are passing config to the actual call site that executes the compiled graph.
  4. Turn on verbose logging for one run

    • Print out request IDs and state keys before invoking.
    • If using LangSmith or tracing middleware, verify environment variables in production.
    • Check for suppressed exceptions in middleware or task runners.

Prevention

  • Always pass callbacks through runtime config at the final .invoke() / .ainvoke() call site.
  • Use one integration test that runs the compiled graph in the same runtime as production: FastAPI, Celery worker, Lambda runtime, whatever you deploy.
  • Keep a minimal diagnostic handler around that prints on_chain_start and on_chain_end so you can confirm event propagation quickly.

If you want this to stop being a recurring incident class in your team:

  • standardize callback wiring in one helper function,
  • avoid broad exception swallowing inside nodes,
  • and test both sync and async execution paths before shipping.


By Cyprian Aarons, AI Consultant at Topiax.
