How to Fix 'callback not firing when scaling' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

When “callback not firing when scaling” shows up in a LangChain Python app, it usually means your callback handler works in a single local run but stops being invoked once you add concurrency, batching, multiprocessing, or async execution. The chain is still running; your handler is just attached in the wrong place or on the wrong execution path.

This is almost always a callback propagation issue, not a LangChain bug.

The Most Common Cause

The #1 cause is attaching callbacks to the chain object and expecting them to automatically survive batch(), abatch(), invoke() from worker threads, or nested runnables. In LangChain, callbacks need to be passed through the runtime config or bound at the runnable level in the right place.

Here’s the broken pattern:

from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

    def on_llm_end(self, response, **kwargs):
        print("LLM ended")

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")

chain = prompt | llm | StrOutputParser()

# WRONG: handler attached here is easy to lose when scaling / nesting
chain.callbacks = [MyHandler()]

# This may work locally but fail under batch/async/multi-worker usage
result = chain.invoke({"text": "LangChain callbacks"})
print(result)

And here’s the fixed pattern:

from langchain_openai import ChatOpenAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

    def on_llm_end(self, response, **kwargs):
        print("LLM ended")

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | llm | StrOutputParser()

# RIGHT: pass callbacks through config at invocation time
result = chain.invoke(
    {"text": "LangChain callbacks"},
    config={"callbacks": [MyHandler()]},
)
print(result)

If you’re using async or batch execution, keep the same rule:

# Good for parallel runs too
results = chain.batch(
    [{"text": "a"}, {"text": "b"}],
    config={"callbacks": [MyHandler()]},
)
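
The async variants follow the same rule. A sketch (this has to run inside an event loop):

# Same rule for async execution
async def run_batch():
    return await chain.abatch(
        [{"text": "a"}, {"text": "b"}],
        config={"callbacks": [MyHandler()]},
    )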

The key point: don’t rely on mutating chain.callbacks after construction. Use config={"callbacks": [...]} or bind callbacks where LangChain expects them.
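
If you want to bind the handler once instead of passing it on every call, with_config() returns a copy of the runnable with the config baked in. Attach it to the top-level chain (not just the model) so child runs inherit it:

# Bind at construction time on the top-level runnable
chain_with_cb = chain.with_config({"callbacks": [MyHandler()]})

result = chain_with_cb.invoke({"text": "LangChain callbacks"})

Note that with_config() does not mutate the chain; it returns a new runnable, which is exactly why it plays well with concurrent use.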

Other Possible Causes

1. You’re using batch() / abatch() and your handler isn’t thread-safe

When scaling out with thread pools or async tasks, a shared handler can race itself or drop state.

class MyHandler(BaseCallbackHandler):
    def __init__(self):
        self.count = 0

    def on_chain_start(self, serialized, inputs, **kwargs):
        self.count += 1  # unsafe if shared across concurrent runs

Fix it by avoiding mutable shared state or by storing per-run data keyed by run_id.
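
For example, a minimal thread-safe sketch that keys state by run_id (the lock and dict layout here are one reasonable approach, not the only one):

import threading

from langchain_core.callbacks import BaseCallbackHandler

class SafeHandler(BaseCallbackHandler):
    def __init__(self):
        self._lock = threading.Lock()
        self._runs = {}  # per-run state, keyed by run_id

    def on_chain_start(self, serialized, inputs, *, run_id, **kwargs):
        with self._lock:
            self._runs[run_id] = {"events": 1}

    def on_chain_end(self, outputs, *, run_id, **kwargs):
        with self._lock:
            self._runs.pop(run_id, None)  # clean up finished runs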

2. You attached callbacks to the wrong object

A common mistake is attaching handlers to the LLM but expecting chain-level events like on_chain_start().

Broken: llm = llm.with_config({"callbacks": [MyHandler()]}) and expecting on_chain_start()
Fixed: attach at the chain invocation instead: chain.invoke(..., config={"callbacks": [MyHandler()]})

Broken: the handler only sees model events
Fixed: the handler sees chain + model events when passed through config

Example:

# Only LLM events fire here
llm = ChatOpenAI(model="gpt-4o-mini").with_config(
    {"callbacks": [MyHandler()]}
)

If you want full trace visibility across prompt → model → parser, attach at the top-level runnable.
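
For instance, attaching one handler at the top level gives it the whole trace:

handler = MyHandler()

# Attached at the top level: on_chain_* and on_llm_* events both fire
result = chain.invoke(
    {"text": "hello"},
    config={"callbacks": [handler]},
)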

3. Your code switched from sync to async and you’re calling the wrong method

Callbacks can appear to stop firing when an app moves to async but still calls the sync invoke() inside a coroutine, blocking the event loop instead of awaiting the async path.

# WRONG: mixing sync invoke inside async code paths
async def run():
    result = chain.invoke({"text": "hello"})  # blocks event loop

# RIGHT:
async def run():
    result = await chain.ainvoke(
        {"text": "hello"},
        config={"callbacks": [MyHandler()]},
    )

If you’re in FastAPI or any async server, this matters immediately.
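
If your whole pipeline is async, you can also use AsyncCallbackHandler from langchain_core.callbacks, which defines the same hooks as async methods. A minimal sketch:

from langchain_core.callbacks import AsyncCallbackHandler

class MyAsyncHandler(AsyncCallbackHandler):
    async def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM started")

    async def on_llm_end(self, response, **kwargs):
        print("LLM ended")

# Pass it the same way: config={"callbacks": [MyAsyncHandler()]}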

4. The callback class doesn’t implement the event you expect

LangChain won’t call methods you didn’t define. If you’re waiting for streamed tokens but only implemented start/end hooks, the run completes and nothing looks broken; the token events simply have no method to land on.

class MyHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        print(token)

For OpenAI streaming models:

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

Without streaming=True, token-level callbacks like on_llm_new_token() will never fire.
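
A quick end-to-end check, reusing the token handler above (a sketch):

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
chain = prompt | llm | StrOutputParser()

# Tokens should print as they arrive
chain.invoke(
    {"text": "LangChain callbacks"},
    config={"callbacks": [MyHandler()]},
)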

How to Debug It

  1. Confirm which event is missing

    • Is it on_chain_start, on_llm_start, on_llm_new_token, or on_chain_end?
    • Different missing events point to different layers of the stack.
  2. Add a minimal handler that prints every hook (see the sketch after this list)

    • Implement:
      • on_chain_start
      • on_llm_start
      • on_llm_new_token
      • on_llm_end
      • on_chain_end
    • This tells you whether callbacks are missing entirely or just one phase.
  3. Test with a single synchronous .invoke() call

    • Remove batching, threading, Celery workers, and async wrappers.
    • If it works locally but fails under load, you have a propagation/concurrency issue.
  4. Inspect how callbacks are passed

    • Prefer:
      chain.invoke(input_data, config={"callbacks": [handler]})
      
    • Avoid:
      chain.callbacks = [handler]
      
    • If using nested runnables, make sure each layer receives context via runtime config.
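
For step 2, a minimal print-everything handler might look like this (a sketch built on the standard BaseCallbackHandler hooks):

from langchain_core.callbacks import BaseCallbackHandler

class DebugHandler(BaseCallbackHandler):
    """Prints every major hook so you can see which phase goes missing."""

    def on_chain_start(self, serialized, inputs, **kwargs):
        print("on_chain_start")

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("on_llm_start")

    def on_llm_new_token(self, token, **kwargs):
        print(f"on_llm_new_token: {token!r}")

    def on_llm_end(self, response, **kwargs):
        print("on_llm_end")

    def on_chain_end(self, outputs, **kwargs):
        print("on_chain_end")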

Prevention

  • Pass handlers through config={"callbacks": [...]} at invocation time.
  • Keep callback handlers stateless or per-run safe when using parallel execution.
  • Enable streaming explicitly if you depend on token-level hooks like on_llm_new_token().

If you build LangChain apps that scale beyond one request at a time, treat callbacks as runtime context — not object state. That one change prevents most “callback not firing when scaling” incidents.



By Cyprian Aarons, AI Consultant at Topiax.
