How to Fix 'callback not firing when scaling' in AutoGen (Python)
What this error usually means
If you’re seeing callback not firing when scaling in AutoGen, the agent is running, but your callback is not being invoked consistently once you move from a single local run to multiple workers, async execution, or a distributed setup. In practice, this usually shows up when you register a callback on one object, then scale with another process, another agent instance, or a code path that bypasses the hook entirely.
The symptom is simple: the conversation continues, but your logging, tool handler, approval hook, or post-processing callback never runs.
The Most Common Cause
The #1 cause is registering the callback on the wrong object or in the wrong process scope.
With AutoGen, especially when using AssistantAgent, UserProxyAgent, or custom tool handlers, callbacks are often attached to an instance that only exists in the parent process. When you scale with multiprocessing, worker pools, Celery, Ray, or multiple Uvicorn/Gunicorn workers, that in-memory registration does not travel with the task.
Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| Callback registered once in parent process | Callback registered inside worker/task setup |
| Relies on global mutable state | Uses explicit initialization per worker |
| Works locally with one process | Breaks under scaling |
```python
# BROKEN: callback registered in the main process only
from autogen import Agent, AssistantAgent

events = []

def on_reply(recipient, messages=None, sender=None, config=None):
    # register_reply hooks receive (recipient, messages, sender, config)
    events.append(messages[-1] if messages else None)
    print("callback fired:", messages[-1] if messages else None)
    return False, None  # not final: let AutoGen's remaining reply functions run

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
)

# This looks fine locally... ([Agent, None] matches any sender,
# including direct generate_reply() calls with no sender)
assistant.register_reply([Agent, None], on_reply)

def handle_request():
    # ...but if this runs in another worker/process,
    # the registered callback may not exist there.
    return assistant.generate_reply(messages=[{"role": "user", "content": "hello"}])
```
```python
# FIXED: register inside the worker/process that actually executes the agent
from autogen import Agent, AssistantAgent

def build_assistant():
    assistant = AssistantAgent(
        name="assistant",
        llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    )

    def on_reply(recipient, messages=None, sender=None, config=None):
        print("callback fired:", messages[-1] if messages else None)
        return False, None  # not final: keep the normal reply flow going

    assistant.register_reply([Agent, None], on_reply)
    return assistant

def handle_request():
    # runs in whichever worker handles the request, so the hook exists there too
    assistant = build_assistant()
    return assistant.generate_reply(messages=[{"role": "user", "content": "hello"}])
```
If you’re using a server framework like FastAPI with multiple workers, this matters even more. Each worker is its own process; nothing you attach at import time is guaranteed to exist everywhere.
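A minimal sketch of what per-worker initialization can look like with FastAPI, reusing the build_assistant() helper from the fixed example above. The route, payload shape, and lru_cache-based per-process caching are illustrative choices, not AutoGen requirements:

```python
# Sketch: construct the agent inside each worker process, not at import time in a parent
from functools import lru_cache

from fastapi import FastAPI

app = FastAPI()

@lru_cache(maxsize=1)
def get_assistant():
    # runs once per worker process, so the callback is registered in every worker
    return build_assistant()  # helper from the fixed example above

@app.post("/reply")
def reply(payload: dict):
    assistant = get_assistant()
    answer = assistant.generate_reply(
        messages=[{"role": "user", "content": payload.get("content", "")}]
    )
    return {"reply": answer}
```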
Other Possible Causes
1) Async callback signature mismatch
AutoGen will silently skip or fail to call hooks if your callback signature does not match what the framework expects.
```python
# BROKEN
async def my_callback(sender):  # wrong parameters for a register_reply hook
    print(sender)

# FIXED: reply functions receive (recipient, messages, sender, config)
# and return a (final, reply) tuple
async def my_callback(recipient, messages=None, sender=None, config=None):
    print(recipient.name)
    return False, None  # not final: let the remaining reply functions run
```
If you see errors like:
- TypeError: my_callback() takes 1 positional argument but 4 were given
- TypeError: object NoneType can't be used in 'await' expression

then your hook contract is wrong.
2) Using sync code inside async execution without awaiting
When scaling async agents, it is easy to call async methods as if they were synchronous. The result is a coroutine object that is created but never executed.
```python
# BROKEN
result = assistant.a_generate_reply(messages=messages)
print(result)  # a coroutine object; the reply was never actually generated

# FIXED
result = await assistant.a_generate_reply(messages=messages)
print(result)
```
If your callback depends on that execution path completing, it will look like “the callback didn’t fire” when really the coroutine never ran.
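If the worker entry point is synchronous (a plain function in a queue consumer, for example), the coroutine has to be driven explicitly. A minimal sketch, assuming the assistant object from the earlier examples:

```python
import asyncio

def handle_sync(messages):
    # synchronous entry point: run the coroutine to completion on an event loop
    return asyncio.run(assistant.a_generate_reply(messages=messages))

async def handle_async(messages):
    # asynchronous entry point: just await it
    return await assistant.a_generate_reply(messages=messages)
```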
3) Registering on the wrong agent class
Some AutoGen setups mix AssistantAgent, UserProxyAgent, and custom subclasses. The trigger you pass to register_reply() is matched against the sender, so if that class never matches whoever actually sends to the agent in your flow (or you call generate_reply() with no sender at all), nothing happens.
```python
# BROKEN: the trigger is matched against the sender; if no UserProxyAgent
# ever sends to this assistant in your flow, the hook never runs
assistant.register_reply([UserProxyAgent], on_reply)

# FIXED: match the agents that actually send to this one, or use the
# catch-all [Agent, None] so direct generate_reply() calls also trigger it
assistant.register_reply([Agent, None], on_reply)
```
This gets missed when refactoring from a demo into production code where agents are created dynamically.
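One way to keep the trigger and the flow in sync when agents are created dynamically is to derive the trigger from the concrete peer that will actually send messages. A small sketch; the wire_logging helper is illustrative, not an AutoGen API:

```python
def wire_logging(receiving_agent, sending_agent):
    # hook with the standard register_reply signature
    def on_reply(recipient, messages=None, sender=None, config=None):
        sender_name = sender.name if sender else None
        print(f"{recipient.name} received a message from {sender_name}")
        return False, None  # not final

    # register_reply matches the trigger against the sender, so use the
    # sending agent's actual class rather than a hard-coded one
    receiving_agent.register_reply([type(sending_agent)], on_reply)
```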
4) Tool/function routing bypasses your callback
If you use function calling or tools, your callback may never fire because the model's response is routed directly to tool execution.
```python
llm_config = {
    "config_list": [{"model": "gpt-4o-mini"}],
    # "tools" takes OpenAI-style tool schemas (dicts), not Python callables
    "tools": [my_tool_schema],  # hypothetical schema dict describing my_tool
}
```
If your logic lives in register_reply() but the actual path is tool dispatch via register_function(), your hook won’t be hit. Put instrumentation where execution actually happens:
```python
def my_tool(query: str) -> dict:
    print("tool fired:", query)  # instrumentation lives where execution happens
    return {"result": "ok"}
```
How to Debug It
- Confirm the code path actually runs
  - Add a plain print() before and after agent invocation.
  - If the second line never prints, this is not a callback problem yet.
- Check whether you are crossing process boundaries
  - Look for Gunicorn/Uvicorn workers, Celery tasks, Ray actors, multiprocessing pools.
  - Any in-memory registration done before fork/spawn can disappear in workers.
- Verify hook registration at runtime
  - Log immediately after registration.
  - If possible, inspect the agent object right before invocation and confirm the handler list contains your function (see the sketch at the end of this section).
- Reduce to one agent and one worker
  - Run with a single process and no queue.
  - If it works there and fails under scale-out, you have an initialization/scope issue rather than an AutoGen bug.
A useful pattern is to add explicit tracing:
```python
def on_reply(recipient, messages=None, sender=None, config=None):
    print(f"[TRACE] callback fired: {messages[-1] if messages else None}")
    return False, None

print("[TRACE] registering callback")
assistant.register_reply([Agent, None], on_reply)
print("[TRACE] invoking agent")
```
If registration prints but firing doesn’t under load, you’re dealing with lifecycle or routing issues.
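To confirm the hook really is attached to the instance that runs, you can also dump the agent's registered reply functions right before invocation. This reads _reply_func_list, a private ConversableAgent attribute in current pyautogen, so treat it purely as a debugging aid that may change between versions:

```python
def dump_reply_hooks(agent):
    # _reply_func_list is internal to ConversableAgent; inspect it for debugging only
    for entry in getattr(agent, "_reply_func_list", []):
        func = entry.get("reply_func")
        print(f"[TRACE] trigger={entry.get('trigger')!r} func={getattr(func, '__name__', func)}")

dump_reply_hooks(assistant)  # call in the worker, right before generate_reply()
```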
Prevention
- Register callbacks inside the same lifecycle scope as execution:
  - inside worker startup hooks (see the Celery sketch after this list)
  - inside request handlers, only if needed per request
- Avoid global mutable agent instances when using:
  - multiprocessing
  - distributed queues
  - web servers with multiple workers
- Treat every AutoGen hook as part of an execution contract:
  - verify class match
  - verify sync vs async signature
  - verify whether tool routing bypasses reply hooks
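As a concrete sketch of the "worker startup hooks" item above, here is per-process initialization with Celery; the broker URL and task are illustrative, and Gunicorn's post_fork or a FastAPI lifespan handler can play the same role:

```python
from celery import Celery
from celery.signals import worker_process_init

celery_app = Celery("agents", broker="redis://localhost:6379/0")  # illustrative broker

assistant = None  # filled in per worker process, after fork/spawn

@worker_process_init.connect
def init_agent(**kwargs):
    # runs once in every worker process, so registration travels with the worker
    global assistant
    assistant = build_assistant()  # helper from the fixed example above

@celery_app.task
def reply_task(content: str):
    return assistant.generate_reply(messages=[{"role": "user", "content": content}])
```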
If you build AutoGen agents for production systems like claims triage or banking workflows, assume scaling will expose every hidden dependency on local state. Keep registration explicit, local to the worker that executes it, and trace every hop until you know exactly where the callback stops firing.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.