How to Fix 'callback not firing in production' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21

What this error usually means

If you’re seeing “callback not firing in production” with AutoGen in Python, the agent is running, but your event handler, tool hook, or reply callback is never being invoked. In practice, this usually shows up when code works locally in a notebook or REPL, then goes silent once deployed behind FastAPI, Celery, Docker, or a serverless worker.

The failure is rarely “AutoGen is broken.” It’s usually one of three things: the callback was registered on the wrong object, the event loop/task lifecycle changed in production, or the process exits before the async callback gets a chance to run.

The Most Common Cause

The #1 cause I see is registering the callback on an instance that never handles the actual message flow.

With AutoGen, people often create a ConversableAgent, register a reply function, then send messages through a different instance or through a code path that bypasses that agent entirely. Locally it can look fine because everything happens in one file and one event loop.

Broken vs fixed pattern

Broken pattern: Callback registered on assistant, but messages sent through another agent or another process path.
Fixed pattern: Callback registered on the exact agent instance that receives the message.

Broken pattern: Async callback defined but never awaited in the production path.
Fixed pattern: Async callback wired into an async flow and awaited properly.
# BROKEN
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "..." }]},
)

user = UserProxyAgent(name="user", human_input_mode="NEVER")

def my_callback(recipient, messages=None, sender=None, config=None):
    print("callback fired")
    return False, None

assistant.register_reply([UserProxyAgent], my_callback)

# This looks fine locally, but if production sends via a different agent
# or bypasses assistant.receive(), your callback never runs.
user.initiate_chat(assistant, message="Hello")
# FIXED
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "..." }]},
)

user = UserProxyAgent(name="user", human_input_mode="NEVER")

def my_callback(recipient, messages=None, sender=None, config=None):
    print("callback fired")
    return False, None

# Register on the agent that will actually receive the conversation turn
assistant.register_reply([UserProxyAgent], my_callback)

# Ensure this exact assistant instance handles the chat
user.initiate_chat(assistant, message="Hello")

If you’re using GroupChatManager, ConversableAgent, or custom routing in production, confirm the callback is attached to the object that actually processes incoming messages. A very common symptom is: no exception, no stack trace, just missing side effects.

Other Possible Causes

1) Async callback not awaited

If your callback is async def but your production code path invokes it like a normal function, Python only creates a coroutine object — the body never runs.

# BROKEN
async def my_callback(recipient, messages=None, sender=None, config=None):
    print("async callback fired")
    return False, None

assistant.register_reply([UserProxyAgent], my_callback)

Fix by making sure your execution path supports async all the way through:

# FIXED
async def my_callback(recipient, messages=None, sender=None, config=None):
    print("async callback fired")
    return False, None

assistant.register_reply([UserProxyAgent], my_callback)

# Drive the chat through the async entry point so the coroutine is awaited.
# a_initiate_chat is available in pyautogen 0.2; check your version's docs.
await user.a_initiate_chat(assistant, message="Hello")
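The underlying Python behavior is easy to reproduce without AutoGen at all: calling an async def function only creates a coroutine object, and nothing inside it runs until something awaits it. A minimal sketch:

```python
import asyncio

fired = []

async def my_callback():
    # The body only executes once the coroutine is awaited.
    fired.append("fired")
    return False, None

# Calling it like a sync function creates a coroutine but runs nothing.
coro = my_callback()
print(fired)  # [] -- the callback "silently" did not fire
coro.close()  # suppress the "coroutine was never awaited" RuntimeWarning

# Awaiting it inside an event loop actually runs the body.
asyncio.run(my_callback())
print(fired)  # ["fired"]
```

This is why the failure is silent: there is no exception, just a RuntimeWarning that is easy to miss in production logs.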

2) Process exits before background work completes

This happens a lot in web handlers and serverless jobs. The request returns early and your callback task gets killed with the worker.

# BROKEN
@app.post("/chat")
def chat():
    user.initiate_chat(assistant, message="Run analysis")
    return {"status": "queued"}  # process may exit before callback side effects complete

Use explicit waiting or run the work inside the request lifecycle:

# FIXED
@app.post("/chat")
async def chat():
    await run_autogen_chat()  # keep work inside an awaited async path
    return {"status": "done"}
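run_autogen_chat above is a placeholder, not an AutoGen API. One way to implement it — assuming your AutoGen calls are synchronous, as initiate_chat is in pyautogen 0.2 — is to push the blocking work onto a thread so the event loop stays responsive while the request still waits for completion:

```python
import asyncio

def blocking_chat():
    # Placeholder for the synchronous AutoGen call, e.g.:
    # user.initiate_chat(assistant, message="Run analysis")
    return "chat finished"

async def run_autogen_chat():
    # to_thread keeps the blocking chat inside the awaited request path,
    # so the worker cannot return/exit before callback side effects complete.
    return await asyncio.to_thread(blocking_chat)

result = asyncio.run(run_autogen_chat())
print(result)  # chat finished
```

The key design choice is that the handler awaits the work rather than firing it off and returning immediately.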

3) Wrong trigger condition or recipient filter

AutoGen reply hooks often depend on sender type or message content. If your filter is too narrow, it will never match in production traffic.

# BROKEN: only fires for one exact sender type/path
assistant.register_reply(
    [UserProxyAgent],
    my_callback,
)

If production sends from another agent class or wrapper object:

# FIXED: broaden carefully while debugging
from autogen import ConversableAgent

assistant.register_reply(
    [ConversableAgent],  # matches any ConversableAgent subclass; narrow again once you confirm the real sender
    my_callback,
)

Also check for content-based guards inside your callback:

def my_callback(recipient, messages=None, sender=None, config=None):
    last = messages[-1]["content"] if messages else ""
    if "approve" not in last.lower():
        return False, None  # silently skips most traffic
    return True, "Approved."  # without an explicit return here, the callback falls off the end and returns None
4) Version mismatch between local and prod AutoGen

AutoGen has had API changes across versions. A local notebook might be on one package version while prod Docker image uses another.

pip show pyautogen autogen-agentchat autogen-core
pip freeze | grep -i autogen

Pin versions explicitly:

pyautogen==0.2.35

Pin whatever exact version your code was validated against, and don't mix old register_reply usage with newer package layouts without checking the release notes.
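One way to catch version drift early is to log the installed version at process startup in every environment. A sketch using only the standard library (pyautogen is the assumed distribution name):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Log this once at startup in local dev, CI, and the prod image,
# then compare against the version you validated, e.g. "0.2.35".
print("pyautogen:", installed_version("pyautogen"))
print("missing example:", installed_version("definitely-not-installed"))
```

If local and prod print different versions, fix that before debugging anything else.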

How to Debug It

  1. Print from inside the callback

    • Add a hard print() or structured log at the top of the function.
    • If it doesn’t fire once in prod logs, the issue is registration or routing.
  2. Verify the exact agent instance

    • Log id(assistant) where you register and where you send.
    • If those IDs differ across files/processes/requests, you’re not using the same object.
  3. Check whether your code path is async

    • Search for async def, await, background tasks, and thread pools.
    • If you call async callbacks from sync code without proper orchestration, they won’t execute reliably.
  4. Reduce to one direct message exchange

    • Remove FastAPI/Celery/queue wrappers temporarily.
    • Run a single script that creates AssistantAgent, registers the hook exactly once, and calls initiate_chat() directly.
    • If it works there but not in prod wiring, the bug is outside AutoGen.
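Steps 2 and 4 can be combined into one tiny harness. The stand-in Agent class below is a hypothetical sketch (no AutoGen dependency) of the id-mismatch failure mode where registration happens on one instance and sending goes through another:

```python
class Agent:
    """Stand-in for an agent that owns its own callback registry."""
    def __init__(self, name):
        self.name = name
        self.reply_funcs = []

    def register_reply(self, func):
        self.reply_funcs.append(func)

    def receive(self, message):
        for func in self.reply_funcs:
            func(message)

fired = []

# Two instances with the same name -- easy to confuse across modules.
assistant_a = Agent("assistant")
assistant_b = Agent("assistant")

assistant_a.register_reply(lambda m: fired.append(m))
print(id(assistant_a) == id(assistant_b))  # False: different objects

assistant_b.receive("hello")   # wrong instance: callback never fires
print(fired)                   # []
assistant_a.receive("hello")   # same instance as registration: fires
print(fired)                   # ['hello']
```

If logging id() at the registration site and the send site shows two different numbers, you have found the bug.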

Prevention

  • Register callbacks on the same concrete agent instance that processes inbound messages.
  • Keep AutoGen version pinned across local dev, CI, and production images.
  • Test one end-to-end chat path in an integration test before shipping any routing or background-job logic.
  • Add logging at registration time and at callback entry so silent failures become obvious fast.
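The last bullet can be wired up with a small decorator that logs both registration and every entry (a sketch; logging configuration is up to your deployment):

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("autogen.callbacks")

def logged_callback(func):
    """Wrap a reply callback so registration and every entry are logged."""
    log.info("registering callback %s", func.__name__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("callback %s entered", func.__name__)
        return func(*args, **kwargs)

    return wrapper

@logged_callback
def my_callback(recipient, messages=None, sender=None, config=None):
    return False, None

# If "callback my_callback entered" never appears in prod logs,
# the problem is registration/routing, not the callback body.
result = my_callback(None, messages=[{"content": "hi"}])
```

Decorate the callback before passing it to register_reply and silent failures become a one-line grep.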

By Cyprian Aarons, AI Consultant at Topiax.