How to Fix 'duplicate tool calls when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21

What this error actually means

"Duplicate tool calls when scaling" usually shows up when AutoGen executes the same function/tool invocation more than once in a single run. In practice, it appears after you add retries, parallel workers, or multiple agent loops, and the same message history gets processed twice.

The failure mode is usually not the tool itself. It’s the orchestration around AssistantAgent, UserProxyAgent, or your custom executor re-sending the same assistant message with tool_calls attached.

The Most Common Cause

The #1 cause is reusing the same conversation state across multiple runs without clearing messages or deduplicating tool execution. This happens a lot when people scale from one agent pair to a loop, queue, or worker pool.

Here’s the broken pattern:

Broken | Fixed
--- | ---
Reuses one shared agent/session state | Creates isolated state per request
Replays the same assistant message with tool_calls | Executes each tool call once
Lets retries re-enter the same conversation | Guards against duplicate message processing
# BROKEN: shared state reused across concurrent/scaled requests
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# This gets called by multiple workers / requests
def handle_request(prompt: str):
    # Same agents, same chat history, same tool-call context reused
    user_proxy.initiate_chat(assistant, message=prompt)

# If two workers call handle_request() with overlapping state,
# AutoGen may see duplicated tool calls in the same conversation.
# FIXED: isolate per-request state and avoid replaying tool calls
from autogen import AssistantAgent, UserProxyAgent

def build_agents():
    assistant = AssistantAgent(
        name="assistant",
        llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    )

    user_proxy = UserProxyAgent(
        name="user_proxy",
        human_input_mode="NEVER",
        code_execution_config=False,
    )
    return assistant, user_proxy

def handle_request(prompt: str):
    assistant, user_proxy = build_agents()

    # Fresh agents per request => fresh chat history and tool-call tracking
    user_proxy.initiate_chat(assistant, message=prompt)

If you’re using custom orchestration, the real fix is to treat each run as a new transaction. Don’t share mutable message lists between threads, tasks, or retries.
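The "new transaction per run" idea can be sketched without any AutoGen calls at all. Below, `handle_request_isolated()` is a stand-in for the `handle_request()` above: every worker task builds its own request-local state, so nothing is shared between concurrent runs. The AutoGen chat itself is stubbed out; only the isolation pattern is the point.

```python
# Sketch: per-request state isolation under a worker pool.
# The actual AutoGen chat is stubbed; each task owns its own history.
from concurrent.futures import ThreadPoolExecutor

def handle_request_isolated(prompt: str) -> str:
    # Request-local history: created here, dies here. Never a module global.
    messages = [{"role": "user", "content": prompt}]
    # ... build fresh agents and run initiate_chat() here ...
    return f"handled: {prompt}"

# Four concurrent workers, zero shared mutable state between them.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request_isolated, ["a", "b", "c"]))
```

Because each call owns its entire conversation state, there is no history for a second worker to replay.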

Other Possible Causes

1) Retry logic resubmits the same assistant response

If your retry wrapper catches an exception and replays the last model output, you can end up executing the same tool_calls twice.

# BAD: replaying the same response after failure
last_message = result.chat_history[-1]
if "tool_calls" in last_message:
    process_tool_calls(last_message)   # first execution
    retry()
    process_tool_calls(last_message)   # duplicate execution

Fix: store an idempotency key per tool call and skip already-processed IDs.

seen_call_ids = set()

for call in last_message.get("tool_calls", []):
    if call["id"] in seen_call_ids:
        continue
    seen_call_ids.add(call["id"])
    process_tool_call(call)

2) Parallel workers consume the same queue item

This is common when scaling with Celery, Ray, asyncio tasks, or multiple API workers. Two consumers read the same job before one marks it complete.

# BAD: no lock / lease on job processing
job = queue.get()
handle_request(job.prompt)
queue.ack(job.id)

Fix: use a lease/lock so each conversation turn has exactly one owner, and acknowledge the job only after it completes. Note that acking in a finally block would also ack failed jobs and silently drop them; on failure, release the lease instead so the turn can be retried from a clean state. (reserve_with_lease(), ack(), and release() stand in for whatever your queue library actually provides.)

job = queue.reserve_with_lease()
try:
    handle_request(job.prompt)
    queue.ack(job.id)        # acknowledge only on success
except Exception:
    queue.release(job.id)    # let another worker retry from clean state
    raise
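If your queue doesn't offer leases natively, the semantics are small enough to sketch in-process. This is a minimal, hypothetical `LeaseQueue` mirroring the `reserve_with_lease()`/`ack()` names used above, not a real library API: once a job is leased, a second consumer simply doesn't see it.

```python
import threading

class LeaseQueue:
    """Minimal in-memory sketch of reserve/ack semantics (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}    # job_id -> payload
        self._leased = set()  # job_ids currently owned by some consumer

    def put(self, job_id, payload):
        with self._lock:
            self._pending[job_id] = payload

    def reserve_with_lease(self):
        with self._lock:
            for job_id, payload in self._pending.items():
                if job_id not in self._leased:
                    self._leased.add(job_id)  # a second consumer skips this job
                    return job_id, payload
            return None

    def ack(self, job_id):
        with self._lock:
            self._pending.pop(job_id, None)
            self._leased.discard(job_id)

q = LeaseQueue()
q.put("job-1", "summarize the report")
first = q.reserve_with_lease()    # one consumer gets the job
second = q.reserve_with_lease()   # None: leased, not re-delivered
```

A production system would also expire stale leases on a timer so a crashed worker's job can be re-delivered, but the single-owner invariant is the part that prevents duplicate tool calls.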

3) You append assistant messages back into history manually

If you take an assistant message containing tool_calls and push it back into messages, AutoGen may try to execute it again on the next turn.

# BAD: manual history mutation can duplicate tool invocations
messages.append(assistant_reply)
messages.append(assistant_reply)  # accidental double append

Fix: only append once, and keep raw model output separate from persisted conversation state.
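One way to enforce "append once" is to dedupe on a stable message ID before persisting. The `"id"` field here is an assumption about your message shape; adapt the key to whatever unique identifier your pipeline carries.

```python
# Sketch: dedupe by message ID before persisting to conversation history.
def append_once(messages: list, reply: dict) -> bool:
    seen = {m.get("id") for m in messages}
    if reply.get("id") in seen:
        return False            # already persisted; don't replay tool_calls
    messages.append(reply)
    return True

history = []
reply = {"id": "msg-1", "role": "assistant", "tool_calls": [{"id": "call-1"}]}
append_once(history, reply)   # True: first append is persisted
append_once(history, reply)   # False: accidental double append is ignored
```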

4) Streaming/event handlers fire twice

If you listen to both streaming deltas and final completion events, your handler may process tools on both paths.

# BAD: tool handling in both callbacks
def on_delta(event):
    maybe_process_tools(event)

def on_complete(message):
    maybe_process_tools(message)

Fix: only execute tools from one canonical event source — usually the final assembled assistant message.
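A sketch of that canonical-source rule, with the handler names illustrative rather than actual AutoGen callbacks: deltas feed the UI only, and tool execution lives exclusively in the final-message handler, with a call-ID set as a backstop.

```python
# Sketch: stream deltas for display only; execute tools exactly once,
# from the final assembled message.
executed_calls = set()
ran = []                # stand-in for real tool side effects

def run_tool(call: dict):
    ran.append(call["id"])

def on_delta(event: dict):
    # UI only: render partial text, never touch tools here.
    print(event.get("text", ""), end="")

def on_complete(message: dict):
    for call in message.get("tool_calls", []):
        if call["id"] in executed_calls:
            continue                    # belt-and-braces dedup by call ID
        executed_calls.add(call["id"])
        run_tool(call)                  # the one canonical execution path

# Even a duplicated call ID in the final message executes only once.
on_complete({"tool_calls": [{"id": "c1"}, {"id": "c1"}]})
```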

How to Debug It

  1. Log every tool call ID

    • Print call["id"], function name, and conversation/request ID.
    • If you see the same id twice, you’ve found duplication upstream.
  2. Check whether history is reused

    • Inspect whether messages, chat_history, or agent instances are global.
    • In AutoGen terms, look for shared AssistantAgent / UserProxyAgent objects across requests.
  3. Turn off concurrency temporarily

    • Run one worker, one thread, one request at a time.
    • If the error disappears, your bug is in scheduling or shared mutable state.
  4. Trace retry paths

    • Search for wrappers around initiate_chat(), model calls, or tool execution.
    • Make sure retries restart from a clean state instead of replaying prior assistant messages with tool_calls.
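Step 1 above can be a ten-line helper: log every call ID keyed by request, and warn loudly on repeats. The `call` dict shape (an `"id"` plus a `"function"` name) follows the tool-call payloads used earlier in this article.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool-calls")

seen = {}  # (request_id, call_id) -> times observed

def log_tool_call(request_id: str, call: dict):
    """Log a tool call and flag duplicates (debug step 1)."""
    key = (request_id, call["id"])
    seen[key] = seen.get(key, 0) + 1
    if seen[key] > 1:
        log.warning("DUPLICATE tool call %s in request %s (seen %d times)",
                    call["id"], request_id, seen[key])
    else:
        log.info("tool call %s -> %s (request %s)",
                 call["id"], call.get("function", {}).get("name"), request_id)

log_tool_call("req-1", {"id": "call-1", "function": {"name": "search"}})
log_tool_call("req-1", {"id": "call-1", "function": {"name": "search"}})  # warns
```

Grep the warnings: the first duplicate tells you which request, and therefore which retry path or worker, is replaying state.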

Prevention

  • Create agents per request unless you have a very deliberate session-store design.
  • Make every tool execution idempotent using a stable call ID or external transaction key.
  • Keep conversation history immutable outside of one orchestration layer.
  • If you scale with workers, enforce single-consumer ownership of each chat turn.

The short version: this is usually not an AutoGen bug. It’s almost always duplicated state, duplicated retries, or duplicated consumers around a single tool_calls payload. Fix that boundary and the error goes away.


By Cyprian Aarons, AI Consultant at Topiax.
