How to Fix 'duplicate tool calls in production' in LangChain (Python)
If you’re seeing duplicate tool calls in production, your agent is usually executing the same tool invocation more than once for a single model turn. In LangChain Python, this typically shows up when you mix retry logic, streaming callbacks, or manual tool execution with an agent loop that already handles tool calling.
The problem often appears under load, after a deploy, or only with models that return tool calls across multiple streamed chunks. The root cause is usually not LangChain itself — it's orchestration code that executes the same tool call twice.
The Most Common Cause
The #1 cause is running tools manually while also letting the agent executor run them.
A common mistake is to inspect AIMessage.tool_calls and execute the tool yourself, then pass the same message back into AgentExecutor, which executes it again. Another version is wiring a callback handler that triggers on on_tool_start and also calling the tool in your chain.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| You manually invoke the tool and also let AgentExecutor handle it | Let one layer own tool execution |
| Same `tool_call_id` gets processed twice | Tool call is executed once per turn |
```python
# BROKEN
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

@tool
def get_policy_status(policy_id: str) -> str:
    """Look up the status of a policy by id."""
    return f"Policy {policy_id} is active"

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_policy_status]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a support agent."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required by create_tool_calling_agent
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "Check policy 123"})

# BAD: manually re-running tool calls from the model output.
# If your code does something like this, you'll duplicate execution.
if "tool_calls" in result:
    for call in result["tool_calls"]:
        get_policy_status.invoke(call["args"])
```
```python
# FIXED
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

@tool
def get_policy_status(policy_id: str) -> str:
    """Look up the status of a policy by id."""
    return f"Policy {policy_id} is active"

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_policy_status]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a support agent."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required by create_tool_calling_agent
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# Let AgentExecutor own the full loop.
result = executor.invoke({"input": "Check policy 123"})
print(result["output"])
```
If you’re using lower-level primitives like ChatOpenAI.bind_tools() and manually looping over AIMessage.tool_calls, the same rule applies: pick one execution path.
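If you do go the manual route, the single-owner rule can be sketched in plain Python. The dicts below mirror the shape of LangChain's `AIMessage.tool_calls` entries (`{"name", "args", "id"}`); `run_tool_calls_once` and `tools_by_name` are illustrative names, not LangChain APIs:

```python
def run_tool_calls_once(tool_calls, tools_by_name, seen_ids):
    """Execute each tool call at most once, keyed by its id."""
    results = []
    for call in tool_calls:
        if call["id"] in seen_ids:
            continue  # already executed this turn -- skip the duplicate
        seen_ids.add(call["id"])
        tool_fn = tools_by_name[call["name"]]
        results.append((call["id"], tool_fn(**call["args"])))
    return results

def get_policy_status(policy_id: str) -> str:
    return f"Policy {policy_id} is active"

seen = set()
calls = [
    {"name": "get_policy_status", "args": {"policy_id": "123"}, "id": "call_1"},
    {"name": "get_policy_status", "args": {"policy_id": "123"}, "id": "call_1"},  # replayed
]
out = run_tool_calls_once(calls, {"get_policy_status": get_policy_status}, seen)
print(len(out))  # 1 -- the replayed call is skipped
```

Because `seen_ids` outlives the call, a replay of the whole message later in the same turn is also a no-op.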
Other Possible Causes
1) Retry middleware replays a non-idempotent request
If your chain retries after a timeout, the first request may have already reached the model or your backend. That can replay the same tool call.
```python
# Example: retrying at the wrong layer
chain = runnable.with_retry(stop_after_attempt=3)
```

Fix by making retries idempotent at the application layer. Store processed `tool_call_id` values and skip duplicates.

```python
seen_tool_calls = set()

def should_process(tool_call_id: str) -> bool:
    if tool_call_id in seen_tool_calls:
        return False
    seen_tool_calls.add(tool_call_id)
    return True
```
2) Streaming callbacks fire twice
With streaming models, you may receive partial deltas and then a final message containing the same tool call. If you trigger execution on both events, you’ll duplicate it.
```python
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        pass  # don't execute tools here

    def on_tool_start(self, serialized, input_str, **kwargs):
        pass  # don't also execute the tool from another hook
```
Rule: only execute tools after you have a complete AIMessage.tool_calls, not from token-by-token events.
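One way to follow that rule is to merge streamed fragments into complete calls first and dispatch only after the stream ends. The chunk format below is an illustrative stand-in for provider deltas (argument JSON typically arrives in pieces keyed by the call's id or index), not a LangChain type:

```python
import json

def accumulate_tool_calls(chunks):
    """Merge partial fragments into complete calls, keyed by call id."""
    calls = {}
    for chunk in chunks:
        entry = calls.setdefault(chunk["id"], {"name": "", "args_json": ""})
        if chunk.get("name"):
            entry["name"] = chunk["name"]
        entry["args_json"] += chunk.get("args_fragment", "")
    return calls

chunks = [
    {"id": "call_1", "name": "get_policy_status", "args_fragment": '{"policy_'},
    {"id": "call_1", "args_fragment": 'id": "123"}'},
]
complete = accumulate_tool_calls(chunks)

# Dispatch only now, once per id -- never from inside the streaming loop.
args = json.loads(complete["call_1"]["args_json"])
print(args)  # {'policy_id': '123'}
```

If you executed on each chunk instead, the two fragments above would trigger two (broken) executions of the same call.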
3) Parallel branches are sharing state
If two async tasks process the same conversation state concurrently, both can see the same pending tool call and execute it.
```python
# BAD: shared mutable state across concurrent workers
pending = conversation_state["tool_calls"]
await asyncio.gather(
    worker_a(pending),
    worker_b(pending),
)
```
Use per-request locking or atomically mark each tool_call_id as claimed before execution.
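Here is a minimal in-process sketch of claim-before-execute using a lock. In a multi-process deployment you would back this with something like Redis `SETNX` or a unique database constraint; `claim` is an illustrative name:

```python
import threading

_claimed = set()
_claim_lock = threading.Lock()

def claim(tool_call_id: str) -> bool:
    """Return True for exactly one caller per id, atomically."""
    with _claim_lock:
        if tool_call_id in _claimed:
            return False
        _claimed.add(tool_call_id)
        return True

# Two workers race for the same pending call; exactly one wins the claim.
wins = []
threads = [
    threading.Thread(target=lambda: wins.append(claim("call_1")))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(wins.count(True))  # 1
```

Each worker checks the claim before executing, so the losing branch simply skips the tool call instead of duplicating it.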
4) Model/provider emits repeated tool calls
Some providers can emit repeated or malformed tool-call payloads during streaming or structured output transitions. You’ll see messages like:
- `ValueError: Duplicate tool call id detected`
- `ToolExecutionError: Tool call already processed`
- `InvalidToolCall: expected single call but received multiple identical calls`
In these cases, normalize incoming calls before dispatch:
```python
# Keep only one call per tool_call_id before dispatching
unique_calls = {}
for call in ai_message.tool_calls:
    unique_calls[call["id"]] = call

for call in unique_calls.values():
    run_tool(call)
```
How to Debug It
- Log every `tool_call_id` before execution
  - Add structured logs for message id, run id, and tool call id.
  - If you see the same id twice, your app is replaying it.
- Check where tools are executed
  - Search for `.invoke(` on your tool object.
  - Search for callback handlers like `on_tool_start`, custom loops over `AIMessage.tool_calls`, and any agent executor running in parallel.
- Disable retries temporarily
  - Turn off `.with_retry()`, Celery retries, HTTP retries, and queue redelivery.
  - If duplicates stop immediately, your retry boundary is wrong.
- Run one request with streaming off
  - Set `streaming=False` or remove stream handlers.
  - If the issue disappears, your callback path is processing partial and final events as if they were separate calls.
Prevention
- Make one component own tool execution: either `AgentExecutor` or your custom loop over `AIMessage.tool_calls`, not both.
- Treat every tool call as idempotent: persist processed `tool_call_id` values and reject duplicates at the application boundary.
- Add request-scoped tracing: log LangChain run ids, model response ids, and each dispatched tool name and args.
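The idempotency rule can be sketched as a persistent claim backed by a primary-key constraint, so replays are rejected even across restarts. `sqlite3` stands in for whatever datastore you actually use, and `claim_once` is an illustrative name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_calls (tool_call_id TEXT PRIMARY KEY)")

def claim_once(tool_call_id: str) -> bool:
    """Return True the first time an id is seen, False on every replay."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_calls (tool_call_id) VALUES (?)",
        (tool_call_id,),
    )
    conn.commit()
    # rowcount is 1 when the insert happened (first claim), 0 when the
    # PRIMARY KEY conflict made sqlite ignore it (duplicate).
    return cur.rowcount == 1

print(claim_once("call_1"))  # True
print(claim_once("call_1"))  # False (replay rejected)
```

With a shared database, the constraint also arbitrates between concurrent workers: only one `INSERT` succeeds per id.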
If you want a simple rule to keep in mind: LangChain should decide when to ask for a tool; your app should decide whether that exact tool call has already been processed. That separation prevents most duplicate-call bugs before they hit production.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.