How to Fix 'duplicate tool calls in production' in LangChain (Python)
If you’re seeing duplicate tool calls in production, your agent is usually executing the same tool invocation more than once for a single model turn. In LangChain Python, this typically shows up when you mix retry logic, streaming callbacks, or manual tool execution with an agent loop that already handles tool calling.
The problem often appears under load, after a deploy, or only with models that return tool calls across multiple streamed chunks. The root cause is usually not LangChain itself — it's orchestration code that executes the same tool call twice.
The Most Common Cause
The #1 cause is running tools manually while also letting the agent executor run them.
A common mistake is to inspect AIMessage.tool_calls and execute the tool yourself, then pass the same message back into AgentExecutor, which executes it again. Another version is wiring a callback handler that triggers on on_tool_start and also calling the tool in your chain.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| You manually invoke the tool and also let AgentExecutor handle it | Let one layer own tool execution |
| Same `tool_call_id` gets processed twice | Tool call is executed once per turn |
```python
# BROKEN
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

@tool
def get_policy_status(policy_id: str) -> str:
    """Look up the status of a policy by id."""
    return f"Policy {policy_id} is active"

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_policy_status]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a support agent."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required by create_tool_calling_agent
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "Check policy 123"})

# BAD: manually re-running tool calls from the model output.
# If your code does something like this, you'll duplicate execution.
if "tool_calls" in result:
    for call in result["tool_calls"]:
        get_policy_status.invoke(call["args"])
```
```python
# FIXED
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

@tool
def get_policy_status(policy_id: str) -> str:
    """Look up the status of a policy by id."""
    return f"Policy {policy_id} is active"

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_policy_status]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a support agent."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required by create_tool_calling_agent
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# Let AgentExecutor own the full loop.
result = executor.invoke({"input": "Check policy 123"})
print(result["output"])
```
If you’re using lower-level primitives like ChatOpenAI.bind_tools() and manually looping over AIMessage.tool_calls, the same rule applies: pick one execution path.
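If you do go the manual route, the single-owner rule can be sketched in plain Python. The dicts below mirror the shape of LangChain's `AIMessage.tool_calls` entries (`{"name", "args", "id"}`); `run_tool_calls_once` and `tools_by_name` are illustrative names, not LangChain APIs:

```python
def run_tool_calls_once(tool_calls, tools_by_name, seen_ids):
    """Execute each tool call at most once, keyed by its id."""
    results = []
    for call in tool_calls:
        if call["id"] in seen_ids:
            continue  # already executed this turn -- skip the duplicate
        seen_ids.add(call["id"])
        tool_fn = tools_by_name[call["name"]]
        results.append((call["id"], tool_fn(**call["args"])))
    return results

def get_policy_status(policy_id: str) -> str:
    return f"Policy {policy_id} is active"

seen = set()
calls = [
    {"name": "get_policy_status", "args": {"policy_id": "123"}, "id": "call_1"},
    {"name": "get_policy_status", "args": {"policy_id": "123"}, "id": "call_1"},  # replayed
]
out = run_tool_calls_once(calls, {"get_policy_status": get_policy_status}, seen)
print(len(out))  # 1 -- the replayed call is skipped
```

Because `seen_ids` outlives the call, a replay of the whole message later in the same turn is also a no-op.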
Other Possible Causes
1) Retry middleware replays a non-idempotent request
If your chain retries after a timeout, the first request may have already reached the model or your backend. That can replay the same tool call.
```python
# Example: retrying at the wrong layer
chain = runnable.with_retry(stop_after_attempt=3)
```

Fix by making retries idempotent at the application layer. Store processed `tool_call_id` values and skip duplicates.

```python
seen_tool_calls = set()

def should_process(tool_call_id: str) -> bool:
    if tool_call_id in seen_tool_calls:
        return False
    seen_tool_calls.add(tool_call_id)
    return True
```
2) Streaming callbacks fire twice
With streaming models, you may receive partial deltas and then a final message containing the same tool call. If you trigger execution on both events, you’ll duplicate it.
```python
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        pass  # don't execute tools here

    def on_tool_start(self, serialized, input_str, **kwargs):
        pass  # don't also execute the tool from another hook
```
Rule: only execute tools after you have a complete AIMessage.tool_calls, not from token-by-token events.
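One way to follow that rule is to merge streamed fragments into complete calls first and dispatch only after the stream ends. The chunk format below is an illustrative stand-in for provider deltas (argument JSON typically arrives in pieces keyed by the call's id or index), not a LangChain type:

```python
import json

def accumulate_tool_calls(chunks):
    """Merge partial fragments into complete calls, keyed by call id."""
    calls = {}
    for chunk in chunks:
        entry = calls.setdefault(chunk["id"], {"name": "", "args_json": ""})
        if chunk.get("name"):
            entry["name"] = chunk["name"]
        entry["args_json"] += chunk.get("args_fragment", "")
    return calls

chunks = [
    {"id": "call_1", "name": "get_policy_status", "args_fragment": '{"policy_'},
    {"id": "call_1", "args_fragment": 'id": "123"}'},
]
complete = accumulate_tool_calls(chunks)

# Dispatch only now, once per id -- never from inside the streaming loop.
args = json.loads(complete["call_1"]["args_json"])
print(args)  # {'policy_id': '123'}
```

If you executed on each chunk instead, the two fragments above would trigger two (broken) executions of the same call.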
3) Parallel branches are sharing state
If two async tasks process the same conversation state concurrently, both can see the same pending tool call and execute it.
```python
# BAD: shared mutable state across concurrent workers
pending = conversation_state["tool_calls"]
await asyncio.gather(
    worker_a(pending),
    worker_b(pending),
)
```
Use per-request locking or atomically mark each tool_call_id as claimed before execution.
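Here is a minimal in-process sketch of claim-before-execute using a lock. In a multi-process deployment you would back this with something like Redis `SETNX` or a unique database constraint; `claim` is an illustrative name:

```python
import threading

_claimed = set()
_claim_lock = threading.Lock()

def claim(tool_call_id: str) -> bool:
    """Return True for exactly one caller per id, atomically."""
    with _claim_lock:
        if tool_call_id in _claimed:
            return False
        _claimed.add(tool_call_id)
        return True

# Two workers race for the same pending call; exactly one wins the claim.
wins = []
threads = [
    threading.Thread(target=lambda: wins.append(claim("call_1")))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(wins.count(True))  # 1
```

Each worker checks the claim before executing, so the losing branch simply skips the tool call instead of duplicating it.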
4) Model/provider emits repeated tool calls
Some providers can emit repeated or malformed tool-call payloads during streaming or structured output transitions. You’ll see messages like:
- `ValueError: Duplicate tool call id detected`
- `ToolExecutionError: Tool call already processed`
- `InvalidToolCall: expected single call but received multiple identical calls`
In these cases, normalize incoming calls before dispatch:
```python
# Keep only one call per tool_call_id before dispatching
unique_calls = {}
for call in ai_message.tool_calls:
    unique_calls[call["id"]] = call

for call in unique_calls.values():
    run_tool(call)
```
How to Debug It
- Log every `tool_call_id` before execution
  - Add structured logs for message id, run id, and tool call id.
  - If you see the same id twice, your app is replaying it.
- Check where tools are executed
  - Search for `.invoke(` on your tool object.
  - Search for callback handlers like `on_tool_start`, custom loops over `AIMessage.tool_calls`, and any agent executor running in parallel.
- Disable retries temporarily
  - Turn off `.with_retry()`, Celery retries, HTTP retries, and queue redelivery.
  - If duplicates stop immediately, your retry boundary is wrong.
- Run one request with streaming off
  - Set `streaming=False` or remove stream handlers.
  - If the issue disappears, your callback path is processing partial and final events as if they were separate calls.
Prevention
- Make one component own tool execution: either `AgentExecutor` or your custom loop over `AIMessage.tool_calls`, not both.
- Treat every tool call as idempotent: persist processed `tool_call_id` values and reject duplicates at the application boundary.
- Add request-scoped tracing: log LangChain run ids, model response ids, and each dispatched tool name and args.
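The idempotency rule can be sketched as a persistent claim backed by a primary-key constraint, so replays are rejected even across restarts. `sqlite3` stands in for whatever datastore you actually use, and `claim_once` is an illustrative name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_calls (tool_call_id TEXT PRIMARY KEY)")

def claim_once(tool_call_id: str) -> bool:
    """Return True the first time an id is seen, False on every replay."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_calls (tool_call_id) VALUES (?)",
        (tool_call_id,),
    )
    conn.commit()
    # rowcount is 1 when the insert happened (first claim), 0 when the
    # PRIMARY KEY conflict made sqlite ignore it (duplicate).
    return cur.rowcount == 1

print(claim_once("call_1"))  # True
print(claim_once("call_1"))  # False (replay rejected)
```

With a shared database, the constraint also arbitrates between concurrent workers: only one `INSERT` succeeds per id.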
If you want a simple rule to keep in mind: LangChain should decide when to ask for a tool; your app should decide whether that exact tool call has already been processed. That separation prevents most duplicate-call bugs before they hit production.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.