How to Fix 'duplicate tool calls when scaling' in LangChain (Python)

By Cyprian Aarons · Updated 2026-04-21

When LangChain starts calling the same tool twice during scale-out, it usually means your agent execution is no longer single-threaded or single-owner. The symptom shows up as repeated tool invocations, duplicate side effects, or logs that look like the model “decided” to call the same tool again.

In practice, this usually happens when you reuse mutable agent state across requests, run the same chain in multiple workers without isolation, or let retries replay a non-idempotent tool call.

The Most Common Cause

The #1 cause is shared mutable state in an agent executor, especially when you store conversation history or tool results in a global object and then serve multiple requests concurrently.

A common failure pattern looks like this:

Broken pattern | Fixed pattern
Reusing one AgentExecutor with shared memory for all requests | Creating per-request state or using isolated session memory
Letting one request mutate the same chat_history list | Copying state per request or persisting by session key

Broken code

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini")
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

@tool
def lookup_customer(customer_id: str) -> str:
    """Look up a customer record by ID."""
    return f"Customer {customer_id} found"

tools = [lookup_customer]

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory)

# In production, this gets hit by multiple concurrent requests
result = executor.invoke({"input": "Find customer 123"})
print(result)

This looks fine locally, then falls apart under load. The same ConversationBufferMemory instance is shared across requests, so one request can see another request’s intermediate tool state and trigger repeated calls.

Fixed code

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

@tool
def lookup_customer(customer_id: str) -> str:
    """Look up a customer record by ID."""
    return f"Customer {customer_id} found"

tools = [lookup_customer]

def build_executor():
    # Fresh memory per request, so concurrent requests never share state
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    agent = create_tool_calling_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, memory=memory)

# Per-request executor/state
executor = build_executor()
result = executor.invoke({"input": "Find customer 123"})
print(result)

If you need persistence, key it by session ID and isolate state per user/request. Do not share one memory object across workers.
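One way to sketch that isolation is a plain in-process store keyed by session ID. SessionMemoryStore below is an illustrative name, not a LangChain class; a real deployment would back it with Redis or a database:

```python
from collections import defaultdict
from threading import Lock

class SessionMemoryStore:
    """Illustrative per-session history store (not a LangChain API).

    Each session ID gets its own isolated message list, so one
    request can never see another request's intermediate state.
    """

    def __init__(self):
        self._lock = Lock()
        self._histories = defaultdict(list)

    def append(self, session_id: str, message: str) -> None:
        with self._lock:
            self._histories[session_id].append(message)

    def get(self, session_id: str) -> list:
        with self._lock:
            return list(self._histories[session_id])

store = SessionMemoryStore()
store.append("session-a", "Find customer 123")
print(store.get("session-a"))  # ['Find customer 123']
print(store.get("session-b"))  # [] -- fully isolated
```

The lock matters: without it, two worker threads appending to the same session can corrupt the list, which is exactly the class of bug this article is about.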

Other Possible Causes

1) Retry logic replays a non-idempotent tool

If your HTTP client retries on timeout or 5xx errors, the same tool call can be executed twice. This is common with payment creation, ticket creation, or record insertion tools.

# Bad: tool has side effects and no idempotency key
@tool
def create_claim(payload: str) -> str:
    """Create a claim record."""
    # writes to DB / external API
    return "claim_created"

Fix it by making the tool idempotent.

@tool
def create_claim(payload: str, idempotency_key: str) -> str:
    """Create a claim record, skipping keys that were already processed."""
    # check if key already processed before creating again
    return "claim_created"

2) Duplicate event handlers in async streaming

If you subscribe to both token events and final output events and each path triggers a tool execution wrapper, you can accidentally invoke the same action twice.

# Bad: two handlers both call downstream processing
async for event in chain.astream_events({"input": "..."}, version="v2"):
    if event["event"] == "on_tool_end":
        process_result(event["data"])
    if event["event"] == "on_chain_end":
        process_result(event["data"])

Use one source of truth for side effects.

async for event in chain.astream_events({"input": "..."}, version="v2"):
    if event["event"] == "on_tool_end":
        process_result(event["data"])
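If you genuinely need to react in more than one place, dedupe side effects by run ID instead: astream_events v2 events carry a run_id field. The guard below is an illustrative helper, not a LangChain API:

```python
def make_once_processor(process):
    """Wrap a side-effecting handler so each run_id is processed at most once."""
    seen = set()

    def handle(event: dict) -> bool:
        run_id = event.get("run_id")
        if run_id in seen:
            return False  # already handled this run, skip the side effect
        seen.add(run_id)
        process(event.get("data"))
        return True

    return handle

calls = []
handle = make_once_processor(calls.append)
handle({"run_id": "r1", "data": "result"})
handle({"run_id": "r1", "data": "result"})  # duplicate, ignored
print(calls)  # ['result']
```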

3) Parallel workers sharing the same queue message

If Celery, SQS consumers, or background jobs reprocess the same message after visibility timeout expires, LangChain will happily run the tool again because from its perspective it’s a new invocation.

# Bad: no dedupe at job level
def handle_message(message):
    executor.invoke({"input": message["text"]})

Add job-level deduplication before calling LangChain.

def handle_message(message):
    if already_processed(message["id"]):
        return
    mark_processed(message["id"])
    executor.invoke({"input": message["text"]})
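Note that a separate already_processed check followed by mark_processed can still race: two workers can both pass the check before either marks the message. An atomic check-and-set closes that gap. Here is a stdlib sketch (production systems would use Redis SETNX or a database unique constraint instead):

```python
import threading

_seen: set[str] = set()
_lock = threading.Lock()

def claim_message(message_id: str) -> bool:
    """Atomically mark a message as processed; only the first caller gets True."""
    with _lock:
        if message_id in _seen:
            return False
        _seen.add(message_id)
        return True

def handle_message(message: dict, invoke=print) -> bool:
    # Swap in executor.invoke for `invoke` in real code
    if not claim_message(message["id"]):
        return False  # duplicate delivery, skip
    invoke({"input": message["text"]})
    return True

handled = []
handle_message({"id": "m1", "text": "hi"}, invoke=handled.append)
handle_message({"id": "m1", "text": "hi"}, invoke=handled.append)  # skipped
print(len(handled))  # 1
```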

4) Tool schema ambiguity causes model to emit duplicate calls

Sometimes the model sees overlapping tool descriptions and emits more than one valid call path. This shows up with ToolCallingAgent, create_openai_tools_agent, or custom tool wrappers that describe the same action twice.

# Bad: two tools with nearly identical purpose/names
tools = [lookup_policy, find_policy]

Make tool names and descriptions distinct.

# Good: clear separation of responsibilities
tools = [lookup_policy_by_number, search_policies_by_customer]

How to Debug It

  1. Turn on LangChain tracing

    • Use LangSmith or verbose mode to inspect whether the duplicate comes from the model or your app.
    • If you see repeated on_tool_start events for the same input inside one run tree, it’s likely agent/state behavior.
    • If you see two separate runs with identical inputs, it’s likely retry/job duplication.
  2. Log request IDs and session IDs

    • Add a correlation ID to every invocation.
    • Confirm whether duplicate calls come from one request or multiple concurrent requests sharing state.
result = executor.invoke(
    {"input": "Find customer 123"},
    config={"configurable": {"session_id": "req-abc-123"}}
)
  3. Check whether your tools are idempotent

    • Any tool that writes data should be safe to call twice.
    • If a second call creates a second record, that’s your bug even if LangChain triggered it.
  4. Disable retries temporarily

    • Turn off HTTP/client retries and job retries.
    • If duplicates disappear, your issue is replay rather than agent logic.
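The correlation IDs from step 2 can be added with a thin wrapper around any invoke-style callable. invoke_with_request_id is a hypothetical helper; LangChain's config metadata dict is a real place to stash the ID so it shows up in traces:

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO)

def invoke_with_request_id(invoke, payload: dict):
    """Attach a fresh correlation ID to one invocation and log both ends."""
    request_id = f"req-{uuid.uuid4().hex[:8]}"
    logging.info("invoke start request_id=%s input=%r", request_id, payload)
    result = invoke(payload, config={"metadata": {"request_id": request_id}})
    logging.info("invoke end request_id=%s", request_id)
    return result, request_id

# Usage with a stand-in executor (swap in executor.invoke in real code):
def fake_invoke(payload, config):
    return {"output": "ok", "request_id": config["metadata"]["request_id"]}

result, rid = invoke_with_request_id(fake_invoke, {"input": "Find customer 123"})
print(result["request_id"] == rid)  # True
```

If two log entries share a request_id, the duplication happened inside one agent run; two distinct IDs with identical inputs point at replay from retries or your job queue.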

Prevention

  • Keep agent state per request or per session. Never share mutable memory objects across concurrent users.
  • Make every side-effecting tool idempotent with an explicit idempotency_key.
  • Add tracing early: LangSmith traces plus structured logs will tell you whether duplication is happening in the model loop, your worker layer, or your external API layer.
  • Use clear tool boundaries. If two tools do almost the same thing, merge them or rename them so the model doesn’t bounce between them.

If you’re seeing duplicate tool calls when scaling, start by assuming shared state or retries before blaming LangChain itself. In production systems, that’s where this error usually lives.


By Cyprian Aarons, AI Consultant at Topiax.