How to Fix 'duplicate tool calls in production' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: duplicate-tool-calls-in-production, llamaindex, python

If you’re seeing a ValueError about duplicate tool calls in production, or a similar duplicate tool call failure in LlamaIndex, it usually means the same tool invocation is being emitted twice for a single agent turn. In practice, this shows up when an agent loop retries, a callback replays events, or your app accidentally submits the same message history more than once.

This is not a model quality issue. It’s almost always an orchestration bug in your Python code, your streaming handler, or your agent state management.

The Most Common Cause

The #1 cause is calling the agent twice for the same user turn — usually once for streaming and once for the final response, or inside both a webhook handler and a background worker.

A common broken pattern looks like this:

Broken                           Fixed
Calls the agent twice            Calls the agent once and reuses the result
Rebuilds state on every event    Keeps one request path per turn

# Broken: same user turn triggers two agent executions
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(tools)

def handle_request(user_input: str):
    # First execution
    stream = agent.stream_chat(user_input)
    for chunk in stream.response_gen:
        print(chunk)

    # Second execution of the same prompt
    response = agent.chat(user_input)
    return response.response

# Fixed: run one execution path only
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(tools)

def handle_request(user_input: str):
    response = agent.chat(user_input)
    return response.response

If you need streaming, use streaming only:

def handle_request(user_input: str):
    stream = agent.stream_chat(user_input)
    output = []
    for chunk in stream.response_gen:
        output.append(chunk)
        print(chunk)
    return "".join(output)

The key rule: one user message, one agent invocation. If you call chat(), don’t also call stream_chat() for the same payload unless you explicitly want two separate runs.

Other Possible Causes

1) Your frontend retries the same request

If your API gateway, browser client, or job runner retries on timeout, LlamaIndex may receive the same message twice.

# Risky: no idempotency key, so a retry replays the same turn
import requests

requests.post("/chat", json={"message": user_input})
requests.post("/chat", json={"message": user_input})  # retry duplicates tool calls

Fix it by adding request IDs and deduping server-side.

payload = {"message": user_input, "request_id": request_id}
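
On the server side, a minimal dedupe sketch might look like this. The handler name, the payload shape, and the in-memory seen_request_ids set are illustrative assumptions; in production you'd back the dedupe store with Redis or a database.

# Sketch: ignore repeated request_ids before the agent ever runs
seen_request_ids = set()  # assumption: swap for a shared store in production

def handle_chat(payload: dict) -> str:
    request_id = payload["request_id"]
    if request_id in seen_request_ids:
        return "duplicate request ignored"  # or return the cached first response
    seen_request_ids.add(request_id)
    response = agent.chat(payload["message"])
    return response.response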

2) You are reusing stale chat history incorrectly

Appending the current user message to memory more than once will make the agent think it needs to act again.

# Broken
memory.put(ChatMessage(role="user", content=user_input))
memory.put(ChatMessage(role="user", content=user_input))
response = agent.chat(user_input)

Use a single source of truth for conversation state.

# Fixed: record the turn once; don't pass the new message twice
response = agent.chat(user_input, chat_history=memory.get_all())
memory.put(ChatMessage(role="user", content=user_input))
memory.put(ChatMessage(role="assistant", content=response.response))

Here memory.get_all() holds only prior turns, so the current user message reaches the agent exactly once — as user_input — instead of appearing both inside chat_history and as the new message.

3) A tool wrapper calls back into the same agent

This happens when a tool function itself invokes agent.chat() or another planner that can emit tool calls. That creates nested orchestration and duplicate tool events.

# Broken: recursive agent call inside a tool
def lookup_customer(query: str):
    return agent.chat(query).response

Tools should do one thing: fetch data, transform data, or call an external service.

# Fixed: pure tool function
def lookup_customer(query: str):
    return crm_client.search(query)
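
If it helps to see the wiring, here is a small sketch (crm_client is the same assumed client as above) that registers the pure function as a FunctionTool, so only the outer agent loop decides when to call it:

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

# The agent owns orchestration; the tool only talks to the CRM client.
lookup_tool = FunctionTool.from_defaults(fn=lookup_customer)
agent = ReActAgent.from_tools([lookup_tool])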

4) Streaming callbacks are replaying events

If you attach multiple handlers to the same stream, each handler may process the same tool event and trigger duplicate side effects.

# Risky: both handlers react to the same tool events
Settings.callback_manager.add_handler(handler_a)  # Settings from llama_index.core
Settings.callback_manager.add_handler(handler_b)

Make sure only one handler owns persistence or outbound actions for each event type.
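
One hedged way to enforce that is a single handler that owns persistence and skips events it has already seen. PersistenceHandler and save_to_db are illustrative names, and the BaseCallbackHandler hooks below match recent llama_index.core versions — check yours before copying.

from llama_index.core.callbacks.base_handler import BaseCallbackHandler

class PersistenceHandler(BaseCallbackHandler):
    """Sole owner of outbound side effects; ignores events it has already processed."""

    def __init__(self):
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])
        self._seen_event_ids = set()

    def on_event_start(self, event_type, payload=None, event_id="", parent_id="", **kwargs):
        return event_id

    def on_event_end(self, event_type, payload=None, event_id="", **kwargs):
        if event_id in self._seen_event_ids:
            return  # duplicate delivery of the same event: do nothing
        self._seen_event_ids.add(event_id)
        save_to_db(event_type, payload)  # hypothetical persistence call

    def start_trace(self, trace_id=None):
        pass

    def end_trace(self, trace_id=None, trace_map=None):
        pass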

How to Debug It

  1. Log every agent entry point

    • Add logs around chat(), stream_chat(), achat(), and any webhook handler.
    • You want to confirm whether the same prompt is executed once or twice.
  2. Print tool call IDs and event order

    • LlamaIndex emits structured events through its callback system.
    • If you see the same tool_call_id twice in one turn, you’ve found the duplication source; a sketch of this check follows this list.
  3. Disable retries temporarily

    • Turn off HTTP client retries, queue redelivery, and UI auto-resubmit.
    • If the error disappears, your issue is upstream of LlamaIndex.
  4. Strip down to one tool and one handler

    • Remove extra callbacks, memory layers, and wrappers.
    • Start with a single FunctionTool and a single ReActAgent.
    • Add pieces back until duplication returns.
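
As referenced in step 2, here is a minimal sketch using the built-in LlamaDebugHandler to line up tool events per turn. FUNCTION_CALL is the event type recent LlamaIndex versions use for tool calls, but exact payload fields can vary, so treat this as a starting point rather than a fixed recipe.

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])

response = agent.chat(user_input)

# Expect one start/end pair per tool invocation; two identical pairs in one turn
# means something upstream is emitting the call twice.
for pair in debug_handler.get_event_pairs(CBEventType.FUNCTION_CALL):
    start_event = pair[0]
    print(start_event.id_, start_event.payload)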

Prevention

  • Keep one execution path per user turn:

    • either chat()
    • or stream_chat()
    • not both for the same input
  • Make tools side-effect free where possible:

    • no nested agent.chat() inside tools
    • no hidden retries inside tool functions
  • Add request-level idempotency:

    • store a request_id
    • ignore repeated submissions from clients or workers

If you’re still stuck, inspect where your app converts one inbound message into multiple LlamaIndex calls. In production, that’s usually where duplicate tool calls start.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
