# How to Fix Duplicate Tool Calls in Production in LlamaIndex (Python)
If you're seeing an error like `ValueError: duplicate tool calls` in production, or a similar duplicate tool call failure in LlamaIndex, it usually means the same tool invocation is emitted twice within a single agent turn. In practice, this shows up when an agent loop retries, a callback replays events, or your app accidentally submits the same message history more than once.
This is not a model quality issue. It’s almost always an orchestration bug in your Python code, your streaming handler, or your agent state management.
## The Most Common Cause
The #1 cause is calling the agent twice for the same user turn — usually once for streaming and once for the final response, or inside both a webhook handler and a background worker.
A common broken pattern looks like this:
| Broken | Fixed |
|---|---|
| Calls the agent twice | Calls the agent once and reuses the result |
| Rebuilds state on every event | Keeps one request path per turn |
```python
# Broken: same user turn triggers two agent executions
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(tools)


def handle_request(user_input: str):
    # First execution
    stream = agent.stream_chat(user_input)
    for chunk in stream.response_gen:
        print(chunk)
    # Second execution of the same prompt
    response = agent.chat(user_input)
    return response.response
```
```python
# Fixed: run one execution path only
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(tools)


def handle_request(user_input: str):
    response = agent.chat(user_input)
    return response.response
```
If you need streaming, use streaming only:
```python
def handle_request(user_input: str):
    stream = agent.stream_chat(user_input)
    output = []
    for chunk in stream.response_gen:
        output.append(chunk)
        print(chunk)
    return "".join(output)
```
The key rule: one user message, one agent invocation. If you call `chat()`, don't also call `stream_chat()` for the same payload unless you explicitly want two separate runs.
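You can also enforce this rule mechanically. The sketch below is not a LlamaIndex API; `TurnGuard` and `DuplicateTurnError` are hypothetical names for a small guard that refuses to run the agent twice for the same turn ID:

```python
class DuplicateTurnError(RuntimeError):
    """Raised when the same user turn would trigger a second agent execution."""


class TurnGuard:
    """Remembers which turn IDs have already been executed this session."""

    def __init__(self):
        self._seen = set()

    def run_turn(self, turn_id, run_agent):
        """Execute run_agent() exactly once per turn_id; raise on a repeat."""
        if turn_id in self._seen:
            raise DuplicateTurnError(f"turn {turn_id!r} was already executed")
        self._seen.add(turn_id)
        return run_agent()
```

In practice `run_agent` would be a closure over whichever single path you chose, e.g. `guard.run_turn(turn_id, lambda: agent.chat(user_input))`.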
## Other Possible Causes
### 1) Your frontend retries the same request
If your API gateway, browser client, or job runner retries on timeout, LlamaIndex may receive the same message twice.
```python
# Risky: no idempotency key
import requests

requests.post("/chat", json={"message": user_input})
requests.post("/chat", json={"message": user_input})  # retry duplicates tool calls
```
Fix it by adding request IDs and deduping server-side.

```python
# Every submission carries a client-generated ID the server can dedupe on
payload = {"message": user_input, "request_id": request_id}
requests.post("/chat", json=payload)
```
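Server-side, the dedupe can be as simple as caching results by `request_id`. Here is a minimal in-memory sketch (the `RequestDeduper` name and dict storage are assumptions; production code would typically use Redis or a database with a TTL):

```python
import threading


class RequestDeduper:
    """Caches responses by request_id so a retried request returns the
    original result instead of re-running the agent (and re-emitting
    its tool calls)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._responses = {}

    def handle(self, request_id, run_agent):
        with self._lock:
            if request_id not in self._responses:
                # First time we see this ID: run the agent and cache the result.
                self._responses[request_id] = run_agent()
            # Retries with the same ID fall through to the cached response.
            return self._responses[request_id]
```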
### 2) You are reusing stale chat history incorrectly
Appending the current user message to memory more than once will make the agent think it needs to act again.
```python
# Broken: the same user message is appended to memory twice
from llama_index.core.llms import ChatMessage

memory.put(ChatMessage(role="user", content=user_input))
memory.put(ChatMessage(role="user", content=user_input))
response = agent.chat(user_input)
```
Use a single source of truth for conversation state.
```python
# Fixed: append once, then pass the history explicitly
memory.put(ChatMessage(role="user", content=user_input))
response = agent.chat(user_input, chat_history=memory.get_all())
```
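A cheap guard against double-appends is to skip any message identical to the most recent entry. The sketch below uses plain dicts instead of LlamaIndex's `ChatMessage` to stay self-contained, and the `put_once` name is an assumption:

```python
def put_once(history, role, content):
    """Append a message to history unless it duplicates the last entry.

    Guards against the same user turn being appended twice by two
    different code paths (e.g. a webhook handler and a worker).
    """
    message = {"role": role, "content": content}
    if history and history[-1] == message:
        return history  # already present: skip the duplicate append
    history.append(message)
    return history
```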
### 3) A tool wrapper calls back into the same agent
This happens when a tool function itself invokes agent.chat() or another planner that can emit tool calls. That creates nested orchestration and duplicate tool events.
```python
# Broken: recursive agent call inside a tool
def lookup_customer(query: str):
    return agent.chat(query).response
```
Tools should do one thing: fetch data, transform data, or call an external service.
```python
# Fixed: pure tool function
def lookup_customer(query: str):
    return crm_client.search(query)
```
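You can also catch nested orchestration at runtime with a re-entrancy flag, provided every path into the agent goes through the same wrapper. This is a generic sketch, not a LlamaIndex feature; `agent_entry` is an assumed name:

```python
import threading

_state = threading.local()


def agent_entry(run_agent):
    """Run a top-level agent call; raise if a tool tries to re-enter it."""
    if getattr(_state, "in_agent", False):
        raise RuntimeError("agent invoked from inside a tool (nested orchestration)")
    _state.in_agent = True
    try:
        return run_agent()
    finally:
        # Always clear the flag so later turns are unaffected.
        _state.in_agent = False
```

Wrap your entry point as `agent_entry(lambda: agent.chat(user_input))`; any tool that routes its own call through `agent_entry` will fail loudly instead of silently duplicating tool events.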
### 4) Streaming callbacks are replaying events
If you attach multiple handlers to the same stream, each handler may process the same tool event and trigger duplicate side effects.
```python
# Both handlers observe every event from the same stream
settings.callback_manager.add_handler(handler_a)
settings.callback_manager.add_handler(handler_b)
```
Make sure only one handler owns persistence or outbound actions for each event type.
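If you cannot consolidate handlers, you can at least make the side-effecting one idempotent. This is a generic wrapper, not part of LlamaIndex's callback API; the `(event_type, event_id)` key shape is an assumption about what your handler receives:

```python
def dedupe_events(handler):
    """Wrap an event handler so each (event_type, event_id) pair is
    processed at most once, even if the stream replays it."""
    seen = set()

    def wrapped(event_type, event_id, payload):
        key = (event_type, event_id)
        if key in seen:
            return None  # replayed event: skip the side effect
        seen.add(key)
        return handler(event_type, event_id, payload)

    return wrapped
```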
## How to Debug It
1. **Log every agent entry point.** Add logs around `chat()`, `stream_chat()`, `achat()`, and any webhook handler. You want to confirm whether the same prompt is executed once or twice.
2. **Print tool call IDs and event order.** LlamaIndex emits structured events through its callback system. If you see the same `tool_call_id` twice in one turn, you've found the duplication source.
3. **Disable retries temporarily.** Turn off HTTP client retries, queue redelivery, and UI auto-resubmit. If the error disappears, your issue is upstream of LlamaIndex.
4. **Strip down to one tool and one handler.** Remove extra callbacks, memory layers, and wrappers. Start with a single `FunctionTool` and a single `ReActAgent`. Add pieces back until duplication returns.
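To check for duplicate tool call IDs systematically, collect `(turn_id, tool_call_id)` pairs from your logs and count them; anything seen more than once within a turn is your duplication source. A minimal sketch (the log shape is an assumption about what you record):

```python
from collections import Counter


def find_duplicate_tool_calls(logged_pairs):
    """Return the (turn_id, tool_call_id) pairs that occur more than once.

    logged_pairs: iterable of (turn_id, tool_call_id) tuples pulled
    from your agent entry-point logs.
    """
    counts = Counter(logged_pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)
```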
## Prevention
- **Keep one execution path per user turn:** either `chat()` or `stream_chat()`, not both for the same input.
- **Make tools side-effect free where possible:** no nested `agent.chat()` inside tools, no hidden retries inside tool functions.
- **Add request-level idempotency:** store a `request_id` and ignore repeated submissions from clients or workers.
If you’re still stuck, inspect where your app converts one inbound message into multiple LlamaIndex calls. In production, that’s usually where duplicate tool calls start.
By Cyprian Aarons, AI Consultant at Topiax.