How to Fix 'duplicate tool calls when scaling' in LlamaIndex (Python)
When LlamaIndex starts throwing duplicate tool calls during scale-up, it usually means the same agent run is being executed more than once, or the tool-call state is being reused across requests. In practice, this shows up when you add concurrency, background workers, retries, or a web server that reuses objects across requests.
The exact failure often looks like one of these:
- `ValueError: duplicate tool call detected`
- `RuntimeError: Tool call already exists for this run`
- OpenAI `ResponseError`/`ValidationError` around repeated `tool_calls`
The Most Common Cause
The #1 cause is reusing the same agent, chat memory, or workflow state across multiple concurrent requests.
This happens a lot with:
- global singletons in FastAPI
- cached `AgentRunner`/`ReActAgent` instances
- shared `ChatMemoryBuffer`
- retry logic that replays the same request without resetting state
Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| One shared agent instance for all requests | Create a fresh agent/run context per request |
| Shared memory object mutated concurrently | Per-request memory/session isolation |
| Same tool invocation retried without idempotency | Guard retries and reset run state |
```python
# BROKEN: shared mutable agent state across requests
from fastapi import FastAPI
from llama_index.core.agent import ReActAgent
from llama_index.core.memory import ChatMemoryBuffer

app = FastAPI()
memory = ChatMemoryBuffer.from_defaults(token_limit=4000)
agent = ReActAgent.from_tools(tools=my_tools, memory=memory)

@app.post("/chat")
async def chat(payload: dict):
    # Under load, multiple requests hit the same agent + memory.
    return await agent.achat(payload["message"])
```
```python
# FIXED: create isolated state per request
from fastapi import FastAPI
from llama_index.core.agent import ReActAgent
from llama_index.core.memory import ChatMemoryBuffer

app = FastAPI()

@app.post("/chat")
async def chat(payload: dict):
    memory = ChatMemoryBuffer.from_defaults(token_limit=4000)
    agent = ReActAgent.from_tools(tools=my_tools, memory=memory)
    return await agent.achat(payload["message"])
```
If you need persistence, scope it by session/user ID instead of sharing one in-memory object across everyone.
```python
# Better: session-scoped memory lookup
session_memory_store = {}

def get_memory(session_id: str):
    if session_id not in session_memory_store:
        session_memory_store[session_id] = ChatMemoryBuffer.from_defaults(token_limit=4000)
    return session_memory_store[session_id]
```
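A plain dict lookup like the one above still lets two concurrent requests for the *same* session mutate one memory object at once. One way to close that gap is a per-session lock. This is a minimal sketch, not LlamaIndex API: `session_locks` and `run_in_session` are hypothetical names, and `coro_factory` stands in for whatever creates your agent run.

```python
import asyncio
from collections import defaultdict

# Hypothetical per-session locks: serialize runs within one session so two
# concurrent requests for the same user can't mutate shared memory at once.
session_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def run_in_session(session_id: str, coro_factory):
    # Only one run per session executes at a time; requests for
    # different sessions still proceed in parallel.
    async with session_locks[session_id]:
        return await coro_factory()
```

Different sessions keep full concurrency; only same-session requests queue up behind each other, which is usually the behavior a chat UI wants anyway.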
Other Possible Causes
1) Retrying the same request at the HTTP layer
If your client retries after a timeout, the first run may still be executing while the second run starts. That can produce duplicate tool calls if your backend isn’t idempotent.
```python
# BAD: naive retry can replay the same tool execution
for _ in range(3):
    result = await agent.achat(message)

# BETTER: add an idempotency key per user action
request_id = payload["request_id"]
if already_processed(request_id):
    return get_cached_result(request_id)
result = await agent.achat(message)
store_result(request_id, result)
```
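The helpers `already_processed`, `get_cached_result`, and `store_result` could be backed by something as simple as an in-process dict with a TTL. A minimal sketch, assuming a single-process deployment (for multiple workers you would need a shared store like Redis or a database instead):

```python
import time

# Minimal in-memory idempotency store (sketch only; per-process, not shared
# across workers). Entries expire so late retries become fresh work.
_results: dict[str, tuple[float, object]] = {}
IDEMPOTENCY_TTL = 300  # seconds

def already_processed(request_id: str) -> bool:
    entry = _results.get(request_id)
    if entry is None:
        return False
    stored_at, _ = entry
    if time.monotonic() - stored_at > IDEMPOTENCY_TTL:
        del _results[request_id]
        return False
    return True

def get_cached_result(request_id: str):
    return _results[request_id][1]

def store_result(request_id: str, result) -> None:
    _results[request_id] = (time.monotonic(), result)
```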
2) Concurrent task fan-out calling the same agent instance
A common scaling bug is using `asyncio.gather()` against one shared agent.
```python
# BAD
results = await asyncio.gather(
    agent.achat("query 1"),
    agent.achat("query 2"),
)
```
Use separate instances or serialize access.
```python
# BETTER
async def run_query(q: str):
    local_agent = ReActAgent.from_tools(tools=my_tools)
    return await local_agent.achat(q)

results = await asyncio.gather(*(run_query(q) for q in queries))
```
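If building a fresh agent per query is too expensive, a middle ground is a small pool: each concurrent run checks out one instance exclusively, then returns it. This is a sketch, not a LlamaIndex feature; `make_pool`, `run_with_pool`, and the `make_agent` factory (e.g. `lambda: ReActAgent.from_tools(tools=my_tools)`) are hypothetical names.

```python
import asyncio

# Sketch of an agent pool: an asyncio.Queue of pre-built instances, so each
# concurrent run gets exclusive use of one agent at a time.
def make_pool(make_agent, size: int) -> asyncio.Queue:
    pool: asyncio.Queue = asyncio.Queue()
    for _ in range(size):
        pool.put_nowait(make_agent())
    return pool

async def run_with_pool(pool: asyncio.Queue, query: str):
    agent = await pool.get()    # exclusive checkout; waits if pool is empty
    try:
        return await agent.achat(query)
    finally:
        pool.put_nowait(agent)  # always return the instance to the pool
```

The pool size caps concurrency, which also acts as a crude backpressure mechanism under load.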
3) Duplicate tool registration
If the same tool gets added twice to an `ObjectIndex`, a `ToolRetrieverRouterQueryEngine`, or an agent constructor path, LlamaIndex may emit repeated calls or conflicting metadata.
```python
# BAD: same tool included twice via composition bug
tools = [search_tool, search_tool]
agent = ReActAgent.from_tools(tools=tools)

# FIX: dedupe by name before passing tools in
unique_tools = {tool.metadata.name: tool for tool in tools}.values()
agent = ReActAgent.from_tools(tools=list(unique_tools))
```
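Silent deduping can mask the underlying composition bug. An alternative, sketched here with a hypothetical `assert_unique_tools` helper, is to fail fast at startup so the duplicate registration is fixed at its source:

```python
# Sketch: raise on duplicate tool names instead of silently deduping, so a
# composition bug surfaces at startup rather than at call time. Assumes each
# tool exposes its name via `tool.metadata.name`, as LlamaIndex tools do.
def assert_unique_tools(tools):
    seen = set()
    for tool in tools:
        name = tool.metadata.name
        if name in seen:
            raise ValueError(f"duplicate tool registered: {name}")
        seen.add(name)
    return tools
```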
4) Streaming callback handlers attached more than once
If you mount tracing/callback handlers on every request without clearing them, each token/tool event can be processed multiple times.
```python
# BAD: repeated handler registration on a long-lived object
llm.callback_manager.add_handler(MyHandler())
llm.callback_manager.add_handler(MyHandler())
```
Make handler registration part of startup only, or ensure you don’t append duplicates.
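One way to guard against accidental duplicates is a small wrapper that skips registration when a handler of the same type is already attached. This is a sketch with a hypothetical `add_handler_once` name; it assumes the manager exposes its handlers as a list and an `add_handler` method, as LlamaIndex's `CallbackManager` does.

```python
# Sketch: register a handler only if one of the same type isn't attached yet.
# Assumes `manager.handlers` is a list and `manager.add_handler` appends to it.
def add_handler_once(manager, handler) -> None:
    if not any(type(h) is type(handler) for h in manager.handlers):
        manager.add_handler(handler)
```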
How to Debug It
- Check whether the error appears only under concurrency
  - Run one request at a time.
  - Then run 10 parallel requests.
  - If it only fails under load, suspect shared mutable state.
- Log request IDs and tool-call IDs
  - Print the incoming `request_id`, session ID, and any LLM/tool-call identifiers.
  - If two different HTTP requests share the same internal run context, you found your bug.
- Inspect object lifetimes
  - Search for module-level instances like `agent = ...`, `memory = ...`, `query_engine = ...`.
  - These are fine for stateless components, but not for per-run conversational state.
- Turn off retries and streaming temporarily
  - Disable client retries.
  - Disable async fan-out.
  - Disable streaming callbacks.
  - Re-test with a single synchronous path to isolate where duplication starts.
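The "one request vs. many" check can be scripted. A minimal sketch with a hypothetical `probe` helper, where `handler` is any async callable standing in for your `/chat` endpoint logic:

```python
import asyncio

# Sketch: run the handler once, then N times concurrently, and compare.
# If the serial call succeeds but the parallel batch fails, the bug is
# almost certainly shared mutable state.
async def probe(handler, n: int = 10):
    async def attempt():
        try:
            await handler("ping")
            return None
        except Exception as exc:
            return exc

    single = await attempt()                                   # serial baseline
    parallel = await asyncio.gather(*(attempt() for _ in range(n)))
    failures = [e for e in parallel if e is not None]
    return single, failures
```

A clean `single` plus a non-empty `failures` list is the classic signature of request-scoped state being shared across concurrent runs.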
Prevention
- Treat agents and conversation memory as request-scoped unless you have explicit session isolation.
- Make every external action idempotent using a request ID or workflow ID.
- Keep tool registration centralized so you don’t accidentally add the same tool twice in different startup paths.
- If you scale with workers or async tasks, never share mutable LlamaIndex runtime objects across concurrent runs unless they’re documented as safe.
If you’re still seeing `duplicate tool call detected` after isolating state, the next place to look is your orchestration layer — Celery retries, background jobs, or webhook redelivery are usually where the second execution sneaks in.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.