How to Fix 'duplicate tool calls during development' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21

"Duplicate tool calls during development" usually means your agent is executing the same tool invocation twice, or LlamaIndex is re-processing a message that already contains a tool_call. In practice, this shows up during local development with hot reload, notebook reruns, Streamlit reruns, or when you accidentally call the agent loop inside another loop.

The key point: this is almost always a state/control-flow bug, not a model bug. The fix is usually to make your tool execution idempotent and ensure you only hand the assistant one active conversation turn at a time.

The Most Common Cause

The #1 cause is re-running the agent on the same chat history while also appending tool results manually. With LlamaIndex agents like ReActAgent, FunctionAgent, or OpenAIAgent, the framework expects to own the tool-call lifecycle.

Here’s the broken pattern:

Broken: You call the agent twice on the same input, or reuse stale history after a tool has already been executed.
Fixed: You keep a single source of truth for chat state and let LlamaIndex manage tool-call turns.
# BROKEN
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def get_balance(account_id: str) -> str:
    return "Balance: $1200"

tool = FunctionTool.from_defaults(fn=get_balance)
agent = ReActAgent.from_tools([tool], verbose=True)

chat_history = []

# First turn
response = agent.chat("What's my balance?")
chat_history.append({"role": "assistant", "content": str(response)})

# Later in dev, you accidentally replay the same prompt/history
response2 = agent.chat("What's my balance?")  # duplicate tool call risk
# FIXED
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def get_balance(account_id: str) -> str:
    return "Balance: $1200"

tool = FunctionTool.from_defaults(fn=get_balance)
agent = ReActAgent.from_tools([tool], verbose=True)

# One turn at a time; let the agent manage its own internal reasoning.
response = agent.chat("What's my balance?")
print(response)

If you’re using a UI framework, this gets worse because rerenders can retrigger the same handler. In Streamlit, for example, every widget interaction can rerun the script unless you guard it with session state.
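Here is a minimal sketch of that session-state guard. It assumes the Streamlit pattern described above, but uses a plain dict in place of st.session_state (and a stand-in for agent.chat()) so it runs anywhere: the handler caches the last answered prompt and skips replays instead of re-invoking the agent.

```python
call_log = []  # records every real agent invocation

def fake_agent_chat(prompt: str) -> str:
    # Stand-in for agent.chat(); in Streamlit this is the expensive call
    # you do NOT want to repeat on every rerender.
    call_log.append(prompt)
    return f"answered: {prompt}"

def handle_submit(state: dict, prompt: str) -> str:
    # Only call the agent if this prompt hasn't already been handled.
    # In Streamlit, `state` would be st.session_state.
    if state.get("last_prompt") == prompt:
        return state["last_response"]  # rerun: reuse the cached answer
    response = fake_agent_chat(prompt)
    state["last_prompt"] = prompt
    state["last_response"] = response
    return response

state = {}
first = handle_submit(state, "What's my balance?")
rerun = handle_submit(state, "What's my balance?")  # simulated rerender
print(len(call_log))  # the agent ran only once
```

The same shape works for any framework that reruns your handler: key the cache on the prompt (or a submit ID) and short-circuit before the agent call.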

Other Possible Causes

1) Hot reload restarting code that re-registers tools

If your dev server reloads modules, your tool registration may run twice. That can create duplicate callbacks or duplicate agent instances.

# BROKEN: module-level side effects on reload
tool = FunctionTool.from_defaults(fn=get_balance)
agent = ReActAgent.from_tools([tool])

# FIXED: build once behind a guard/factory
def build_agent():
    tool = FunctionTool.from_defaults(fn=get_balance)
    return ReActAgent.from_tools([tool])
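If multiple call sites need the agent, you can go one step further and cache the factory so the process only ever builds one instance. This is a generic sketch (an object() stand-in replaces the real ReActAgent construction); note that a full module reload still clears the cache, which is fine because the old instance is discarded with it.

```python
from functools import lru_cache

build_count = 0  # counts how many times the agent is actually built

@lru_cache(maxsize=1)
def get_agent():
    # Build the agent exactly once per process, no matter how many
    # call sites ask for it. Replace object() with your real
    # ReActAgent.from_tools([...]) construction.
    global build_count
    build_count += 1
    return object()

a = get_agent()
b = get_agent()
print(a is b)  # True: every caller sees the same instance
```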

2) Manually echoing tool messages back into the next turn

Some developers append raw tool_call / tool_result content into memory and then send it back as if it were user text. That can cause LlamaIndex to see an already-consumed tool request again.

# BROKEN
memory.put({"role": "assistant", "content": '{"tool_calls":[...]}'})
memory.put({"role": "user", "content": "same conversation replayed"})

# FIXED: store only real chat turns as ChatMessage objects, and let
# the agent/chat engine record its internal tool events itself.
from llama_index.core.llms import ChatMessage

memory.put(ChatMessage(role="user", content="What's my balance?"))
memory.put(ChatMessage(role="assistant", content=str(response)))

3) Calling both .chat() and .stream_chat() for the same turn

If you start streaming and then also call a normal chat method for rendering, you may execute the same assistant step twice.

# BROKEN
stream = agent.stream_chat("Check policy status")
print(stream.response)
print(agent.chat("Check policy status"))  # duplicate turn

# FIXED
stream = agent.stream_chat("Check policy status")
for chunk in stream.response_gen:
    print(chunk, end="")

4) Tool function has side effects and no deduplication

If your tool writes to a database or external API, repeated execution looks like “duplicate tool calls” even when the model only asked once. This often happens with retries.

# BROKEN: not idempotent
def create_ticket(summary: str) -> str:
    ticket_id = crm.create_ticket(summary)
    return f"Created {ticket_id}"

# FIXED: dedupe by request id / hash
def create_ticket(summary: str, request_id: str) -> str:
    if db.seen(request_id):
        return db.get_result(request_id)
    result = crm.create_ticket(summary)
    db.save(request_id, result)
    return result
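The dedupe pattern above can be made concrete with an in-memory store. This sketch is self-contained: a dict stands in for the `db` lookup and a list records every real side effect, so a retried call returns the cached result instead of creating a second ticket.

```python
seen: dict = {}   # request_id -> cached result (stand-in for a real db)
created = []      # records every real side effect (stand-in for the CRM call)

def create_ticket(summary: str, request_id: str) -> str:
    # Idempotent tool: the same request_id never creates a second ticket.
    if request_id in seen:
        return seen[request_id]  # replay/retry: return cached result
    created.append(summary)      # the real side effect happens only here
    result = f"Created TICKET-{len(created)}"
    seen[request_id] = result
    return result

first = create_ticket("refund request", "req-001")
retry = create_ticket("refund request", "req-001")  # duplicate call
print(first == retry, len(created))  # identical result, one real ticket
```

In production, back `seen` with a durable store so dedupe survives process restarts.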

How to Debug It

  1. Turn on verbose logging

    • For agents like ReActAgent, set verbose=True.
    • Look for repeated lines like:
      • Thought:
      • Action: get_balance
      • Observation: Balance: $1200
    • If you see the same Action: twice for one user prompt, your app is replaying turns.
  2. Print object identity and call count

    • Check whether your app is creating multiple agent instances.
    • Log id(agent) and any handler invocation count.
print("agent id:", id(agent))
print("request:", user_input)
  3. Inspect your UI/framework reruns

    • Streamlit: use st.session_state to prevent duplicate submits.
    • FastAPI/Flask: make sure one HTTP request maps to one agent call.
    • Jupyter notebooks: avoid re-executing cells that rebuild memory plus resend prompts.
  4. Trace tool execution separately from LLM output

    • Add logging inside each tool function.
    • If the function logs twice but the model prompt appears once, your app is retrying or rerunning.
    • If both prompt and function log twice, you’re replaying conversation state.
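A lightweight way to get that per-tool logging is a counting decorator wrapped around the plain function before you hand it to FunctionTool. This sketch is generic Python, not a LlamaIndex API:

```python
import functools

def count_calls(fn):
    # Wrap a tool function so every real execution is logged with a
    # running count; a count higher than expected means a replayed turn.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"[tool] {fn.__name__} call #{wrapper.calls} args={args}")
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def get_balance(account_id: str) -> str:
    return "Balance: $1200"

get_balance("acct-1")
get_balance("acct-1")  # a second log line for one user prompt is the smoking gun
print(get_balance.calls)
```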

Prevention

  • Keep agent creation behind a factory function, not module-level side effects.
  • Make tools idempotent when they touch external systems like CRM, ticketing, or banking APIs.
  • Store chat state in one place only; don’t manually append raw tool events back into user-facing history.
  • In web apps, gate submit handlers with session/request IDs so rerenders don’t trigger duplicate turns.
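The last bullet, gating by request ID, can be sketched as a small dedupe layer in front of the agent call. This is framework-agnostic pseudocode made runnable; in FastAPI or Flask you would take the ID from an idempotency-key header or form token:

```python
import uuid

handled: set = set()  # request IDs already processed (stand-in for a shared store)
agent_calls = []      # records every real agent invocation

def handle_request(request_id: str, prompt: str) -> str:
    # One request_id maps to at most one agent turn, no matter how many
    # times the client (or a rerender) resubmits it.
    if request_id in handled:
        return "duplicate submit ignored"
    handled.add(request_id)
    agent_calls.append(prompt)  # stand-in for agent.chat(prompt)
    return f"answered: {prompt}"

rid = str(uuid.uuid4())
first = handle_request(rid, "Check policy status")
replay = handle_request(rid, "Check policy status")  # rerender resends same ID
print(len(agent_calls))  # one agent turn despite two submits
```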

If you still see errors like ValueError: duplicate tool call detected or repeated tool_calls in debug output from ReActAgent or FunctionAgent, treat it as a lifecycle issue first. In most cases, removing replayed state fixes it faster than changing models or prompts.



By Cyprian Aarons, AI Consultant at Topiax.
