How to Fix 'duplicate tool calls in production' in LlamaIndex (TypeScript)

By Cyprian Aarons
Updated 2026-04-21

If you’re seeing duplicate tool calls in production in a LlamaIndex TypeScript agent, it usually means the model tried to execute the same tool invocation more than once in a single run. In practice, this shows up when your agent loop is replaying state, your tool handler is not idempotent, or you’re accidentally processing the same assistant message twice.

This is common in production because retries, streaming events, and multi-instance deployments expose bugs that don’t show up in local tests.

The Most Common Cause

The #1 cause is replaying the same LLM response or tool call through your agent loop.

In LlamaIndex TypeScript, this often happens when you manually manage ChatMemoryBuffer, ReActAgent, or FunctionTool execution and accidentally call agent.chat() again with the same conversation state after a partial failure, retry, or stream reconnect.

Here’s the broken pattern:

import { ReActAgent } from "llamaindex";
import { FunctionTool } from "llamaindex";

const getPolicyStatus = FunctionTool.from(
  async ({ policyId }: { policyId: string }) => {
    return `Policy ${policyId} is active`;
  },
  {
    name: "get_policy_status",
    description: "Fetch policy status",
  }
);

const agent = new ReActAgent({
  tools: [getPolicyStatus],
});

async function handleRequest(userMessage: string) {
  // Broken: if this gets retried with the same state,
  // the model may emit the same tool call again.
  const response = await agent.chat({
    message: userMessage,
  });

  return response.response;
}

And here’s the safer pattern:

import { ReActAgent } from "llamaindex";
import { FunctionTool } from "llamaindex";

const getPolicyStatus = FunctionTool.from(
  async ({ policyId }: { policyId: string }) => {
    return `Policy ${policyId} is active`;
  },
  {
    name: "get_policy_status",
    description: "Fetch policy status",
  }
);

const agent = new ReActAgent({
  tools: [getPolicyStatus],
});

const completedTurns = new Map<string, string>();

async function handleRequest(requestId: string, userMessage: string) {
  // Use a request-scoped idempotency key outside the agent loop.
  // Do not re-run the same conversation turn blindly on retry:
  // if this turn already completed, return the cached result instead.
  const cached = completedTurns.get(requestId);
  if (cached !== undefined) return cached;

  const response = await agent.chat({
    message: userMessage,
  });

  completedTurns.set(requestId, response.response);
  return response.response;
}

The important part is not just “call it once.” It’s making sure your application does not replay the same turn with the same pending tool call. If you’re using streaming, don’t restart the whole agent on reconnect unless you also reset or dedupe state.

Other Possible Causes

Here are the other issues I see most often.

• Non-idempotent tool execution: the same DB write happens twice. Fix: add request IDs and dedupe at the tool boundary.
• Streaming handler reprocessing chunks: a tool call is emitted twice during an SSE/WebSocket reconnect. Fix: track processed tool call IDs.
• Shared mutable agent state across requests: one user’s tool call leaks into another request. Fix: create one agent instance per request or isolate memory.
• Retry middleware wrapping tool execution: an HTTP retry repeats an already-executed assistant turn. Fix: retry only transport calls, not completed agent turns.

1) Non-idempotent tools

If your tool writes to a database or triggers a downstream API, running it twice can produce duplicate effects even if LlamaIndex only sees one logical call.

const createClaim = FunctionTool.from(
  async ({ claimId }: { claimId: string }) => {
    // Bad if called twice: inserts a duplicate row
    await db.claims.insert({ claimId });
    return `Created claim ${claimId}`;
  },
  {
    name: "create_claim",
    description: "Create a claim",
  }
);

Fix it with idempotency:

const createClaim = FunctionTool.from(
  async ({ claimId }: { claimId: string }) => {
    // Insert only if the claim does not already exist
    const existing = await db.claims.findUnique({ claimId });
    if (existing) return `Claim ${claimId} already exists`;

    await db.claims.insert({ claimId });
    return `Created claim ${claimId}`;
  },
  {
    name: "create_claim",
    description: "Create a claim",
  }
);
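Note that check-then-insert still races under concurrency: two in-flight calls can both pass the existence check before either insert lands. In production, a database unique constraint on claimId usually provides the atomic guarantee. Here is a minimal in-memory sketch of the insert-if-absent idea (the `insertClaimOnce` helper is illustrative, not a LlamaIndex API):

```typescript
// Sketch: atomic insert-if-absent. In production the same guarantee usually
// comes from a unique constraint on claimId plus catching the duplicate-key
// error, rather than an in-memory Map.
const claims = new Map<string, { claimId: string }>();

function insertClaimOnce(claimId: string): boolean {
  // Returns true only for the first caller; later callers see the existing row.
  // Safe here because this check-and-set runs within a single JS tick.
  if (claims.has(claimId)) return false;
  claims.set(claimId, { claimId });
  return true;
}
```

The tool can then report "already exists" whenever `insertClaimOnce` returns false, regardless of which replayed call got there first.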

2) Streaming event duplication

If you consume streamed tokens and tool events separately, you can accidentally process the same event twice after reconnect.

// Bad: replays on reconnect without dedupe
stream.on("tool_call", async (event) => {
  await executeTool(event);
});

Use a processed-event cache:

const seen = new Set<string>();

stream.on("tool_call", async (event) => {
  if (seen.has(event.id)) return;
  seen.add(event.id);

  await executeTool(event);
});
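In a long-lived process, a bare Set grows without bound and never forgets IDs from finished sessions. One way to bound it is a TTL cache; this is a sketch (the `SeenCache` name and TTL choice are assumptions, and event IDs are assumed stable across reconnects):

```typescript
// Sketch: TTL-based dedupe cache so the seen-set does not grow unbounded.
class SeenCache {
  private seen = new Map<string, number>();

  constructor(private ttlMs: number) {}

  // Returns true the first time an ID is seen within the TTL window.
  check(id: string): boolean {
    this.evict();
    if (this.seen.has(id)) return false;
    this.seen.set(id, Date.now());
    return true;
  }

  private evict(): void {
    const cutoff = Date.now() - this.ttlMs;
    for (const [id, ts] of this.seen) {
      if (ts < cutoff) this.seen.delete(id);
    }
  }
}
```

The stream handler can then call `if (!seen.check(event.id)) return;` instead of consulting a raw Set, and old IDs expire on their own.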

3) Shared memory across concurrent requests

A single shared ChatMemoryBuffer can cause one request to inherit another request’s pending assistant/tool state.

// Bad: shared across all users
const memory = new ChatMemoryBuffer();

const agent = new ReActAgent({
  tools,
  memory,
});

Create isolated memory per request:

async function buildAgent() {
  return new ReActAgent({
    tools,
    memory: new ChatMemoryBuffer(),
  });
}

4) Retry logic around completed turns

If your API layer retries after a timeout, it may resend a request that already executed a tool but didn’t finish returning the final answer.

// Bad: retrying full agent execution
await retry(() => agent.chat({ message }));

Retry only safe boundaries:

await retry(() => fetchModelResponse());
// then continue orchestration once you know no side effect happened yet
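A sketch of what “retry only transport calls” can look like: a helper that retries only errors believed to be transient network failures, with exponential backoff. The `retryTransport` and `isTransient` helpers and the error codes they check are assumptions to adapt to your stack:

```typescript
// Sketch: retry only the transport call, never a completed agent turn.
async function retryTransport<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Only retry errors we believe left no side effect behind.
      if (!isTransient(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

function isTransient(err: unknown): boolean {
  // Assumption: network-level failures surface with these Node error codes.
  const code = (err as { code?: string } | null)?.code;
  return code === "ECONNRESET" || code === "ETIMEDOUT";
}
```

Anything that has already executed a tool sits outside `fn`, so a reconnect retries the fetch but never replays the side effect.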

How to Debug It

  1. Log every tool call ID and request ID

    • Print the LLM run ID, assistant message ID, and tool call name.
    • You want to confirm whether the duplicate comes from one run or two separate runs.
  2. Check whether the duplicate happens before or after execution

    • If your logs show tool_call_received, then tool_executed, then tool_call_received again, your orchestration is replaying the turn.
    • If you see only one receipt but two side effects, your tool is non-idempotent.
  3. Disable streaming temporarily

    • Run the exact same request in non-streaming mode.
    • If the issue disappears, your problem is usually event replay or reconnect handling.
  4. Add a hard dedupe guard in the tool

    • Use (requestId + toolName + arguments) as a key.
    • If that fixes it immediately, you’ve confirmed duplicate execution rather than bad model output.
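The dedupe key from step 4 can be derived like this (a sketch; `toolCallKey` is a hypothetical helper, and sorting the argument keys is what keeps the key stable when property order differs between calls):

```typescript
import { createHash } from "node:crypto";

// Sketch: derive a stable dedupe key from requestId + tool name + arguments.
function toolCallKey(
  requestId: string,
  toolName: string,
  args: Record<string, unknown>
): string {
  // Serialize arguments with sorted keys so { a, b } and { b, a } match.
  const normalized = JSON.stringify(
    Object.keys(args)
      .sort()
      .map((key) => [key, args[key]])
  );
  return createHash("sha256")
    .update(`${requestId}:${toolName}:${normalized}`)
    .digest("hex");
}
```

Store each key when a tool executes and refuse any call whose key is already present; if the duplicates stop, you have confirmed replayed execution.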

Prevention

  • Make every side-effecting tool idempotent.
  • Scope memory and agent instances per request unless you have a deliberate shared-state design.
  • Treat retries as transport concerns, not business-logic retries.
  • Store and dedupe on stable keys like requestId, toolCall.id, and normalized arguments.

If you’re using LlamaIndex TypeScript in production and this error appears sporadically, assume concurrency first. The model is usually not “calling tools twice” by itself; your application is replaying state somewhere in the stack.


By Cyprian Aarons, AI Consultant at Topiax.
