How to Fix 'duplicate tool calls when scaling' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
duplicate-tool-calls-when-scaling · llamaindex · typescript

When you see duplicate tool calls when scaling in a LlamaIndex TypeScript app, it usually means the same agent/tool execution path is being triggered more than once for a single user request. In practice, this shows up when you scale to multiple workers, retry a request, or accidentally register the same tool handler twice.

The error often appears alongside LlamaIndex agent classes like OpenAIAgent, ReActAgent, or workflow code that emits tool-call events more than once. The fix is usually not in the model itself — it’s in how you wire state, retries, and event handling.

The Most Common Cause

The #1 cause is non-idempotent tool execution combined with repeated event processing.

A common bad pattern is subscribing to agent/tool events inside a request handler, then re-registering the same listener every time the route runs. When the agent emits toolCall or toolCallResult events, each listener fires, so one model tool call becomes two or more actual executions.

Broken vs fixed

Broken pattern → Fixed pattern
Registers listeners per request → Registers listeners once at startup
No dedupe on tool call ID → Uses toolCallId / request-scoped dedupe
Shared mutable state across requests → Isolated per-request execution context
// BROKEN: listener added on every request
import { OpenAIAgent } from "llamaindex";

const agent = new OpenAIAgent({
  tools: [searchTool],
  llm,
});

app.post("/chat", async (req, res) => {
  agent.on("toolCall", async (event) => {
    console.log("toolCall:", event);
    await auditLog(event); // runs multiple times after scaling/retries
  });

  const result = await agent.chat(req.body.message);
  res.json(result);
});
// FIXED: register once, dedupe by toolCallId
import { OpenAIAgent } from "llamaindex";

const agent = new OpenAIAgent({
  tools: [searchTool],
  llm,
});

// Per-process dedupe; move this into Redis if you run multiple workers.
const seenToolCalls = new Set<string>();

agent.on("toolCall", async (event) => {
  // Fall back to name + serialized args when no toolCallId is present.
  const id = event.toolCallId ?? `${event.name}:${JSON.stringify(event.args)}`;
  if (seenToolCalls.has(id)) return;

  seenToolCalls.add(id);
  await auditLog(event);
});

app.post("/chat", async (req, res) => {
  const result = await agent.chat(req.body.message);
  res.json(result);
});

If you’re using a workflow-based setup, the same rule applies. A Workflow step must be idempotent if it can be replayed after retry or worker failover.
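A minimal sketch of that idempotency guard, assuming each step can derive a stable key (for example conversation + turn + step name). The in-process Map is a stand-in for whatever shared store you use when steps can replay across workers:

```typescript
// In-process stand-in for a shared store (Redis, your DB) that
// remembers which step keys have already executed.
const executedSteps = new Map<string, unknown>();

// Wraps a step so that a replay with the same key returns the
// cached result instead of running the side effect again.
async function runStepOnce<T>(
  stepKey: string,
  step: () => Promise<T>
): Promise<T> {
  if (executedSteps.has(stepKey)) {
    return executedSteps.get(stepKey) as T;
  }
  const result = await step();
  executedSteps.set(stepKey, result);
  return result;
}
```

A replayed step with the same key then becomes a cheap cache hit instead of a second execution.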

Other Possible Causes

1) You create the agent twice per request

This often happens when one instance handles streaming and a second handles final response generation; each instance runs the tool chain, so every tool executes twice.

// BROKEN
const streamAgent = new OpenAIAgent({ tools: [searchTool], llm });
const finalAgent = new OpenAIAgent({ tools: [searchTool], llm });
// FIXED
const agent = new OpenAIAgent({ tools: [searchTool], llm });
// reuse the same instance for both paths
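One way to enforce that reuse is a lazy per-process factory. Here `createAgent` is a hypothetical stand-in for `new OpenAIAgent({ tools: [searchTool], llm })`; the point is only that both code paths resolve to the same instance:

```typescript
type Agent = { id: number };

let agentInstance: Agent | null = null;
let constructions = 0;

// Hypothetical stand-in for `new OpenAIAgent({ tools: [searchTool], llm })`.
function createAgent(): Agent {
  constructions += 1;
  return { id: constructions };
}

// Both the streaming path and the final-response path call this,
// so they always share one agent (and one tool pipeline).
function getAgent(): Agent {
  if (!agentInstance) {
    agentInstance = createAgent();
  }
  return agentInstance;
}
```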

2) Your retry middleware replays the whole tool chain

If you wrap agent.chat() in an automatic retry without checking whether the model already emitted tool calls, you can execute the same action again.

// BROKEN
await retry(async () => {
  return await agent.chat(message);
}, { retries: 3 });
// FIXED
await retry(
  async () => {
    return await agent.chat(message);
  },
  {
    retries: 3,
    onRetry: (err) => console.warn("retrying after:", err.message),
    // safe only if downstream tools are idempotent; a retried
    // turn can re-emit the same tool calls
  }
);
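If you keep the retry wrapper, thread one idempotency key through every attempt so downstream dedupe can recognize a replay. A sketch (`retryWithKey` is not a LlamaIndex API, just an illustration of the shape):

```typescript
// Retries `fn`, passing the SAME key on every attempt. Downstream
// tools dedupe on that key, so a replayed attempt cannot rerun a
// side effect that an earlier attempt already performed.
async function retryWithKey<T>(
  fn: (idempotencyKey: string) => Promise<T>,
  key: string,
  retries: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(key); // same key, every attempt
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

A natural key here is the turn identity, e.g. `chat:${conversationId}:${turnId}`.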

3) You’re running multiple workers against shared memory/state

With Redis-backed queues or horizontally scaled Node processes, two workers may process the same job if your lock/lease is weak.

// CONFIG SNIPPET: use a distributed lock or unique job key
{
  "queue": "agent-jobs",
  "jobId": "chat:${conversationId}:${turnId}",
  "dedupe": true,
  "lockTTL": 30000
}

If your framework doesn’t support dedupe natively, implement a lock around the turn ID before calling LlamaIndex.
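The lock contract looks roughly like this. The Map is only an in-process illustration; in production the same acquire/release shape maps onto Redis (`SET key value NX PX ttl`) or your queue's native job dedupe:

```typescript
// key -> expiry timestamp in ms. In-process stand-in for Redis.
const locks = new Map<string, number>();

// Returns true only for the first caller while the lease is live.
function acquireLock(key: string, ttlMs: number, now = Date.now()): boolean {
  const expiry = locks.get(key);
  if (expiry !== undefined && expiry > now) return false; // held by another worker
  locks.set(key, now + ttlMs);
  return true;
}

function releaseLock(key: string): void {
  locks.delete(key);
}

// Usage sketch: only the worker that wins the lock calls the agent.
// const key = `chat:${conversationId}:${turnId}`;
// if (acquireLock(key, 30_000)) {
//   try { /* await agent.chat(...) */ } finally { releaseLock(key); }
// }
```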

4) You handle streaming chunks as if they were separate tool calls

Some developers parse every streamed delta and trigger side effects on each partial update. That creates duplicate executions because only one final tool call was intended.

// BROKEN
for await (const chunk of stream) {
  if (chunk.toolCall) runTool(chunk.toolCall); // partial chunks cause duplicates
}
// FIXED
let finalizedToolCallId: string | null = null;

for await (const chunk of stream) {
  if (chunk.isFinalToolCall && chunk.toolCallId !== finalizedToolCallId) {
    finalizedToolCallId = chunk.toolCallId;
    await runTool(chunk.toolCall);
  }
}

How to Debug It

  1. Log the tool call ID and request ID

    • Add logs for toolCallId, conversation ID, worker ID, and process PID.
    • If the same toolCallId appears twice with different workers, it’s a scaling/queue issue.
  2. Check where listeners are registered

    • Search for occurrences of .on("toolCall" and .on("event", plus any workflow step hooks.
    • If registration happens inside an HTTP handler or loop, that’s likely your bug.
  3. Turn off retries temporarily

    • Disable HTTP/client retries and queue redelivery.
    • If duplicates disappear, your retry path is replaying side effects instead of just re-fetching results.
  4. Run single-process locally

    • Force one Node process and one worker.
    • If the issue only appears with PM2/Kubernetes/cluster mode, focus on shared state and job dedupe.
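The "same toolCallId, different workers" check from step 1 can be automated over your collected log events; a small sketch:

```typescript
// Returns the toolCallIds that executed on more than one worker,
// the signature of a queue/scaling duplicate rather than a model retry.
function findCrossWorkerDuplicates(
  events: { toolCallId: string; workerId: string }[]
): string[] {
  const workersById = new Map<string, Set<string>>();
  for (const e of events) {
    const workers = workersById.get(e.toolCallId) ?? new Set<string>();
    workers.add(e.workerId);
    workersById.set(e.toolCallId, workers);
  }
  return Array.from(workersById.entries())
    .filter(([, workers]) => workers.size > 1)
    .map(([toolCallId]) => toolCallId);
}
```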

Prevention

  • Make every tool execution idempotent.
    • Use a unique key like conversationId + turnId + toolCallId.
  • Register event handlers once at process startup.
  • Treat streamed LLM output as transport data, not as an instruction to execute side effects immediately.
  • If you scale horizontally, put dedupe in Redis or your job queue before calling OpenAIAgent, ReActAgent, or any workflow that can replay steps.


By Cyprian Aarons, AI Consultant at Topiax.

