How to Fix 'agent infinite loop when scaling' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21


An “agent infinite loop when scaling” usually means your LlamaIndex agent keeps re-entering the same tool-selection or reasoning path until it hits a recursion, iteration, or token limit. In TypeScript, this shows up most often when the agent has no clean stopping condition, when a tool returns something the agent interprets as “keep going,” or when a multi-step workflow accidentally feeds its own output back into itself.

The error often appears as one of these symptoms:

  • Error: Agent reached max iterations
  • Error: Exceeded maximum number of steps
  • Tool call loop detected
  • Recursive invocation detected in WorkflowAgent
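
In production logs, these symptoms are easy to match with a small helper so you can alert on them. A minimal sketch; the matched phrases mirror the messages listed above, but adjust them to whatever your agent version actually throws:

```typescript
// Patterns that suggest an agent hit a loop or iteration limit.
// These phrases mirror common LlamaIndex-style error messages;
// tune them to the exact strings your agent version emits.
const LOOP_PATTERNS: RegExp[] = [
  /max iterations/i,
  /maximum number of steps/i,
  /tool call loop/i,
  /recursive invocation/i,
];

function isAgentLoopError(message: string): boolean {
  return LOOP_PATTERNS.some((pattern) => pattern.test(message));
}
```

Routing these errors to a dedicated alert (rather than a generic 500) makes it obvious when scaling changes start triggering loops.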

The Most Common Cause

The #1 cause is a tool that returns text the agent treats as another instruction instead of a final answer. In practice, this happens when you wrap an LLM call inside a tool, then let the agent call that same tool again because the response never resolves to a terminal state.

Here’s the broken pattern:

Broken → Fixed:

  • Tool returns an open-ended assistant-style response → Tool returns structured data or a final result
  • Agent can call the same tool again with no stop condition → Agent gets explicit termination criteria
  • No guard against repeated queries → Deduplicate inputs / cap iterations

// ❌ Broken: tool output keeps the loop alive
import { FunctionTool, OpenAIAgent } from "llamaindex";

const searchTool = FunctionTool.from(async ({ query }: { query: string }) => {
  const result = await myInternalLLM(query); // another model call
  return `I found this: ${result}`; // looks like more conversation, not a terminal payload
});

const agent = new OpenAIAgent({
  tools: [searchTool],
  systemPrompt: "Answer user questions using tools.",
});

await agent.chat("Find policy details for claim escalation");

And here's the fixed pattern:

// ✅ Fixed: return structured output and force termination
import { FunctionTool, OpenAIAgent } from "llamaindex";

const searchTool = FunctionTool.from(async ({ query }: { query: string }) => {
  const result = await myInternalLLM(query);

  return {
    query,
    result,
    done: true,
  };
});

const agent = new OpenAIAgent({
  tools: [searchTool],
  systemPrompt: `
Use tools only when needed.
If a tool returns done=true, summarize once and stop.
`,
  maxIterations: 4,
});

await agent.chat("Find policy details for claim escalation");

If you are using OpenAIAgent, ReActAgent, or a workflow-based agent in LlamaIndex TS, make sure the tool output is not ambiguous. The model should see either:

  • structured JSON
  • a clearly terminal answer
  • or an explicit done flag

Other Possible Causes

1) Your system prompt encourages endless tool use

If you tell the agent to “keep checking” or “be thorough” without boundaries, it may keep calling tools forever.

const agent = new OpenAIAgent({
  tools,
  systemPrompt: `
Keep searching until you're absolutely certain.
Never stop unless you have perfect confidence.
`,
});

Fix it by adding hard constraints:

const agent = new OpenAIAgent({
  tools,
  systemPrompt: `
Use at most 2 tool calls.
If enough evidence exists, answer directly.
Do not repeat the same query.
`,
});

2) A tool triggers the agent again indirectly

This is common in scaling setups where one service calls another agent endpoint. You think you’re calling a retrieval function, but that function calls back into the same orchestrator.

// ❌ Broken: indirect recursive invocation
const lookupTool = FunctionTool.from(async ({ id }: { id: string }) => {
  return await fetch("https://api.internal/agent-answer", {
    method: "POST",
    body: JSON.stringify({ id }),
  }).then(r => r.text());
});

Fix by separating concerns:

// ✅ Fixed: fetch raw data only
const lookupTool = FunctionTool.from(async ({ id }: { id: string }) => {
  return await fetch(`https://api.internal/customer/${id}`).then(r => r.json());
});

3) Your memory grows with every turn and re-injects old context

When conversation history is appended without trimming, the model may keep seeing prior partial answers and continue “correcting” itself.

// ❌ Risky: unlimited chat history
memory.addMessage(userMsg);
memory.addMessage(agentMsg);

Use bounded memory:

// ✅ Keep only recent turns or summarized state
import { ChatMemoryBuffer } from "llamaindex";

const memory = new ChatMemoryBuffer({
  tokenLimit: 4000,
});
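
If your version doesn't expose a suitable memory buffer, the same idea can be hand-rolled: drop the oldest turns once a rough token budget is exceeded. A sketch under stated assumptions — the `Message` shape and the ~4-characters-per-token estimate are illustrative, not LlamaIndex types:

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Very rough token estimate: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep only the most recent messages that fit under the budget,
// walking backwards from the newest turn.
function trimHistory(history: Message[], tokenLimit: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > tokenLimit) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Trimming (or summarizing) old turns means the agent stops re-reading its own earlier partial answers, which removes one common loop trigger.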

4) Your retrieval layer keeps returning the same chunk

If your retriever always surfaces identical context, the agent can get stuck re-reading and re-answering the same evidence.

const retriever = index.asRetriever({
  similarityTopK: 10,
});

Try reducing repetition and adding filters:

const retriever = index.asRetriever({
  similarityTopK: 3,
});

Also collapse duplicate chunks before synthesis if your pipeline supports it, so the agent never sees the same evidence twice.
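
If your pipeline has no built-in deduplication, you can collapse duplicates yourself after retrieval. A minimal sketch — the `NodeLike` shape here is illustrative, not the real LlamaIndex node type:

```typescript
interface NodeLike {
  text: string;
  score?: number;
}

// Collapse nodes whose normalized text is identical,
// keeping the first (highest-ranked) occurrence.
function dedupeNodes(nodes: NodeLike[]): NodeLike[] {
  const seen = new Set<string>();
  const unique: NodeLike[] = [];
  for (const node of nodes) {
    const key = node.text.trim().toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(node);
    }
  }
  return unique;
}
```

Run this over the retriever's results before handing them to the agent, so repeated chunks can't keep re-triggering the same reasoning step.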

How to Debug It

  1. Log every tool call

    • Print tool name, arguments, and returned payload.
    • If you see the same input repeated, you have a loop.
  2. Set a hard iteration cap

    • Use maxIterations or equivalent workflow limits.
    • If lowering it from 10 to 3 changes behavior, your stopping logic is weak.
  3. Inspect whether any tool calls back into an agent

    • Search for internal HTTP calls to /chat, /agent, or workflow handlers.
    • A single accidental callback is enough to create recursion.
  4. Check whether outputs are structured

    • If your tool returns free-form prose like “Here’s what I found...”, switch to JSON.
    • Agents handle explicit fields like done, answer, and sources much better than narrative text.

Example debug wrapper:

const debugTool = FunctionTool.from(async (args: Record<string, unknown>) => {
  console.log("[tool:start]", args);

  const result = await realTool(args); // realTool: the tool function you already have

  console.log("[tool:end]", result);
  return result;
});

If you are using workflows, also watch for messages like:

  • Workflow exceeded max steps
  • Agent worker entered repeated state
  • Recursive step detected

Those usually mean state transitions are cycling instead of converging.

Prevention

  • Keep tool outputs structured:
    • Return objects with explicit fields like done, result, and nextAction.
  • Put hard limits in place:
    • Cap iterations, step count, retrieval depth, and memory size.
  • Separate retrieval from orchestration:
    • Tools should fetch data, not call back into agents unless you’ve designed for recursion explicitly.
  • Add loop detection in production:
    • Track repeated (toolName + args) pairs and abort after N repeats.
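
The last point can be a small wrapper around your tool functions. This is a sketch of the idea, not a LlamaIndex API: count each (toolName, args) pair and throw once a pair repeats beyond the threshold:

```typescript
// Returns a `guard` that wraps tool functions and aborts when the
// same (toolName, serialized args) pair repeats more than maxRepeats times.
function makeLoopGuard(maxRepeats: number) {
  const counts = new Map<string, number>();
  return function guard<A, R>(toolName: string, fn: (args: A) => R): (args: A) => R {
    return (args: A): R => {
      const key = `${toolName}:${JSON.stringify(args)}`;
      const n = (counts.get(key) ?? 0) + 1;
      counts.set(key, n);
      if (n > maxRepeats) {
        throw new Error(`Loop guard: ${toolName} called ${n}x with identical args`);
      }
      return fn(args);
    };
  };
}
```

Wrap the function you pass to FunctionTool.from (e.g. `guard("search", searchFn)`), and surface the thrown error to the user as a terminal answer instead of letting the agent retry.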

If you want stable behavior at scale with LlamaIndex TypeScript, treat agents like state machines. Once you remove ambiguity in tool outputs and enforce termination rules, this class of infinite loop disappears fast.


By Cyprian Aarons, AI Consultant at Topiax.
