How to Fix 'chain execution stuck when scaling' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When your LlamaIndex TypeScript chain execution gets “stuck when scaling,” it usually means the pipeline stops making forward progress under higher concurrency or larger input volume. In practice, this shows up as pending promises that never resolve, request queues that keep growing, or an agent/tool loop that waits on a step that never completes.

This is almost always a lifecycle or concurrency bug, not a model bug. The usual trigger is taking code that works for one request and then running it across multiple parallel requests, workers, or long-running chat sessions.

The Most Common Cause

The #1 cause is reusing a mutable QueryEngine, ChatEngine, or custom chain instance across concurrent requests without isolating per-request state.

In LlamaIndex TypeScript, objects like VectorStoreIndex, QueryEngine, and ChatEngine are often safe to reuse for reads, but your surrounding code may not be. The classic failure mode is shared mutable memory, shared callbacks, or a tool function that waits on another request in the same event loop path.

Broken pattern vs fixed pattern

  • Broken: reuse one chain instance with mutable state across requests. Fixed: create a per-request execution context and keep the shared index read-only.
  • Broken: await nested calls inside the same chain callback. Fixed: return data directly or offload side effects outside the chain.
  • Broken: store request-specific data on this. Fixed: pass request state as function args.

// ❌ Broken: shared mutable state causes "chain execution stuck when scaling"
import { VectorStoreIndex } from "llamaindex";

class SupportChain {
  private currentTicketId: string | null = null;
  private queryEngine: any;

  constructor(private index: VectorStoreIndex) {
    this.queryEngine = index.asQueryEngine();
  }

  async handle(ticketId: string, question: string) {
    this.currentTicketId = ticketId;

    // If two requests hit this at once, currentTicketId gets overwritten.
    const result = await this.queryEngine.query({
      query: `Ticket ${this.currentTicketId}: ${question}`,
    });

    return result.toString();
  }
}

// ✅ Fixed: no shared request state on the instance
import { VectorStoreIndex } from "llamaindex";

class SupportChain {
  private queryEngine: any;

  constructor(private index: VectorStoreIndex) {
    this.queryEngine = index.asQueryEngine();
  }

  async handle(ticketId: string, question: string) {
    const prompt = `Ticket ${ticketId}: ${question}`;

    const result = await this.queryEngine.query({
      query: prompt,
    });

    return result.toString();
  }
}

The key point is simple: keep the index and retriever reusable, but keep request state local to the method. If you need per-request memory, create it inside handle() instead of storing it on the class.
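
For example, if a single request carries a short multi-turn exchange, the memory can live entirely inside the method. This is a minimal sketch of an extra method you could add to the fixed SupportChain above; the prompt format is illustrative, not a LlamaIndex API.

// Sketch: per-request, multi-turn memory kept local to the call.
// Nothing request-specific is ever stored on `this`.
async handleConversation(ticketId: string, questions: string[]) {
  const history: { question: string; answer: string }[] = [];

  for (const question of questions) {
    // Build the context from memory that lives only for this request.
    const context = history
      .map((h) => `Q: ${h.question}\nA: ${h.answer}`)
      .join("\n");

    const result = await this.queryEngine.query({
      query: `Ticket ${ticketId}\n${context}\n${question}`,
    });

    history.push({ question, answer: result.toString() });
  }

  return history;
}

Because the history array is created inside the method, two concurrent conversations can never overwrite each other's state.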

Other Possible Causes

1) Tool/function calls waiting on themselves

If you use an agent with tools and one tool calls back into the same agent path, you can create a deadlock-like loop. This often appears as:

  • AgentRunner never returning
  • repeated tool invocations
  • logs showing the same step over and over
// ❌ Broken
const tools = [
  {
    name: "lookupPolicy",
    call: async (input: string) => {
      // Bad if this routes back into the same agent execution path
      return await agent.chat(input);
    },
  },
];

// ✅ Fixed
const tools = [
  {
    name: "lookupPolicy",
    call: async (input: string) => {
      // Call a lower-level service or retriever directly
      return await policyRetriever.retrieve(input);
    },
  },
];

2) Unbounded concurrency in your worker layer

LlamaIndex may be fine, but your app can overwhelm Node’s event loop or your vector DB client. If you fire off hundreds of queries at once, some requests look “stuck” because downstream services throttle.

// ❌ Broken
await Promise.all(
  tickets.map((t) => supportChain.handle(t.id, t.question))
);

// ✅ Fixed: limit concurrency
import pLimit from "p-limit";

const limit = pLimit(5);

await Promise.all(
  tickets.map((t) =>
    limit(() => supportChain.handle(t.id, t.question))
  )
);
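
If you would rather not add a dependency, processing in fixed-size batches bounds concurrency too. It is slightly more conservative than a rolling limit like p-limit, because each batch finishes before the next one starts, but it needs nothing beyond plain TypeScript. A minimal sketch, using the same batch size of 5 as above:

// Sketch: process items in fixed-size batches instead of all at once.
async function inBatches<T, R>(
  items: T[],
  batchSize: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Only `batchSize` calls are in flight at any moment.
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}

const answers = await inBatches(tickets, 5, (t) =>
  supportChain.handle(t.id, t.question)
);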

3) Streaming response not consumed

If you start streaming from LlamaIndex but never read the stream fully, the caller waits forever. This happens with chat UIs and SSE endpoints when the response body is opened but not drained.

// ❌ Broken
const stream = await chatEngine.chat({ message });
return stream; // caller doesn't consume tokens properly

// ✅ Fixed
const stream = await chatEngine.chat({ message });
let fullText = "";

for await (const chunk of stream.responseGen) {
  fullText += chunk;
}

return fullText;
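
If you need to forward tokens to a client (SSE, websockets), make sure the engine's stream is still drained to completion. A generic helper like the sketch below works for any async iterable of text chunks; the exact chunk shape (a plain string versus an object with a delta field) depends on your llamaindex version, so adapt the extraction to match.

// Sketch: fully consume a stream while forwarding each chunk to the client.
// Works for any AsyncIterable<string>; adapt if your chunks are objects.
async function drainStream(
  chunks: AsyncIterable<string>,
  onChunk: (text: string) => void
): Promise<string> {
  let fullText = "";
  for await (const chunk of chunks) {
    onChunk(chunk); // e.g. write an SSE frame to the response here
    fullText += chunk;
  }
  // Returning only after the loop guarantees the stream is never left half-read.
  return fullText;
}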

4) Misconfigured timeout or retry behavior

A slow upstream embedding model, vector store, or LLM provider can look like a stuck chain if your timeout is too high or retries are infinite.

// Illustrative settings only — wire explicit limits into whichever LLM,
// embedding, and vector store clients you use (option names vary by SDK).
const settings = {
  timeoutMs: 30000, // fail fast instead of hanging
  maxRetries: 2, // bound retries so one slow dependency can't pin requests
};

Watch for logs like:

  • Error fetching embeddings
  • Request timed out
  • OpenAI API error
  • RateLimitError

If retries are unbounded in your wrapper code, one bad dependency can pin requests indefinitely.
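
Because the exact timeout and retry options differ across providers, a safe pattern is to enforce an outer deadline in your own code. Here is a minimal sketch using Promise.race, reusing the queryEngine and prompt names from the earlier examples and the 30-second value from the config above:

// Sketch: enforce a hard deadline around any chain call so a slow
// dependency surfaces as an error instead of a silent hang.
async function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label: string
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage: wrap any potentially slow call with an explicit deadline.
const result = await withTimeout(
  queryEngine.query({ query: prompt }),
  30000,
  "queryEngine.query"
);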

How to Debug It

  1. Turn on step-level logging

    • Log before and after every major call:
      • index.asQueryEngine()
      • queryEngine.query()
      • tool invocation
      • streaming consumption
    • If you see “before query” but not “after query,” you know where it hangs (a minimal logging wrapper is sketched after this list).
  2. Check whether the hang is single-request or concurrency-related

    • Run one request only.
    • Then run two concurrent requests.
    • Then run ten.
    • If it only breaks under load, suspect shared mutable state or downstream throttling.
  3. Remove tools and memory first

    • Test plain retrieval:
      const qe = index.asQueryEngine();
      console.log(await qe.query({ query: "test" }));
      
    • If that works, add memory.
    • Then add tools.
    • Then add streaming.
    • The last thing you added is usually the culprit.
  4. Inspect promise chains for accidental recursion

    • Search for:
      • agent.chat(...) inside a tool handler
      • callbacks calling back into the same service class
      • event handlers that trigger another query synchronously
    • In Node.js these loops don’t always throw; they just stall progress.
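
To make step 1 concrete, here is a minimal wrapper that logs before and after any awaited call; the step labels are whatever names you choose:

// Sketch: wrap any async step with before/after logs so a hang is easy to locate.
async function logStep<T>(name: string, step: () => Promise<T>): Promise<T> {
  console.log(`before ${name}`);
  try {
    const result = await step();
    console.log(`after ${name}`);
    return result;
  } catch (err) {
    console.log(`failed ${name}`, err);
    throw err;
  }
}

// Usage
const response = await logStep("queryEngine.query", () =>
  queryEngine.query({ query: "test" })
);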

Prevention

  • Keep LlamaIndex objects mostly stateless at request boundaries.
  • Add concurrency limits around batch processing and background jobs.
  • Set explicit timeouts on LLM, embedding, and vector store calls.
  • Avoid calling an agent from inside one of its own tools.
  • Test with parallel load before shipping to production.

If you’re seeing "chain execution stuck when scaling" in TypeScript LlamaIndex code, start by removing shared mutable state and nested agent calls. In real systems, that fixes most cases before you even touch model settings.

