How to Fix 'callback not firing when scaling' in LlamaIndex (TypeScript)

By Cyprian Aarons. Updated 2026-04-21.

When you see callback not firing when scaling in a LlamaIndex TypeScript app, it usually means your callback handler is attached in one place, but the actual query/indexing work is happening in another. In practice, this shows up when you move from a single-node setup to parallel workers, a larger Document batch, or a custom retriever/query engine where callbacks are not propagated.

The symptom is simple: your code runs, the index returns results, but your CallbackManager events like onRetrieveStart, onLLMEnd, or custom trace handlers never fire.

The Most Common Cause

The #1 cause is creating the index, retriever, or query engine without passing the same CallbackManager instance all the way through.

In LlamaIndex TypeScript, callbacks are not magical global state. If you instantiate a new Settings, ServiceContext, or handler inside a worker or helper function, you can end up with one callback tree at construction time and another at execution time.

Broken vs fixed pattern

Broken: Callback manager created in one scope, while the index/retriever falls back to defaults
Fixed: The same CallbackManager instance passed into every component

Broken: Each worker builds its own engine
Fixed: A shared engine/config passed into the workers

Broken: Handler attached after objects are already constructed
Fixed: Handler attached before index/query engine creation

// BROKEN
import {
  Document,
  VectorStoreIndex,
  CallbackManager,
  ConsoleCallbackHandler,
} from "llamaindex";

const cb = new CallbackManager([new ConsoleCallbackHandler()]);

async function buildIndex() {
  const docs = [new Document({ text: "Hello world" })];

  // Index created without cb
  return await VectorStoreIndex.fromDocuments(docs);
}

async function run() {
  const index = await buildIndex();

  // Query engine also created without cb
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What is in the document?",
  });

  console.log(response.toString());
}

// FIXED
import {
  Document,
  VectorStoreIndex,
  CallbackManager,
  ConsoleCallbackHandler,
} from "llamaindex";

const cb = new CallbackManager([new ConsoleCallbackHandler()]);

async function buildIndex() {
  const docs = [new Document({ text: "Hello world" })];

  return await VectorStoreIndex.fromDocuments(docs, {
    callbackManager: cb,
  });
}

async function run() {
  const index = await buildIndex();

  const queryEngine = index.asQueryEngine({
    callbackManager: cb,
  });

  const response = await queryEngine.query({
    query: "What is in the document?",
  });

  console.log(response.toString());
}

If you are using a newer LlamaIndex TS API where callbacks hang off Settings, the same rule applies:

import { Settings } from "llamaindex";

Settings.callbackManager = cb;

Set it once at startup, before constructing indexes, retrievers, agents, or tools.
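To see why construction order matters, here is a minimal plain-TypeScript sketch (all names are hypothetical stand-ins, not the real LlamaIndex API) of a component that captures the callback manager from a Settings-like object at construction time:

```typescript
type Handler = (event: string) => void;

class CallbackManager {
  constructor(public handlers: Handler[] = []) {}
  emit(event: string) {
    for (const h of this.handlers) h(event);
  }
}

// Module-level settings object, analogous in spirit to Settings in LlamaIndex TS.
const settings = { callbackManager: new CallbackManager() };

class QueryEngine {
  // The manager is captured here, once, when the engine is constructed.
  private cb = settings.callbackManager;

  query(q: string): string {
    this.cb.emit(`query-start: ${q}`);
    return `result for ${q}`;
  }
}

const events: string[] = [];

// Correct order: install the handler BEFORE constructing the engine.
settings.callbackManager = new CallbackManager([(e) => events.push(e)]);
const engine = new QueryEngine();
engine.query("hello");
// events now contains "query-start: hello"
```

If the last three statements ran in the opposite order, the engine would hold the original empty manager and `events` would stay empty, which is exactly the "attached too late" failure described below.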

Other Possible Causes

1) Parallel execution creates separate contexts

If you scale with Promise.all, each task may construct its own internal pipeline and lose your shared callback state.

// Problematic
await Promise.all(
  chunks.map(async (chunk) => {
    const index = await VectorStoreIndex.fromDocuments([chunk]);
    return index.asQueryEngine().query({ query: "..." });
  })
);

Fix by building once and reusing the same engine:

const index = await VectorStoreIndex.fromDocuments(chunks, {
  callbackManager: cb,
});

const engine = index.asQueryEngine({ callbackManager: cb });

await Promise.all(
  queries.map((query) => engine.query({ query }))
);
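The underlying pattern can be modeled without LlamaIndex at all. In this minimal sketch (hypothetical names throughout), one shared engine funnels events from parallel tasks into a single callback stream instead of each task building its own:

```typescript
class CallbackManager {
  public events: string[] = [];
  emit(e: string) {
    this.events.push(e);
  }
}

class QueryEngine {
  constructor(private cb: CallbackManager) {}

  async query(q: string): Promise<string> {
    this.cb.emit(`retrieve-start:${q}`);
    const answer = `answer:${q}`; // real code would await a retriever/LLM here
    this.cb.emit(`retrieve-end:${q}`);
    return answer;
  }
}

const cb = new CallbackManager();
const engine = new QueryEngine(cb); // built once, shared by every task

// Because query() has no awaits in this sketch, all emits have already
// happened by the time Promise.all is constructed.
const results = Promise.all(["a", "b", "c"].map((q) => engine.query(q)));
// cb.events holds start/end pairs for all three queries (6 events total)
```

Every parallel query reports into the same `cb`, so nothing is lost when the task count grows.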

2) You attached the handler too late

If you create the retriever or query engine first and set callbacks afterward, those components may have already captured the default (empty) callback manager at construction time.

const index = await VectorStoreIndex.fromDocuments(docs);
const engine = index.asQueryEngine();

Settings.callbackManager = cb; // too late

Move the setup earlier:

Settings.callbackManager = cb;
const index = await VectorStoreIndex.fromDocuments(docs);
const engine = index.asQueryEngine();

3) A custom retriever or tool drops the callback context

This happens when you wrap LlamaIndex classes and forget to forward config.

class MyRetriever {
  constructor(private baseRetriever: any) {}

  async retrieve(query: string) {
    return this.baseRetriever.retrieve(query); // no callback propagation
  }
}

Pass through options explicitly:

class MyRetriever {
  constructor(private baseRetriever: any) {}

  async retrieve(query: string, options?: { callbackManager?: any }) {
    return this.baseRetriever.retrieve(query, options);
  }
}
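Here is a self-contained sketch of the pass-through pattern (the interfaces are hypothetical stand-ins, not LlamaIndex's real types), showing that events reach the handler only because the wrapper forwards the options object:

```typescript
interface RetrieveOptions {
  callbackManager?: { emit(e: string): void };
}

class BaseRetriever {
  retrieve(query: string, options?: RetrieveOptions): string[] {
    // The base implementation reports through whatever manager it is given.
    options?.callbackManager?.emit(`retrieve:${query}`);
    return [`doc matching ${query}`];
  }
}

class MyRetriever {
  constructor(private base: BaseRetriever) {}

  // Forward options so the callback manager survives the wrapper layer.
  retrieve(query: string, options?: RetrieveOptions): string[] {
    return this.base.retrieve(query, options);
  }
}

const seen: string[] = [];
const retriever = new MyRetriever(new BaseRetriever());
retriever.retrieve("hello", {
  callbackManager: { emit: (e) => seen.push(e) },
});
// seen === ["retrieve:hello"]
```

Drop the `options` argument in the wrapper's `retrieve` and `seen` stays empty: the callback chain silently breaks at exactly that layer.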

4) Mixed package versions between core and integrations

A classic TypeScript failure mode is using mismatched versions of llamaindex and an integration package. You may see errors like:

  • TypeError: handler.onEvent is not a function
  • Cannot read properties of undefined (reading 'callbackManager')
  • callbacks registering but never firing

Check your dependency tree:

npm ls llamaindex @llamaindex/core @llamaindex/openai

Then align versions in package.json so all LlamaIndex packages come from the same release line.
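One way to keep the release line aligned is to pin the related packages together in package.json. A hypothetical sketch (the version numbers below are placeholders, not a recommendation; check the LlamaIndex release notes for the matching set):

```json
{
  "dependencies": {
    "llamaindex": "^0.9.0",
    "@llamaindex/core": "^0.9.0",
    "@llamaindex/openai": "^0.9.0"
  }
}
```

After editing, reinstall and rerun npm ls to confirm only one version of each package remains in the tree.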

How to Debug It

  1. Confirm where the callback manager is created

    • Search for new CallbackManager(...) and Settings.callbackManager.
    • Make sure it runs before any VectorStoreIndex, QueryEngine, or agent construction.
  2. Add a noisy handler

    • Use ConsoleCallbackHandler first.
    • If console events do not appear, the issue is propagation, not your custom handler logic.
  3. Check whether you are rebuilding objects per request

    • If each request creates a fresh retriever or engine inside a worker, compare that path with your working local path.
    • The bug often appears only under load because object graphs differ.
  4. Log resolved config at runtime

    • Print whether your engine sees a callback manager:
console.log("callback manager:", !!(engine as any).callbackManager);
console.log("settings callback:", !!(Settings as any).callbackManager);

If one is true and the other is false, you found the break in propagation.

Prevention

  • Set Settings.callbackManager once at app bootstrap, before any LlamaIndex object creation.
  • Pass callbackManager explicitly into indexes, retrievers, agents, and query engines instead of assuming inheritance.
  • Keep all LlamaIndex packages on compatible versions and verify with npm ls during upgrades.

If you want stable tracing in production, treat callbacks like database connections: create them early, pass them explicitly, and never assume they survive wrapper layers or worker boundaries.


By Cyprian Aarons, AI Consultant at Topiax.
