How to Fix 'async event loop error when scaling' in LlamaIndex (TypeScript)
What this error usually means
If you’re seeing an “async event loop error when scaling” in a LlamaIndex TypeScript app, you’re almost always hitting an event-loop misuse problem under load. In practice, it shows up when you run many concurrent retrieval or ingestion tasks, or when you mix sync and async code paths in the wrong place.
The usual symptom is one of these runtime failures:
- `Error: This event loop is already running`
- `RangeError: Maximum call stack size exceeded`
- `UnhandledPromiseRejectionWarning`
- Node.js process exited with code 1 after a burst of parallel LlamaIndex calls
The Most Common Cause
The #1 cause is calling async LlamaIndex methods from a sync wrapper while also fanning out too many concurrent tasks.
In TypeScript, this usually happens with classes like:
- `VectorStoreIndex`
- `StorageContext`
- `Document`
- `OpenAIEmbedding`
- query engines returned by `index.asQueryEngine()`
The broken pattern is usually “map + async + no concurrency control”, or worse, trying to force async work into a synchronous function.
| Broken pattern | Fixed pattern |
|---|---|
| Fires too many promises at once | Limits concurrency |
| Hides async inside sync code | Uses explicit async/await |
| Reuses one shared mutable client badly | Creates clean async boundaries |
```typescript
// ❌ Broken
import { Document, VectorStoreIndex } from "llamaindex";

function ingestDocs(docs: string[]) {
  // sync wrapper around async work: nothing awaits these promises
  docs.map(async (text) => {
    const doc = new Document({ text });
    const index = await VectorStoreIndex.fromDocuments([doc]);
    const engine = index.asQueryEngine();
    const result = await engine.query("Summarize this");
    console.log(result.toString());
  });
}

ingestDocs(new Array(1000).fill("policy text"));
```
```typescript
// ✅ Fixed
import { Document, VectorStoreIndex } from "llamaindex";
import pLimit from "p-limit";

async function ingestDocs(docs: string[]) {
  const limit = pLimit(5); // keep concurrency bounded

  await Promise.all(
    docs.map((text) =>
      limit(async () => {
        const doc = new Document({ text });
        const index = await VectorStoreIndex.fromDocuments([doc]);
        const engine = index.asQueryEngine();
        const result = await engine.query("Summarize this");
        console.log(result.toString());
      })
    )
  );
}

await ingestDocs(new Array(1000).fill("policy text"));
```
Why this works:
- You keep the whole flow inside an `async` boundary.
- You avoid launching 1,000 uncontrolled promises.
- You stop overwhelming the Node event loop and the downstream embedding/query calls.
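If you’d rather not add a dependency, the same bound can be enforced with a small hand-rolled worker pool. Here’s a minimal sketch (`mapWithLimit` is a helper invented here, not a LlamaIndex or p-limit API):

```typescript
// Dependency-free alternative to p-limit: at most `limit` tasks run
// at once, because each "worker" only starts the next item when it
// finishes its current one.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const i = next++; // safe: only one task touches `next` between awaits
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}
```

Usage mirrors the p-limit version: `await mapWithLimit(docs, 5, handleDoc)`.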
Other Possible Causes
1) Mixing CommonJS and ESM incorrectly
LlamaIndex TypeScript packages are sensitive to module setup. If your project uses old CommonJS patterns with modern ESM-only dependencies, you can get weird runtime behavior that looks like event-loop trouble.
```jsonc
// ❌ Broken tsconfig.json
{
  "compilerOptions": {
    "module": "commonjs",
    "target": "es2020"
  }
}
```

```jsonc
// ✅ Safer tsconfig.json
{
  "compilerOptions": {
    "module": "nodenext",
    "moduleResolution": "nodenext",
    "target": "es2022"
  }
}
```
Also check your imports:
```typescript
// ❌ Broken in some setups
const { VectorStoreIndex } = require("llamaindex");

// ✅ Prefer ESM style
import { VectorStoreIndex } from "llamaindex";
```
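Since the safer config targets native ESM, also make sure `package.json` opts in; otherwise Node treats your `.js` output as CommonJS. A minimal fragment (the `start` script path is just an example):

```jsonc
// package.json (fragment)
{
  "type": "module",
  "scripts": {
    "start": "node dist/index.js"
  }
}
```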
2) Creating a new client on every request
If every request constructs a new OpenAI client, embedding model, or vector store connection, scaling gets ugly fast. The event loop starts spending time on setup instead of work.
```typescript
// ❌ Broken
app.post("/query", async (req, res) => {
  const index = await VectorStoreIndex.fromDocuments(req.body.docs);
  const engine = index.asQueryEngine();
  res.json(await engine.query(req.body.question));
});
```

```typescript
// ✅ Better
const indexPromise = VectorStoreIndex.fromDocuments(initialDocs);

app.post("/query", async (req, res) => {
  const index = await indexPromise;
  const engine = index.asQueryEngine();
  res.json(await engine.query(req.body.question));
});
```
If the index changes often, cache the expensive parts and rebuild on a schedule instead of per request.
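Here’s a minimal sketch of that rebuild-on-a-schedule pattern (`loadDocs` is a hypothetical loader for wherever your documents live, and the 10-minute interval is arbitrary):

```typescript
import { VectorStoreIndex } from "llamaindex";
import { loadDocs } from "./load-docs.js"; // hypothetical: returns Promise<Document[]>

// Build once at startup; request handlers await this promise.
let indexPromise = loadDocs().then((docs) =>
  VectorStoreIndex.fromDocuments(docs)
);

// Rebuild on a schedule. The shared promise is only swapped once the
// new index is ready, so a slow or failed rebuild never breaks serving.
setInterval(() => {
  const next = loadDocs().then((docs) => VectorStoreIndex.fromDocuments(docs));
  next
    .then(() => {
      indexPromise = next;
    })
    .catch(console.error);
}, 10 * 60 * 1000);

export const getIndex = () => indexPromise;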
3) Not awaiting internal LlamaIndex promises
A common mistake is assuming these methods are synchronous because they return helper objects. In reality, methods like fromDocuments() return promises and must be awaited before you call query methods on the result.
```typescript
// ❌ Broken: `index` is a Promise here, not an index
const index = VectorStoreIndex.fromDocuments(docs);
const engine = index.asQueryEngine();
const answer = await engine.query("What is the policy?");
```

```typescript
// ✅ Fixed
const index = await VectorStoreIndex.fromDocuments(docs);
const engine = index.asQueryEngine();
const answer = await engine.query("What is the policy?");
```
If you skip the await, you’ll often see follow-on errors (typically a TypeError about calling a method on a Promise) that look unrelated to the real bug.
4) Running CPU-heavy parsing on the main thread
If you’re chunking huge PDFs or doing custom preprocessing before wrapping the text in Document, Node can stall under load. That doesn’t always throw a clean error; sometimes it just looks like async code is broken.
```typescript
// ❌ Bad idea for large batches
for (const file of files) {
  const text = expensiveParsePdf(file);
  docs.push(new Document({ text }));
}
```
Use worker threads or batch the work:
```typescript
// ✅ Better shape
const texts = await Promise.all(files.map((f) => parsePdfAsync(f)));
const docs = texts.map((text) => new Document({ text }));
```
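If batching alone isn’t enough, move the parse itself off the main thread. A minimal `node:worker_threads` sketch, where `parsePdf` and the file names are hypothetical stand-ins for your own parser:

```typescript
// pdf-worker.ts — runs in a worker thread, off the main event loop
import { parentPort, workerData } from "node:worker_threads";
import { parsePdf } from "./parse-pdf.js"; // hypothetical CPU-heavy parser

parentPort?.postMessage(parsePdf(workerData.file as string));
```

```typescript
// main thread — wrap each worker in a promise
import { Worker } from "node:worker_threads";

function parsePdfAsync(file: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // Note: points at the compiled worker file next to this module.
    const worker = new Worker(new URL("./pdf-worker.js", import.meta.url), {
      workerData: { file },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}
```

In production you’d typically reuse a pool of workers (for example with a library like piscina) rather than spawning one per file.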
How to Debug It
1. Find the first real stack trace
   - Don’t stop at the top-level message.
   - Look for the first frame inside your code where you call `VectorStoreIndex.fromDocuments()`, `queryEngine.query()`, or a document ingestion loop.
2. Check for missing `await`
   - Search for `fromDocuments(`, `.query(`, and `.asQueryEngine()`.
   - Verify each async method is awaited before its result is used (a type-aware lint rule can automate this; see the sketch after this list).
3. Reduce concurrency to near zero
   - Temporarily replace `Promise.all(...)` and `.map(async ...)` with a plain sequential loop: `for (const item of items) { await handleItem(item); }`
   - If the error disappears, you’ve got a scaling/concurrency issue.
4. Isolate module/runtime config
   - Check your Node version (`node --version`), `"type": "module"` in `package.json`, and your `tsconfig.json` module settings.
   - Mismatched runtime config can produce errors that only appear once load increases.
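If the project uses typescript-eslint, type-aware rules can flag un-awaited promises before they ship. A minimal flat-config sketch (the rule names are real; the wiring assumes typescript-eslint v8 is installed):

```typescript
// eslint.config.mjs — sketch, assumes the typescript-eslint package is installed
import tseslint from "typescript-eslint";

export default tseslint.config({
  files: ["**/*.ts"],
  languageOptions: {
    parser: tseslint.parser,
    parserOptions: { projectService: true }, // these rules need type info
  },
  plugins: { "@typescript-eslint": tseslint.plugin },
  rules: {
    "@typescript-eslint/no-floating-promises": "error", // un-awaited promise
    "@typescript-eslint/no-misused-promises": "error", // async fn in wrong position
  },
});
```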
Prevention
- Keep all LlamaIndex calls inside explicit `async` functions.
- Put hard concurrency limits on ingestion and query fan-out using tools like `p-limit`.
- Reuse expensive clients and indexes instead of rebuilding them per request.
- Add a small load test before shipping any pipeline that touches embeddings or retrieval (a starting point is sketched below).
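As a starting point, here’s a tiny burst test against the examples above (the route, port, and payload are assumptions; Node 18+ supplies global fetch):

```typescript
// load-test.ts — fire a small burst at the /query route and watch for
// event-loop errors, timeouts, or non-200 statuses under concurrency.
const BURST = 50;

const responses = await Promise.all(
  Array.from({ length: BURST }, () =>
    fetch("http://localhost:3000/query", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ question: "What is the policy?" }),
    })
  )
);

console.log(responses.map((r) => r.status));
```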
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.