# How to Fix 'timeout error when scaling' in LlamaIndex (TypeScript)
## What the error means
timeout error when scaling usually shows up when your LlamaIndex TypeScript app is trying to process more work than the current execution path can finish within the configured timeout. In practice, this happens during large ingestion jobs, multi-step query pipelines, or when an agent fans out into too many async calls.
You’ll often see it alongside ResponseError, AbortError, or a provider-specific timeout from OpenAI, Anthropic, or your vector store client.
## The Most Common Cause
The #1 cause is batching too much work into a single request or background job without controlling concurrency. In LlamaIndex TS, this usually happens during ingestion when you call `VectorStoreIndex.fromDocuments()` on a large document set and let everything run at once.
Here’s the broken pattern:
```typescript
import { Document, VectorStoreIndex } from "llamaindex";

const documents = hugeDocs.map((text) => new Document({ text }));

const index = await VectorStoreIndex.fromDocuments(documents);
```
And here’s the fixed pattern:
```typescript
import { Document, VectorStoreIndex } from "llamaindex";

const chunkSize = 50;
const batches: string[][] = [];
for (let i = 0; i < hugeDocs.length; i += chunkSize) {
  batches.push(hugeDocs.slice(i, i + chunkSize));
}

let index: VectorStoreIndex | undefined;
for (const batch of batches) {
  const docs = batch.map((text) => new Document({ text }));
  if (!index) {
    index = await VectorStoreIndex.fromDocuments(docs);
  } else {
    await index.insertDocuments(docs);
  }
}
```
The key difference is that the fixed version keeps each ingestion step bounded. If your runtime scales workers or serverless execution based on queue depth, this also prevents a single job from running long enough to hit platform limits.
A similar issue happens with agent loops:
```typescript
// Broken: unbounded tool fan-out
const response = await agent.chat("Analyze all customer files and summarize them");
```
If that prompt triggers many retrievals and tool calls, you can hit timeouts even if each individual call is fine.
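One way to keep that bounded is to give each agent step its own deadline instead of one open-ended call. A minimal sketch, assuming you can split the broad prompt into per-file calls — `withTimeout` and the commented loop are illustrative helpers, not a LlamaIndex API:

```typescript
// Sketch: race a promise against a deadline so one slow step can't
// blow the whole job's time budget.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, deadline]);
  } finally {
    clearTimeout(timer!); // don't leave the timer holding the process open
  }
}

// Usage idea: one bounded call per file instead of "analyze everything":
// for (const file of customerFiles) {
//   const summary = await withTimeout(agent.chat(`Summarize ${file}`), 30_000);
// }
```

Smaller, bounded steps also make retries cheap: a failed per-file call can be retried alone instead of rerunning the whole analysis.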
## Other Possible Causes
| Cause | Symptom | Fix |
|---|---|---|
| Slow embedding model | Request timed out during ingestion | Use a faster embedding model or batch smaller |
| Vector store latency | Upsert timed out or Query timed out | Increase client timeout and reduce payload size |
| Too many parallel requests | AbortError: The operation was aborted | Add concurrency limits |
| Serverless execution limit | Function stops around the same duration every time | Move ingestion to a background worker |
### 1) Embedding model is too slow
If you’re using a remote embedding API with large chunks, each request can take long enough that the work stacks up into a timeout.
```typescript
// Problematic
const settings = {
  embedModel: "text-embedding-3-large",
};
```
Try smaller batches or a cheaper/faster embedding model:
```typescript
const settings = {
  embedModel: "text-embedding-3-small",
};
```
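If you cannot switch models, shrinking how much text goes into each request helps too. A minimal chunking sketch with plain string slicing — in a real app you would usually reach for LlamaIndex's own node parsers / text splitters instead, which split on sentence boundaries:

```typescript
// Sketch: cap the characters per embedding request by hard-slicing text.
// maxChars is an assumption you'd tune against your embedding model.
function splitIntoChunks(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

Smaller chunks mean more requests, but each one finishes well inside the timeout window, and a single failure loses far less work.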
### 2) Your vector store client timeout is too low
Some stores default to short request windows. You’ll see errors like:

- `ResponseError: request timed out`
- `TimeoutError: Query exceeded deadline`
Example fix for HTTP clients:
```typescript
// Placeholder client; check your vector store SDK for the exact option name
const client = new SomeVectorClient({
  timeoutMs: 120000,
});
```
If your store supports retries, enable them too.
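If the SDK has no built-in retry option, a small wrapper with exponential backoff works around any client call. A generic sketch — the names here are illustrative, and many SDKs expose a `maxRetries`-style option that you should prefer when it exists:

```typescript
// Sketch: retry a flaky async call with exponential backoff
// (500ms, 1s, 2s, ... by default).
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage idea: await withRetries(() => client.upsert(payload));
```

Keep the retry budget modest: three attempts with backoff is usually enough to ride out transient latency without masking a real outage.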
### 3) Too much parallelism in custom code
This is common when developers wrap LlamaIndex calls in `Promise.all()`.
```typescript
// Broken: every document fires a concurrent request
await Promise.all(
  docs.map((doc) => index.insertDocuments([doc]))
);
```
That creates a burst of concurrent requests. Limit concurrency instead:
```typescript
// Fixed: sequential inserts keep concurrency at 1
for (const doc of docs) {
  await index.insertDocuments([doc]);
}
```
If you need speed, use a small pool size like 3–5 workers, not unbounded fan-out.
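Such a pool fits in a few lines of plain TypeScript. This is a generic sketch, not a LlamaIndex utility — libraries like `p-limit` do the same job if you'd rather not hand-roll it:

```typescript
// Sketch: run async tasks over items with at most `poolSize` in flight.
async function runWithPool<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  poolSize = 4,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner pulls the next unclaimed index until items run out.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(poolSize, items.length) }, runner),
  );
  return results;
}

// Usage idea: await runWithPool(docs, (doc) => index.insertDocuments([doc]), 4);
```

This keeps results in input order while never exceeding the pool size, so bursts against the embedding API or vector store stay flat.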
### 4) Serverless timeout is lower than your job duration
If this runs in Vercel, Lambda, Cloud Run, or similar, the platform may kill the function before LlamaIndex finishes.
```typescript
// e.g. Next.js route segment config on Vercel: hard 30-second cap
export const maxDuration = 30;
```
If your ingestion takes longer than that, move it to an async worker or queue-backed job runner. Don’t try to squeeze large indexing jobs into request/response handlers.
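The shape of that handoff can be illustrated with a minimal in-memory queue. This is purely a sketch of the pattern — in production you would use a durable queue (SQS, Cloud Tasks, BullMQ, or similar), and the function names here are invented for the example:

```typescript
// Sketch: the request handler only enqueues; a worker does the slow part.
type Job = { id: string; docs: string[] };

const queue: Job[] = [];
let nextJobId = 0;

// Request path: returns immediately, so no platform timeout is at risk.
function enqueueIngestion(docs: string[]): string {
  const id = `job-${++nextJobId}`;
  queue.push({ id, docs });
  return id;
}

// Background worker: drains jobs outside any request deadline.
async function drainQueue(process: (job: Job) => Promise<void>): Promise<void> {
  while (queue.length > 0) {
    await process(queue.shift()!);
  }
}
```

The request handler can hand the returned job id back to the client, which polls for completion, keeping the HTTP response well under any serverless limit.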
## How to Debug It
1. Check where it fails.
   - If it fails during `fromDocuments()` or `insertDocuments()`, it’s usually ingestion volume.
   - If it fails during `queryEngine.query()` or `agent.chat()`, it’s usually retrieval fan-out or slow upstream APIs.
2. Log timing around each stage.

   ```typescript
   console.time("ingest");
   await VectorStoreIndex.fromDocuments(docs);
   console.timeEnd("ingest");
   ```

   Do the same for embedding, insert, retrieval, and generation. You want the slow stage isolated fast.
3. Reduce batch size by half.
   - If the error disappears at smaller batches, you’ve found a throughput problem.
   - If it still fails on tiny batches, look at provider timeouts or network latency.
4. Inspect stack traces for the real source. Look for classes like:
   - `ResponseError`
   - `AbortError`
   - `OpenAIEmbedding`
   - `VectorStoreIndex`
   - your specific vector store client

   The top-level message often says “timeout error when scaling,” but the nested error tells you whether it’s embeddings, storage, or query execution.
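The halving step above can even be automated. A sketch, where `ingestBatch` is a hypothetical stand-in for your own `fromDocuments()`/insert call:

```typescript
// Sketch: halve the batch size until one ingestion attempt succeeds,
// to separate throughput problems from hard provider timeouts.
async function findWorkingBatchSize(
  ingestBatch: (size: number) => Promise<void>,
  startSize: number,
  minSize = 1,
): Promise<number | null> {
  for (let size = startSize; size >= minSize; size = Math.floor(size / 2)) {
    try {
      await ingestBatch(size);
      return size; // this batch size completed without timing out
    } catch {
      // Timed out or failed: try half the size.
    }
  }
  return null; // even tiny batches fail: suspect provider timeouts or network
}
```

A `null` result is itself diagnostic: if one-document batches still time out, batching is not your problem.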
## Prevention

- Keep ingestion jobs small and resumable: use batch inserts instead of one massive `fromDocuments()` call.
- Set explicit timeouts in every external client: the OpenAI SDK, your vector DB client, and any HTTP fetch wrapper.
- Avoid unbounded concurrency: never use raw `Promise.all()` over thousands of documents.
- Move heavy indexing off request paths: use queues, workers, or scheduled jobs for large corpus builds.
If you’re seeing this error repeatedly in production, treat it as an architecture issue first and a tuning issue second. In LlamaIndex TypeScript apps, most timeout problems come from trying to do too much work in one synchronous path.
## Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.