How to Fix 'OOM error during inference during development' in LlamaIndex (TypeScript)
What the error means
An "OOM error during inference" during development usually means your process ran out of memory while LlamaIndex was building embeddings, calling the LLM, or loading too much data into a single Node.js runtime. In TypeScript projects, this often shows up during local dev with ts-node, hot reload, or when indexing a large folder without batching.
The usual symptom is a crash near an embedding call or query call, often with messages like:
- `FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory`
- `Error: OOM error during inference`
- `Node.js heap out of memory`
The Most Common Cause
The #1 cause is loading too much data into memory at once and then asking LlamaIndex to embed or infer over all of it in one shot.
This happens a lot when developers do something like:
- read every file into an array
- create one giant `Document[]`
- call `VectorStoreIndex.fromDocuments(...)` on the whole batch
- run development with the default Node heap
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Load everything into memory | Chunk and batch documents |
| Build index from a huge array | Process incrementally |
| Use default Node heap | Increase heap only if needed |
```typescript
// ❌ Broken
import { VectorStoreIndex, Document } from "llamaindex";
import fs from "node:fs/promises";
import path from "node:path";

async function main() {
  const files = await fs.readdir("./data");
  const docs = await Promise.all(
    files.map(async (file) => {
      const text = await fs.readFile(path.join("./data", file), "utf8");
      return new Document({ text, metadata: { file } });
    })
  );

  // This can blow up memory during embedding/inference
  const index = await VectorStoreIndex.fromDocuments(docs);
  const engine = index.asQueryEngine();
  const response = await engine.query({ query: "Summarize the policy changes" });
  console.log(response.toString());
}

main();
```
```typescript
// ✅ Fixed
import { VectorStoreIndex, Document } from "llamaindex";
import fs from "node:fs/promises";
import path from "node:path";

async function main() {
  const files = await fs.readdir("./data");
  const batchSize = 10;
  let index: VectorStoreIndex | undefined;

  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    const docs = [];
    for (const file of batch) {
      const text = await fs.readFile(path.join("./data", file), "utf8");
      docs.push(new Document({ text, metadata: { file } }));
    }

    // Build smaller batches instead of one huge in-memory load
    index = index
      ? await VectorStoreIndex.fromDocuments(docs, { appendToIndex: true })
      : await VectorStoreIndex.fromDocuments(docs);
  }

  if (!index) throw new Error("No documents found");
  const engine = index.asQueryEngine();
  const response = await engine.query({ query: "Summarize the policy changes" });
  console.log(response.toString());
}

main();
```
If your version of LlamaIndex does not support `appendToIndex`, use a persistent vector store and insert batches manually instead of rebuilding everything in one pass.
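The batch slicing in the fixed example is plain TypeScript and can be pulled into a small reusable helper. This is a sketch; the `chunked` name is mine, not part of the LlamaIndex API:

```typescript
// Split an array into fixed-size batches so each batch can be
// embedded/indexed separately instead of in one huge call.
// Hypothetical helper name; not part of the LlamaIndex API.
function chunked<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 25 files in batches of 10 → batches of length 10, 10, 5.
const files = Array.from({ length: 25 }, (_, i) => `file-${i}.txt`);
const batches = chunked(files, 10);
console.log(batches.map((b) => b.length));
```

The same helper works for any ingestion loop: iterate over `chunked(files, batchSize)` and index one batch at a time.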
Other Possible Causes
1) Your chunk size is too large
Large chunks produce huge embeddings and bigger prompt contexts. That increases memory pressure fast.
```typescript
// Too large
const splitterConfig = {
  chunkSize: 4000,
  chunkOverlap: 200,
};
```

Use smaller chunks:

```typescript
// Better for dev
const splitterConfig = {
  chunkSize: 512,
  chunkOverlap: 64,
};
```
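To see why chunk size matters, here is a deliberately simplified character-based splitter. LlamaIndex's real splitters are token-aware, but the size/overlap tradeoff behaves the same way: bigger chunks mean heavier embeddings and prompts, smaller chunks mean more pieces that each stay cheap.

```typescript
// Simplified character-based splitter (real LlamaIndex splitters
// work on tokens, but the memory tradeoff has the same shape).
function splitText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) throw new Error("overlap must be smaller than chunk size");
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const doc = "x".repeat(100_000); // ~100 KB of text

// Large chunks: fewer, heavier pieces per embedding call.
console.log(splitText(doc, 4000, 200).length);

// Small chunks: many more pieces, but each embedding/prompt stays small.
console.log(splitText(doc, 512, 64).length);
```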
2) You are using an oversized model locally
If you run a local model through Ollama, LM Studio, or another runtime, the model itself may be eating most of RAM before LlamaIndex even starts inference.
```typescript
// Heavy local model for a dev box
const llmModel = "llama3.1:70b";
```

Try a smaller model first:

```typescript
const llmModel = "llama3.1:8b";
```
3) You are creating repeated indexes inside a loop
A common mistake is rebuilding the whole index per request or per file change.
```typescript
// ❌ Rebuilds the whole index on every request
for (const request of requests) {
  const index = await VectorStoreIndex.fromDocuments(docs);
  await index.asQueryEngine().query({ query: request });
}
```
Build once and reuse:
```typescript
// ✅ Build the index and engine once, reuse for every request
const index = await VectorStoreIndex.fromDocuments(docs);
const engine = index.asQueryEngine();

for (const request of requests) {
  await engine.query({ query: request });
}
```
4) Your Node process heap is too small
Sometimes the code is fine, but Node’s default heap is not enough for local indexing.
```sh
node --max-old-space-size=8192 dist/index.js
```

For tsx or ts-node, set it there too:

```sh
NODE_OPTIONS="--max-old-space-size=8192" npx tsx src/index.ts
```
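You can verify the flag actually took effect by asking the running process for its heap limit via Node's built-in `v8` module. Run this once with and once without `--max-old-space-size` and compare:

```typescript
import v8 from "node:v8";

// Report the old-space limit the current process was started with,
// plus how much heap is in use right now.
const stats = v8.getHeapStatistics();
const limitMb = Math.round(stats.heap_size_limit / 1024 / 1024);
const usedMb = Math.round(process.memoryUsage().heapUsed / 1024 / 1024);

console.log(`Heap limit: ${limitMb} MB, currently used: ${usedMb} MB`);
```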
How to Debug It
- Check where the crash happens.
  - If it dies during `VectorStoreIndex.fromDocuments(...)`, it's usually document batching or chunking.
  - If it dies during `.query(...)`, it's usually prompt size or local model memory.
- Log document counts and chunk sizes.
  - Print how many documents you load.
  - Print average text length before indexing.
  - If you see thousands of docs or multi-megabyte chunks, that's your issue.
- Test with a tiny dataset.
  - Run the same code on one file, then ten files, then your full corpus.
  - If it only fails at scale, you have a batching problem.
- Switch to a smaller model and lower chunk size.
  - Drop to `chunkSize: 512`.
  - Use a smaller local model.
  - If memory stabilizes, you've confirmed the cause.
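The "log document counts and chunk sizes" step can be a few lines run before you ever call `fromDocuments`. A sketch, where `texts` stands in for whatever file contents you loaded:

```typescript
// Log corpus stats before indexing so an OOM stops being a mystery.
// `texts` is a stand-in for your loaded file contents.
function logCorpusStats(texts: string[]): { count: number; avgLength: number } {
  const count = texts.length;
  const totalChars = texts.reduce((sum, t) => sum + t.length, 0);
  const avgLength = count === 0 ? 0 : Math.round(totalChars / count);
  console.log(`docs: ${count}, avg length: ${avgLength} chars, total: ${totalChars} chars`);
  return { count, avgLength };
}

// Thousands of docs or multi-megabyte averages are the red flag.
logCorpusStats(["short doc", "x".repeat(2_000_000)]);
```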
Prevention
- Batch document ingestion instead of calling `VectorStoreIndex.fromDocuments()` on huge arrays.
- Keep dev-time chunk sizes small unless you have a reason not to.
- Set a realistic Node heap size for local indexing jobs: `NODE_OPTIONS="--max-old-space-size=8192"`.
If you’re building agents for production systems like banking or insurance workflows, treat indexing as a pipeline step, not an in-request operation. That one design choice avoids most OOM issues before they show up.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit