LlamaIndex Tutorial (TypeScript): caching embeddings for intermediate developers
This tutorial shows you how to cache embeddings in a TypeScript LlamaIndex app so repeated document indexing stops paying the embedding cost every run. You need this when you reprocess the same files often, run CI jobs against stable corpora, or want faster local iteration without hammering your embedding provider.
What You'll Need
- Node.js 18+
- A TypeScript project with npm or pnpm
- Packages: llamaindex and dotenv
- An embedding model API key: OPENAI_API_KEY
- A place to persist cache files: local disk is enough for this tutorial
- Basic familiarity with Document, VectorStoreIndex, and OpenAIEmbedding
Step-by-Step
1. Install the dependencies and set up your environment, putting your API key in a .env file. We’ll use a file-backed cache so embeddings survive process restarts.
npm install llamaindex dotenv
# .env
OPENAI_API_KEY=your_key_here
2. Create a small cache layer that stores embeddings by text hash, saved as cache.ts so the later imports line up. The important part is that identical text maps to the same cached vector, so repeated runs can skip recomputation.
import fs from "node:fs";
import crypto from "node:crypto";

// Map of sha256(text) -> embedding vector.
type EmbeddingCache = Record<string, number[]>;

const CACHE_PATH = "./embedding-cache.json";

export function loadCache(): EmbeddingCache {
  if (!fs.existsSync(CACHE_PATH)) return {};
  return JSON.parse(fs.readFileSync(CACHE_PATH, "utf8")) as EmbeddingCache;
}

export function saveCache(cache: EmbeddingCache) {
  fs.writeFileSync(CACHE_PATH, JSON.stringify(cache, null, 2));
}

export function hashText(text: string): string {
  return crypto.createHash("sha256").update(text).digest("hex");
}
3. Wrap LlamaIndex’s embedding model so it checks the cache before calling the provider, saved as cached-embedding.ts. This is the core pattern: keep the interface compatible, but intercept calls at the edge.
import "dotenv/config";
import { OpenAIEmbedding } from "llamaindex";
import { hashText, loadCache, saveCache } from "./cache.js";
const cache = loadCache();
const baseEmbedding = new OpenAIEmbedding({
model: "text-embedding-3-small",
});
export async function getCachedEmbedding(text: string): Promise<number[]> {
const key = hashText(text);
if (cache[key]) {
console.log(`cache hit: ${key}`);
return cache[key];
}
console.log(`cache miss: ${key}`);
const embedding = await baseEmbedding.getTextEmbedding(text);
cache[key] = embedding;
saveCache(cache);
return embedding;
}
4. Use the cached embedding function when building your index. The loop below warms the cache for each document; note that VectorStoreIndex.fromDocuments still embeds through whatever model is configured globally, so to make the index build itself read from the cache, wrap the model as shown in the sketch after this block. For intermediate projects, this usually means precomputing embeddings before creating nodes, or storing vectors in your own persistence layer.
import { Document, VectorStoreIndex } from "llamaindex";
import { getCachedEmbedding } from "./cached-embedding.js";

async function main() {
  const docs = [
    new Document({ text: "LlamaIndex helps connect data to LLMs." }),
    new Document({ text: "Caching embeddings reduces repeated API calls." }),
    new Document({ text: "TypeScript makes this easy to structure." }),
  ];

  // Warm the cache: each document's text is embedded once and stored on disk.
  for (const doc of docs) {
    const embedding = await getCachedEmbedding(doc.text);
    console.log(doc.text.slice(0, 30), embedding.length);
  }

  // Note: fromDocuments embeds with the globally configured model, not this wrapper.
  const index = await VectorStoreIndex.fromDocuments(docs);
  console.log(`indexed ${docs.length} documents`);
}

main();
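If you want the index build itself to hit the cache instead of only pre-warming it, one option is to register a cache-aware embedding model globally. The snippet below is a minimal sketch, not an official LlamaIndex recipe: it assumes your llamaindex version exports BaseEmbedding and a Settings.embedModel slot from the top-level package, and that a subclass only needs to implement getTextEmbedding (some versions also expect getQueryEmbedding), so check the API of the release you have installed.
import { BaseEmbedding, Settings } from "llamaindex";
import { getCachedEmbedding } from "./cached-embedding.js";

// Sketch: every text embedding request is delegated to the cache-aware helper.
class CachedEmbedding extends BaseEmbedding {
  getTextEmbedding(text: string): Promise<number[]> {
    return getCachedEmbedding(text);
  }
}

// With this set, VectorStoreIndex.fromDocuments embeds through the cache.
Settings.embedModel = new CachedEmbedding();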
5. If you want this to be useful in a real app, persist both your document source and the cache key strategy. In practice, normalize text before hashing so whitespace-only changes do not create unnecessary misses; the helpers below handle that, and the snippet after them shows swapping the normalized hash into the wrapper.
import crypto from "node:crypto";

// Collapse runs of whitespace and trim so cosmetic edits hash identically.
export function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

export function hashNormalizedText(text: string): string {
  const normalized = normalizeText(text);
  return crypto.createHash("sha256").update(normalized).digest("hex");
}
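To actually use it, the only change to the step 3 wrapper is the key derivation. A sketch, assuming the helpers above are saved as normalize.ts (the file name is not fixed by this tutorial):
import { OpenAIEmbedding } from "llamaindex";
import { loadCache, saveCache } from "./cache.js";
import { hashNormalizedText } from "./normalize.js"; // assumed file name for the helpers above

const cache = loadCache();
const baseEmbedding = new OpenAIEmbedding({ model: "text-embedding-3-small" });

// Same wrapper as step 3, but keyed on normalized text so whitespace-only edits still hit.
export async function getCachedEmbedding(text: string): Promise<number[]> {
  const key = hashNormalizedText(text);
  if (cache[key]) return cache[key];
  const embedding = await baseEmbedding.getTextEmbedding(text);
  cache[key] = embedding;
  saveCache(cache);
  return embedding;
}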
6. Add a quick benchmark so you can see the difference between cold and warm runs. On a warm run, the lookup is served from the in-memory cache loaded at startup, so most of the API latency should disappear.
import { performance } from "node:perf_hooks";
import { getCachedEmbedding } from "./cached-embedding.js";

async function benchmark() {
  const text = "This is a repeated chunk of text used for testing cache behavior.";

  // Cold call: a cache miss that hits the API (unless a previous run already cached it).
  const start1 = performance.now();
  await getCachedEmbedding(text);
  const end1 = performance.now();

  // Warm call: served from the cache.
  const start2 = performance.now();
  await getCachedEmbedding(text);
  const end2 = performance.now();

  console.log(`first run: ${(end1 - start1).toFixed(2)}ms`);
  console.log(`second run: ${(end2 - start2).toFixed(2)}ms`);
}

benchmark();
Testing It
Run the script twice with the same input text. The first run should log a cache miss and write an entry to embedding-cache.json; the second run should log a cache hit and return much faster.
If you change only whitespace and keep normalization enabled, you should still get a hit. If you change the actual content, you should see a miss and a new vector stored under a different hash.
For a real check, delete embedding-cache.json, rerun once, then rerun again without changing anything. That gives you a clean cold/warm comparison.
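If you want to verify the key behavior without spending API calls, you can check the hashes directly. A small sketch, assuming the normalization helpers live in normalize.ts as above:
import assert from "node:assert/strict";
import { hashText } from "./cache.js";
import { hashNormalizedText } from "./normalize.js"; // assumed file name

// Whitespace-only change: raw hashes differ, normalized hashes still match.
assert.notEqual(hashText("cache  this"), hashText("cache this"));
assert.equal(hashNormalizedText("cache  this"), hashNormalizedText("cache this"));

// Real content change: even the normalized hash moves, so the cache misses as expected.
assert.notEqual(hashNormalizedText("cache this"), hashNormalizedText("cache that"));

console.log("cache key behavior looks right");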
Next Steps
- Move the cache from JSON on disk to Redis or Postgres if you need shared caching across workers.
- Add versioning to your cache keys so model upgrades do not mix incompatible vectors (see the sketch after this list).
- Wire this into ingestion pipelines so chunking + embedding are both deterministic across deployments.
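For the versioning point, one lightweight approach is to fold the model name and a manual schema version into the hash so a model upgrade naturally produces fresh keys. A sketch; makeCacheKey is a hypothetical helper, not part of LlamaIndex:
import crypto from "node:crypto";

const EMBEDDING_MODEL = "text-embedding-3-small";
const CACHE_SCHEMA_VERSION = "v1"; // bump when chunking or normalization changes

// Hypothetical helper: the same text embedded under a different model or schema gets a new key.
export function makeCacheKey(text: string): string {
  return crypto
    .createHash("sha256")
    .update(`${EMBEDDING_MODEL}:${CACHE_SCHEMA_VERSION}:${text}`)
    .digest("hex");
}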
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.