LlamaIndex Tutorial (TypeScript): caching embeddings for advanced developers

By Cyprian Aarons · Updated 2026-04-21
Tags: llamaindex · caching-embeddings-for-advanced-developers · typescript

This tutorial shows you how to cache LlamaIndex embeddings in TypeScript so repeated document ingestion does not keep paying the embedding cost. You need this when you re-run indexing jobs, rebuild vector stores, or process overlapping datasets and want deterministic reuse instead of re-embedding the same text.

What You'll Need

  • Node.js 18+ and npm
  • A TypeScript project with tsconfig.json
  • Packages:
    • llamaindex
    • dotenv
  • An embedding provider API key:
    • OPENAI_API_KEY for OpenAI embeddings
  • A writable local directory for cache files
  • Basic familiarity with VectorStoreIndex, Document, and async/await

Step-by-Step

  1. Install the dependencies and set up your environment variables.
    The cache in this tutorial is a simple file-backed map keyed by normalized text, which is enough for most internal pipelines and easy to debug. Put the two variables below in a .env file at the project root; dotenv loads them at runtime.
npm install llamaindex dotenv
npm install -D typescript tsx @types/node
OPENAI_API_KEY=your_openai_key_here
EMBED_CACHE_PATH=./.cache/embeddings.json
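    Before the next step, it can help to confirm that dotenv actually picks up your .env file. A minimal check script; the file name check-env.ts is just a suggestion:
// check-env.ts – quick sanity check that environment variables are loaded
import "dotenv/config";

if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY is not set; check your .env file");
}

console.log("Cache path:", process.env.EMBED_CACHE_PATH ?? "./.cache/embeddings.json");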
  2. Create a cached embedder wrapper.
    LlamaIndex will call this wrapper instead of calling the provider directly, so identical chunks return cached vectors immediately. Save it as cached-embedder.ts; the pipeline in step 3 imports it from that path.
import fs from "node:fs";
import path from "node:path";
import crypto from "node:crypto";
import { OpenAIEmbedding } from "llamaindex";

type EmbeddingCache = Record<string, number[]>;

export class CachedOpenAIEmbedding {
  private embedder = new OpenAIEmbedding({ model: "text-embedding-3-small" });
  private cachePath: string;
  private cache: EmbeddingCache = {};

  constructor(cachePath: string) {
    this.cachePath = cachePath;
    const dir = path.dirname(cachePath);
    if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
    if (fs.existsSync(cachePath)) {
      this.cache = JSON.parse(fs.readFileSync(cachePath, "utf8")) as EmbeddingCache;
    }
  }

  private keyFor(text: string): string {
    const normalized = text.trim().replace(/\s+/g, " ");
    return crypto.createHash("sha256").update(normalized).digest("hex");
  }

  private persist() {
    fs.writeFileSync(this.cachePath, JSON.stringify(this.cache));
  }

  async getTextEmbedding(text: string): Promise<number[]> {
    const key = this.keyFor(text);
    const cached = this.cache[key];
    if (cached) return cached;

    const embedding = await this.embedder.getTextEmbedding(text);
    this.cache[key] = embedding;
    this.persist();
    return embedding;
  }
}
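    Before wiring the wrapper into an index, a standalone run makes it easy to see the cache working: the first call hits the API, the second call for the same text is served from the map, and the JSON file persists across runs. The file names here mirror the import path used in the next step.
// try-cache.ts – call the wrapper twice with the same text
import "dotenv/config";
import { CachedOpenAIEmbedding } from "./cached-embedder";

async function main() {
  const embedder = new CachedOpenAIEmbedding("./.cache/embeddings.json");

  const first = await embedder.getTextEmbedding("Caching embeddings reduces cost.");
  const second = await embedder.getTextEmbedding("Caching embeddings reduces cost.");

  // The second call returns the stored vector without another API request.
  console.log("Vector length:", first.length);
  console.log("Identical result reused:", first === second);
}

main();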
  3. Plug the cached embedder into a real indexing pipeline.
    This example builds a VectorStoreIndex from documents, and the same chunk text will reuse the stored vector on subsequent runs.
import "dotenv/config";
import { Document, VectorStoreIndex } from "llamaindex";
import { CachedOpenAIEmbedding } from "./cached-embedder";

async function main() {
  const cachePath = process.env.EMBED_CACHE_PATH ?? "./.cache/embeddings.json";
  const embedModel = new CachedOpenAIEmbedding(cachePath);

  const docs = [
    new Document({
      text: "LlamaIndex helps you build retrieval pipelines over documents.",
      metadata: { source: "doc-a" },
    }),
    new Document({
      text: "Caching embeddings reduces cost when documents are reprocessed.",
      metadata: { source: "doc-b" },
    }),
  ];

  const index = await VectorStoreIndex.fromDocuments(docs, {
    embedModel,
  });

  console.log(`Indexed ${docs.length} documents`);
}

main();
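    The second argument accepted by VectorStoreIndex.fromDocuments has shifted between llamaindex releases. If your version does not accept an embedModel option there, the other common pattern is to register the embed model globally through Settings before building the index; depending on the version's typings, the wrapper may also need to extend BaseEmbedding rather than being a plain class. Treat this as a sketch to adapt:
// Alternative wiring: register the cached embedder globally via Settings.
import "dotenv/config";
import { Document, Settings, VectorStoreIndex } from "llamaindex";
import { CachedOpenAIEmbedding } from "./cached-embedder";

async function main() {
  // Every index build and query after this line uses the cached embedder.
  // If the typings complain, have CachedOpenAIEmbedding extend BaseEmbedding.
  Settings.embedModel = new CachedOpenAIEmbedding("./.cache/embeddings.json") as any;

  const docs = [new Document({ text: "LlamaIndex helps you build retrieval pipelines." })];
  await VectorStoreIndex.fromDocuments(docs);
  console.log("Indexed with globally registered embed model");
}

main();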
  4. Make the cache useful across repeated runs and overlapping corpora.
    In production, you want stable chunking plus stable normalization; otherwise tiny formatting changes create cache misses and waste money.
import { Document } from "llamaindex";

const docs = [
  new Document({
    text: `
      Caching embeddings reduces cost.
      Caching embeddings reduces cost.
    `,
    metadata: { customerId: "123" },
  }),
];

function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

const sample = docs[0].text;
console.log(normalizeText(sample));
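    Normalization handles formatting noise, but chunk boundaries also have to be stable: if the splitter produces different chunks between runs, every chunk becomes a cache miss even though the underlying text is unchanged. One way to pin this down is to construct the splitter with explicit settings. The sketch below assumes the SentenceSplitter exported by llamaindex; option names can vary slightly between versions.
import { SentenceSplitter } from "llamaindex";

// Fixed chunking parameters keep chunk boundaries (and therefore cache keys)
// identical across ingestion runs.
const splitter = new SentenceSplitter({ chunkSize: 512, chunkOverlap: 64 });

const chunks = splitter.splitText(
  "Caching embeddings reduces cost. Stable chunking keeps cache keys stable across runs."
);
console.log(`Produced ${chunks.length} chunk(s)`);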
  5. Add observability so you know whether the cache is actually working.
    Track hit rate during ingestion; if it stays low, your chunking strategy or normalization rules are probably too unstable.
import fs from "node:fs";

type Stats = {
  hits: number;
  misses: number;
};

export function loadStats(filePath: string): Stats {
  if (!fs.existsSync(filePath)) return { hits: 0, misses: 0 };
  return JSON.parse(fs.readFileSync(filePath, "utf8")) as Stats;
}

export function saveStats(filePath: string, stats: Stats) {
  fs.writeFileSync(filePath, JSON.stringify(stats));
}
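    The helpers above only read and write the counters; they still need to be incremented somewhere. One way to do that, sketched here with composition rather than by editing the step 2 class, is a thin wrapper that owns the cache lookup and bumps hits or misses on each call. The stats path and the ./stats module name are assumptions (save the helpers above as stats.ts if you follow this layout).
// instrumented-cache.ts – same caching idea, plus hit/miss counters
import fs from "node:fs";
import crypto from "node:crypto";
import { loadStats, saveStats } from "./stats";

const STATS_PATH = "./.cache/embed-stats.json";

export class InstrumentedEmbeddingCache {
  private cache: Record<string, number[]> = {};
  private stats = loadStats(STATS_PATH);

  constructor(
    private cachePath: string,
    private embed: (text: string) => Promise<number[]>,
  ) {
    if (fs.existsSync(cachePath)) {
      this.cache = JSON.parse(fs.readFileSync(cachePath, "utf8"));
    }
  }

  async getTextEmbedding(text: string): Promise<number[]> {
    const normalized = text.trim().replace(/\s+/g, " ");
    const key = crypto.createHash("sha256").update(normalized).digest("hex");

    if (this.cache[key]) {
      this.stats.hits += 1;
      saveStats(STATS_PATH, this.stats);
      return this.cache[key];
    }

    this.stats.misses += 1;
    saveStats(STATS_PATH, this.stats);

    const embedding = await this.embed(text);
    this.cache[key] = embedding;
    fs.writeFileSync(this.cachePath, JSON.stringify(this.cache));
    return embedding;
  }
}
    To use it, pass in the provider call you already have, for example new InstrumentedEmbeddingCache(cachePath, (text) => openaiEmbedding.getTextEmbedding(text)) with a single shared OpenAIEmbedding instance, then inspect embed-stats.json after each ingestion job.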

Testing It

Run your indexing script twice with the same input documents. On the first run, the cache file should be created and populated; on the second run, embedding calls for identical normalized text should be served from disk instead of hitting the API again.

If you want a quick sanity check, change only the whitespace in one document and rerun. If normalization is correct, you should still get a cache hit for that chunk.
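You can also verify the normalization property offline, without spending any API calls, by hashing a whitespace-variant of the same sentence with the same key function used in step 2:
// key-check.ts – whitespace-only edits should map to the same cache key
import crypto from "node:crypto";

function keyFor(text: string): string {
  const normalized = text.trim().replace(/\s+/g, " ");
  return crypto.createHash("sha256").update(normalized).digest("hex");
}

const original = "Caching embeddings reduces cost when documents are reprocessed.";
const reformatted = "  Caching   embeddings reduces cost\nwhen documents are reprocessed.  ";

console.log(keyFor(original) === keyFor(reformatted)); // true – still a cache hit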

A good production test is to log both total chunks processed and cache hit rate after each job. If hit rate is near zero on repeated datasets, fix chunking stability before tuning anything else.

Next Steps

  • Swap the file-backed map for Redis if multiple workers need shared embedding state.
  • Add TTL or versioned keys when you change embedding models or chunking rules (see the sketch after this list).
  • Extend the same pattern to query embeddings so retrieval requests also benefit from caching.
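For the versioned-keys idea above, the simplest scheme is to fold the embedding model name and a chunking-config revision into the hash, so changing either one automatically invalidates old entries instead of silently reusing stale vectors. A sketch, where the revision string is something you bump by hand:
import crypto from "node:crypto";

// Bump CHUNKING_REV whenever splitter settings change; keep the model name in sync with config.
const EMBED_MODEL = "text-embedding-3-small";
const CHUNKING_REV = "v1";

function versionedKeyFor(text: string): string {
  const normalized = text.trim().replace(/\s+/g, " ");
  return crypto
    .createHash("sha256")
    .update(`${EMBED_MODEL}:${CHUNKING_REV}:${normalized}`)
    .digest("hex");
}

console.log(versionedKeyFor("Caching embeddings reduces cost."));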

By Cyprian Aarons, AI Consultant at Topiax.