LlamaIndex Tutorial (TypeScript): caching embeddings for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to cache embeddings in a LlamaIndex TypeScript app so repeated document chunks don’t get re-embedded on every run. You need this when your ingestion pipeline is slow, your embedding API costs are climbing, or you want deterministic local development without burning tokens.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • LlamaIndex TS packages:
    • llamaindex
    • @llamaindex/openai
    • node-cache
  • An OpenAI API key set as OPENAI_API_KEY
  • A folder with text files to index, or a small in-memory dataset for testing

Step-by-Step

  1. Install the dependencies and set up your environment.
    We’ll use node-cache as a simple in-memory embedding cache keyed by chunk text plus model name.
npm install llamaindex @llamaindex/openai node-cache
npm install -D typescript ts-node @types/node
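
The wrapper below reads your OpenAI key from the OPENAI_API_KEY environment variable mentioned in the prerequisites. In a Unix-like shell you can set it like this (the value shown is a placeholder):
export OPENAI_API_KEY="sk-your-key-here"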
  2. Create a cached embedding model wrapper.
    LlamaIndex lets you pass an embedding model into the index, so we’ll wrap OpenAI embeddings and intercept calls before they hit the API.
import NodeCache from "node-cache";
import { OpenAIEmbedding } from "@llamaindex/openai";
import type { BaseEmbedding, Embedding } from "llamaindex";

const cache = new NodeCache({ stdTTL: 60 * 60 * 24 }); // 24 hours

export class CachedOpenAIEmbedding extends OpenAIEmbedding implements BaseEmbedding {
  async getTextEmbedding(text: string): Promise<Embedding> {
    // Include the model name in the key so switching models never reuses stale vectors.
    const key = `text:${this.model}:${text}`;
    const cached = cache.get<Embedding>(key);

    if (cached) return cached;

    const embedding = await super.getTextEmbedding(text);
    cache.set(key, embedding);
    return embedding;
  }
}
  3. Build a small ingestion script that uses the cached embedder.
    This example creates documents in memory, chunks them, and builds a vector index. Index the same documents twice within one run and the second pass should reuse cached embeddings for identical chunks; because node-cache is in-memory, a brand-new process starts with an empty cache.
import { Document, VectorStoreIndex } from "llamaindex";
import { CachedOpenAIEmbedding } from "./CachedOpenAIEmbedding";

async function main() {
  const documents = [
    new Document({ text: "LlamaIndex helps connect LLMs to your data." }),
    new Document({ text: "Caching embeddings reduces repeated API calls." }),
  ];

  const embedModel = new CachedOpenAIEmbedding({
    model: "text-embedding-3-small",
  });

  const index = await VectorStoreIndex.fromDocuments(documents, {
    embedModel,
  });

  console.log("Indexed nodes:", index.indexStruct?.nodesDict ? Object.keys(index.indexStruct.nodesDict).length : "unknown");
}

main().catch(console.error);
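
If you save this script as ingest.ts next to CachedOpenAIEmbedding.ts (both filenames are just examples that match the import path above), you can run it with:
npx ts-node ingest.ts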
  4. Make the cache visible so you can confirm hits and misses.
    In production you usually want Redis or another shared store, but for beginners logging cache behavior makes the flow obvious.
import NodeCache from "node-cache";
import { OpenAIEmbedding } from "@llamaindex/openai";
import type { BaseEmbedding, Embedding } from "llamaindex";

const cache = new NodeCache({ stdTTL: 60 * 60 * 24 });

export class CachedOpenAIEmbedding extends OpenAIEmbedding implements BaseEmbedding {
  async getTextEmbedding(text: string): Promise<Embedding> {
    const key = `text:${this.model}:${text}`;
    const cached = cache.get<Embedding>(key);

    if (cached) {
      console.log("cache hit:", key);
      return cached;
    }

    console.log("cache miss:", key);
    const embedding = await super.getTextEmbedding(text);
    cache.set(key, embedding);
    return embedding;
  }
}
  5. Use the same pattern for query embeddings too.
    If you only cache document embeddings, your retrieval path still pays the API cost every time someone asks a question.
import NodeCache from "node-cache";
import { OpenAIEmbedding } from "@llamaindex/openai";
import type { BaseEmbedding, Embedding } from "llamaindex";

const cache = new NodeCache({ stdTTL: 60 * 60 * 24 });

export class CachedOpenAIEmbedding extends OpenAIEmbedding implements BaseEmbedding {
  async getTextEmbedding(text: string): Promise<Embedding> {
    const key = `text:${this.model}:${text}`;
    const cached = cache.get<Embedding>(key);
    if (cached) return cached;

    const embedding = await super.getTextEmbedding(text);
    cache.set(key, embedding);
    return embedding;
  }

  async getQueryEmbedding(query: string): Promise<Embedding> {
    const key = `query:${this.model}:${query}`;
    const cached = cache.get<Embedding>(key);
    if (cached) return cached;

    const embedding = await super.getQueryEmbedding(query);
    cache.set(key, embedding);
    return embedding;
  }
}

Testing It

Run your ingestion logic twice in the same process, for example by building the index a second time at the end of main(); node-cache lives in process memory, so a separate invocation of the script starts with an empty cache. On the first pass you should see cache miss logs for each unique chunk; on the second pass, those same chunks should show cache hit. If you change one document slightly, only that modified chunk should miss the cache and trigger a fresh embedding call. To get hits that survive separate runs, switch to the Redis variant described under Next Steps.
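
With the step 4 logging in place and the two sample documents from step 3, the first pass should print something close to the following (the exact key text depends on how your documents are chunked; these two are short enough to stay whole):
cache miss: text:text-embedding-3-small:LlamaIndex helps connect LLMs to your data.
cache miss: text:text-embedding-3-small:Caching embeddings reduces repeated API calls.
The second pass in the same process should print matching cache hit lines for the same keys.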

For a more realistic test, add a query flow with index.asQueryEngine() and ask the same question more than once in a single run. The second query should reuse the cached query embedding instead of calling the embedding model again, as sketched below.
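
Here is a minimal sketch of that query flow, added to the end of main() from step 3. The query({ query }) call shape assumes a recent llamaindex release (older versions accept a plain string), and the question text is only an example:

  const queryEngine = index.asQueryEngine();
  const question = "What does caching embeddings help with?";

  // First ask: the query embedding is a cache miss, so the embeddings API is called once.
  const first = await queryEngine.query({ query: question });
  console.log(first.toString());

  // Same question again: the query embedding now comes straight from the cache.
  const second = await queryEngine.query({ query: question });
  console.log(second.toString());

Keep in mind that only the embedding call is cached; the LLM still generates an answer for each query, so you will see two responses but only one query-embedding API call.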

Next Steps

  • Replace node-cache with Redis so embeddings survive process restarts and work across multiple instances.
  • Add normalization before caching, such as trimming whitespace or hashing chunk text, to reduce accidental misses. A sketch combining both ideas follows this list.
  • Extend this pattern to other expensive LlamaIndex steps like reranking or summarization.
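
As a starting point for the first two bullets, here is a minimal sketch of a Redis-backed variant with hashed, normalized keys. It assumes the ioredis client (npm install ioredis) and a Redis server on localhost; the class name, the cacheKey helper, and the key format are illustrative rather than anything LlamaIndex prescribes.

import { createHash } from "node:crypto";
import Redis from "ioredis";
import { OpenAIEmbedding } from "@llamaindex/openai";
import type { Embedding } from "llamaindex";

const redis = new Redis(); // connects to localhost:6379 by default
const TTL_SECONDS = 60 * 60 * 24;

// Normalize whitespace and hash the text so keys stay short and
// insignificant formatting differences don't cause cache misses.
function cacheKey(kind: string, model: string, text: string): string {
  const normalized = text.trim().replace(/\s+/g, " ");
  const digest = createHash("sha256").update(normalized).digest("hex");
  return `emb:${kind}:${model}:${digest}`;
}

export class RedisCachedOpenAIEmbedding extends OpenAIEmbedding {
  async getTextEmbedding(text: string): Promise<Embedding> {
    const key = cacheKey("text", this.model, text);

    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached) as Embedding;

    const embedding = await super.getTextEmbedding(text);
    await redis.set(key, JSON.stringify(embedding), "EX", TTL_SECONDS);
    return embedding;
  }
}

The getQueryEmbedding override from step 5 can share the same cacheKey helper; because Redis lives outside the process, hits now survive restarts and can be shared across multiple app instances.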


By Cyprian Aarons, AI Consultant at Topiax.
