Haystack Tutorial (TypeScript): caching embeddings for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to cache embeddings in a Haystack TypeScript pipeline so repeated documents don’t get re-embedded on every run. You need this when you’re indexing large corpora, reprocessing the same files, or paying for OpenAI embeddings and want to stop burning tokens on duplicate work.

What You'll Need

  • Node.js 18+ and npm
  • A TypeScript project with tsconfig.json
  • Haystack TypeScript packages:
    • @haystack-ai/core
    • @haystack-ai/components
  • An embedding provider package compatible with your setup
  • An API key for your embedding model provider (see the example .env after this list)
  • A persistent cache store:
    • in-memory for local testing
    • Redis, Postgres, or SQLite for production-style persistence
  • Basic familiarity with Haystack pipelines, components, and documents
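
The examples below load environment variables via dotenv. If your provider is OpenAI, a minimal .env might look like this; the exact variable name depends on your provider SDK, so treat it as an assumption:

# .env — loaded by "dotenv/config" in index.ts; the variable name is provider-specific
OPENAI_API_KEY=your-key-here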

Step-by-Step

  1. Start by installing the dependencies and setting up a small TypeScript project; a minimal tsconfig sketch follows the install commands. The important part is that the cache must live outside the process if you want reuse across restarts.
npm init -y
npm install @haystack-ai/core @haystack-ai/components dotenv
npm install -D typescript tsx @types/node
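
A minimal tsconfig.json is enough here. The settings below are one reasonable sketch for running these files with tsx, not a requirement:

// tsconfig.json — a minimal sketch; adjust to your project's conventions
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "CommonJS",
    "moduleResolution": "Node",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}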
  2. Create a cache component that stores embeddings under a deterministic key. For production, use a real datastore; here we’ll use a simple file-backed cache so the example runs without extra infrastructure.
// cache.ts
import { readFileSync, writeFileSync, existsSync } from "node:fs";

export type EmbeddingCacheValue = {
  embedding: number[];
  model: string;
};

export class FileEmbeddingCache {
  constructor(private path: string) {}

  private load(): Record<string, EmbeddingCacheValue> {
    if (!existsSync(this.path)) return {};
    return JSON.parse(readFileSync(this.path, "utf8"));
  }

  private save(data: Record<string, EmbeddingCacheValue>) {
    writeFileSync(this.path, JSON.stringify(data, null, 2));
  }

  get(key: string): EmbeddingCacheValue | undefined {
    return this.load()[key];
  }

  set(key: string, value: EmbeddingCacheValue) {
    const data = this.load();
    data[key] = value;
    this.save(data);
  }
}
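
Note that this cache re-reads and rewrites the entire JSON file on every get and set. That keeps the example runnable anywhere, but each operation scales with the number of cached entries, which is one more reason to move to Redis, Postgres, or SQLite in production.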
  3. Build a cached embedder wrapper around your actual embedding component. The wrapper checks the cache first, deduplicates repeated texts within a batch, and only calls the model on a miss.
// cached-embedder.ts
import crypto from "node:crypto";
import { FileEmbeddingCache } from "./cache";

export type Embedder = {
  embed(texts: string[]): Promise<number[][]>;
  modelName: string;
};

export class CachedEmbedder {
  constructor(
    private embedder: Embedder,
    private cache: FileEmbeddingCache,
  ) {}

  private key(text: string): string {
    return crypto.createHash("sha256").update(`${this.embedder.modelName}:${text}`).digest("hex");
  }

  async embed(texts: string[]): Promise<number[][]> {
    const result: number[][] = [];
    // Group misses by key so duplicate texts within one batch trigger
    // only a single model call per unique input.
    const missByKey = new Map<string, { text: string; indexes: number[] }>();

    texts.forEach((text, index) => {
      const key = this.key(text);
      const cached = this.cache.get(key);
      if (cached) {
        result[index] = cached.embedding;
      } else {
        const miss = missByKey.get(key) ?? { text, indexes: [] };
        miss.indexes.push(index);
        missByKey.set(key, miss);
      }
    });

    if (missByKey.size > 0) {
      const misses = [...missByKey.entries()];
      const fresh = await this.embedder.embed(misses.map(([, miss]) => miss.text));
      fresh.forEach((embedding, i) => {
        const [key, miss] = misses[i];
        this.cache.set(key, { embedding, model: this.embedder.modelName });
        for (const index of miss.indexes) result[index] = embedding;
      });
    }

    return result;
  }
}
  4. Wire the cached embedder into a Haystack pipeline. This example uses the Haystack imports for document creation and pipeline composition, and a mock embedder so it runs without credentials; replace the mock with your provider-specific component when you move to production.
// index.ts
import "dotenv/config";
import { Pipeline, Document } from "@haystack-ai/core";
import { FileEmbeddingCache } from "./cache";
import { CachedEmbedder } from "./cached-embedder";

class MockEmbedder {
  modelName = "mock-embedding-model";

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) =>
      Array.from({ length: 8 }, (_, i) => (text.charCodeAt(i % text.length) || 0) / 255),
    );
  }
}

const cache = new FileEmbeddingCache("./embeddings-cache.json");
const cachedEmbedder = new CachedEmbedder(new MockEmbedder(), cache);

async function main() {
  // Placeholder: register your provider-specific embedder and document
  // store components here when moving past the mock setup.
  const pipeline = new Pipeline();
  const docs = [
    new Document({ content: "Haystack caches should be deterministic." }),
    new Document({ content: "Haystack caches should be deterministic." }),
    new Document({ content: "Different content means different embeddings." }),
  ];

  const embeddings = await cachedEmbedder.embed(docs.map((d) => d.content ?? ""));
  console.log("Embeddings:", embeddings.length);
}

main();
  5. Add a verification script to prove the cache is working. The first batch populates the file; the second batch with the same inputs hits disk instead of calling the embedder again.
// verify-cache.ts
import { FileEmbeddingCache } from "./cache";
import { CachedEmbedder } from "./cached-embedder";

class CountingEmbedder {
  modelName = "mock-embedding-model";
  calls = 0;

  async embed(texts: string[]): Promise<number[][]> {
    this.calls += texts.length;
    return texts.map((text) => [text.length]);
  }
}

async function main() {
  const embedder = new CountingEmbedder();
  const cache = new FileEmbeddingCache("./embeddings-cache.json");
  const cached = new CachedEmbedder(embedder, cache);

  await cached.embed(["alpha", "beta", "alpha"]);
  console.log("Model calls after first batch:", embedder.calls); // 2 on a fresh cache
  await cached.embed(["alpha", "beta", "alpha"]);
  console.log("Model calls after second batch:", embedder.calls); // still 2: every input is a cache hit
}

main();

Testing It

Run npx tsx index.ts once and confirm it writes embeddings-cache.json. Run it again with the same inputs and confirm the outputs are identical and the cache file is unchanged (apart from timestamps, if your implementation adds them).

For a stronger test, run npx tsx verify-cache.ts on a fresh cache file: the first batch of three inputs should trigger only two model calls (one per unique string), and the second batch should trigger none. If you later replace FileEmbeddingCache with Redis or Postgres, keep the same hash-key strategy so cached entries stay valid across deploys.

A good production check is to compare total embedding requests before and after caching over a realistic batch of documents. If your corpus has repeated boilerplate like legal clauses or policy templates, you should see an immediate drop in model calls.
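
One way to collect those numbers is to instrument the cache itself. The sketch below extends FileEmbeddingCache from this tutorial; the InstrumentedCache name and its counters are illustrative additions, not part of Haystack:

// instrumented-cache.ts — a minimal sketch for measuring cache hit rate
import { FileEmbeddingCache, EmbeddingCacheValue } from "./cache";

export class InstrumentedCache extends FileEmbeddingCache {
  hits = 0;
  misses = 0;

  get(key: string): EmbeddingCacheValue | undefined {
    const value = super.get(key);
    if (value) this.hits += 1;
    else this.misses += 1;
    return value;
  }
}

Pass an InstrumentedCache to CachedEmbedder in place of FileEmbeddingCache and log hits and misses after each indexing job to track the hit rate over time.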

Next Steps

  • Replace FileEmbeddingCache with Redis for multi-instance deployments (see the sketch after this list).
  • Add TTL-based eviction so stale embeddings can expire when models change.
  • Cache chunk-level embeddings separately from document-level metadata.
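
For the first two items, a Redis-backed cache might look like the sketch below. It assumes the node-redis v4 client (npm install redis); the RedisEmbeddingCache name and the 30-day TTL are placeholders, and because both methods are async, the get and set calls in CachedEmbedder would need to be awaited:

// redis-cache.ts — a sketch, assuming the node-redis v4 client
import { createClient } from "redis";
import type { EmbeddingCacheValue } from "./cache";

export class RedisEmbeddingCache {
  private client = createClient({ url: process.env.REDIS_URL });

  async connect() {
    await this.client.connect();
  }

  async get(key: string): Promise<EmbeddingCacheValue | undefined> {
    const raw = await this.client.get(key);
    return raw ? (JSON.parse(raw) as EmbeddingCacheValue) : undefined;
  }

  // EX applies a TTL in seconds, giving the TTL-based eviction described above
  async set(key: string, value: EmbeddingCacheValue, ttlSeconds = 60 * 60 * 24 * 30) {
    await this.client.set(key, JSON.stringify(value), { EX: ttlSeconds });
  }
}

Keeping the same SHA-256 key strategy means entries written by one instance are readable by every other instance pointed at the same Redis.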
