Haystack Tutorial (TypeScript): caching embeddings for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to cache embeddings in a TypeScript Haystack pipeline so repeated document chunks do not hit your embedding model every run. You need this when you are re-indexing the same content, iterating on retrieval logic, or paying for embeddings by the token.

What You'll Need

  • Node.js 18+
  • TypeScript 5+
  • A Haystack TypeScript project installed with:
    • @haystack/core
    • @haystack/openai
    • @haystack/cache
  • An OpenAI API key in OPENAI_API_KEY
  • A working internet connection for the first embedding call
  • A local Redis instance if you want persistent caching across process restarts

Step-by-Step

  1. Set up the project and install the packages. If you already have a Haystack TypeScript app, just add the cache package and make sure your OpenAI provider is configured.
npm install @haystack/core @haystack/openai @haystack/cache
npm install -D typescript tsx @types/node
  2. Create a small helper that builds an embedding pipeline with a cache wrapper. The important part is that the same input text produces the same cache key, so repeated runs can reuse the stored vector.
import { Pipeline } from "@haystack/core";
import { OpenAITextEmbedder } from "@haystack/openai";
import { InMemoryCache } from "@haystack/cache";

const cache = new InMemoryCache();

const embedder = new OpenAITextEmbedder({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY!,
});

const pipeline = new Pipeline();
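// Register the embedder as a pipeline component for later retrieval steps;
// the cache wrapper built in the next steps calls the embedder directly.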
pipeline.addComponent("embedder", embedder);

console.log("Pipeline ready with embedding cache");
  3. Normalize your input before embedding it. This avoids accidental cache misses from whitespace changes, which are especially common when you chunk documents extracted from PDFs or HTML.
function normalizeText(text: string): string {
  return text
    .trim()
    .replace(/\s+/g, " ")
    .replace(/[\u0000-\u001f]/g, "");
}

const rawChunk = "  Hello   world.\nThis is a document chunk. ";
const normalizedChunk = normalizeText(rawChunk);

console.log({ rawChunk, normalizedChunk });
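// After normalization: "Hello world. This is a document chunk."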
  4. Wrap the embed call with a cache lookup and store the result after a miss. This is the production pattern: check cache first, call the model only if needed, then persist the vector under a deterministic key.
import crypto from "node:crypto";

function embeddingKey(model: string, text: string): string {
  const hash = crypto.createHash("sha256").update(`${model}:${text}`).digest("hex");
  return `embedding:${hash}`;
}

async function getEmbedding(text: string): Promise<number[]> {
  const model = "text-embedding-3-small";
  const key = embeddingKey(model, text);
  const cached = await cache.get<number[]>(key);

  if (cached) {
    return cached;
  }

  const result = await embedder.run({ text });
  const embedding = result.embedding;
  await cache.set(key, embedding);

  return embedding;
}
  5. Use the helper in a document indexing loop. This is where caching pays off: duplicate chunks across runs, retries, or reprocessing jobs will skip expensive API calls.
const chunks = [
  "Haystack helps build LLM pipelines.",
  "Caching embeddings reduces repeated API calls.",
  "Haystack helps build LLM pipelines."
];

async function indexChunks() {
  for (const chunk of chunks) {
    const normalized = normalizeText(chunk);
    const vector = await getEmbedding(normalized);
    console.log(normalized, vector.length);
  }
}

indexChunks().catch((err) => {
  console.error(err);
  process.exit(1);
});
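// Expected: three lines of output. The vectors are 1536-dimensional by default for
// text-embedding-3-small, and the duplicate third chunk is served from the cache.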
  6. Swap in Redis when you need persistence beyond one process. In-memory caching is fine for local development, but Redis is what you want once multiple workers or deployments are involved.
import { RedisCache } from "@haystack/cache";

const redisCache = new RedisCache({
  url: process.env.REDIS_URL ?? "redis://localhost:6379",
});

async function getPersistentEmbedding(text: string): Promise<number[]> {
  const model = "text-embedding-3-small";
  const key = embeddingKey(model, text);
  const cached = await redisCache.get<number[]>(key);

  if (cached) return cached;

  const result = await embedder.run({ text });
  await redisCache.set(key, result.embedding);

  return result.embedding;
}

Testing It

Run the script twice with the same input set. On the first run, you should see normal latency because each unique chunk gets embedded once; on the second run, repeated chunks should come back much faster because they are served from cache.
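
If you want numbers rather than a gut feel, the minimal sketch below times a cold pass against a warm one. It assumes the chunks, normalizeText, and getEmbedding definitions from the steps above are in scope.
async function timePass(label: string, texts: string[]): Promise<void> {
  const start = Date.now();
  for (const text of texts) {
    await getEmbedding(normalizeText(text));
  }
  console.log(`${label}: ${Date.now() - start}ms`);
}

async function compareRuns(): Promise<void> {
  // The second pass repeats the same inputs, so every lookup should be a cache hit.
  await timePass("cold cache", chunks);
  await timePass("warm cache", chunks);
}

compareRuns().catch(console.error);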

If you're using Redis, restart your Node process and run it again. The cached vectors should still be there, which confirms you're not relying on process memory.
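
To double-check that the vectors landed in Redis rather than process memory, you can also list the keys from the CLI (assuming a local instance on the default port):
redis-cli --scan --pattern 'embedding:*'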

For extra confidence, log cache hits and misses around cache.get() and compare them against your input list. You want duplicate chunks to hit cache consistently after normalization.
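
One way to do that is a thin wrapper around the lookup, sketched below under the assumption that the cache, embedder, and embeddingKey from the earlier steps are in scope:
let hits = 0;
let misses = 0;

async function getEmbeddingWithStats(text: string): Promise<number[]> {
  const key = embeddingKey("text-embedding-3-small", text);
  const cached = await cache.get<number[]>(key);

  if (cached) {
    hits += 1;
    return cached;
  }

  misses += 1;
  const result = await embedder.run({ text });
  await cache.set(key, result.embedding);
  return result.embedding;
}

// With the three example chunks above, a cold run should report 2 misses and 1 hit.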

Next Steps

  • Add TTLs and eviction rules so stale embeddings do not live forever.
  • Cache chunk hashes alongside embeddings so you can skip both splitting and embedding for unchanged documents (see the sketch after this list).
  • Wire this into a vector store ingestion job with batch retries and observability metrics.
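
For the chunk-hash idea, one possible sketch is to store a content-hash marker next to each embedding. The indexIfChanged helper and the indexed: key prefix are assumptions for illustration, not part of @haystack/cache; crypto, normalizeText, getEmbedding, and cache come from the earlier steps.
function chunkHash(text: string): string {
  return crypto.createHash("sha256").update(text).digest("hex");
}

async function indexIfChanged(chunk: string): Promise<void> {
  const normalized = normalizeText(chunk);
  const marker = `indexed:${chunkHash(normalized)}`;

  // Unchanged chunk: the marker is already present, so skip embedding entirely.
  if (await cache.get<boolean>(marker)) {
    return;
  }

  await getEmbedding(normalized);
  await cache.set(marker, true);
}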

By Cyprian Aarons, AI Consultant at Topiax.
