LangChain Tutorial (TypeScript): caching embeddings for advanced developers
This tutorial shows how to cache embeddings in a TypeScript LangChain app so repeated text chunks do not trigger duplicate embedding calls. You need this when you re-index the same documents often, run batch jobs, or want to reduce OpenAI API cost and latency in production.
What You'll Need
- Node.js 18+
- A TypeScript project with a tsconfig.json
- These packages: langchain, @langchain/openai, @langchain/community, dotenv
- An OpenAI API key in a .env file (a single line: OPENAI_API_KEY=...)
- A place to store cache data: local disk for development, Redis or a database for production
Install the packages:
npm install langchain @langchain/openai @langchain/community dotenv
npm install -D typescript tsx @types/node
Step-by-Step
1. Create a basic embeddings setup first, then wrap it with a cache-backed store. The important part is that the cache lookup happens before the embedding model is called, so identical text returns the stored vector without another API request. The snippet below is the uncached baseline; a minimal cached version of the same setup follows it.
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
const embeddings = new OpenAIEmbeddings({
model: "text-embedding-3-small",
});
async function main() {
const vectors = await embeddings.embedDocuments([
"LangChain caching example",
"LangChain caching example",
]);
console.log(vectors.length);
}
main();
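Before moving to persistent storage, here is a minimal sketch of that wrapper using LangChain's InMemoryStore as the byte store behind CacheBackedEmbeddings. In-memory entries only live as long as the process, which is why the next step switches to disk.
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

const underlying = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

// Wrap the embedder: lookups hit the store first, and only misses reach the API.
const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(
  underlying,
  new InMemoryStore(),
  { namespace: underlying.modelName }
);

async function main() {
  const texts = ["LangChain caching example"];
  await cachedEmbeddings.embedDocuments(texts); // miss: calls the API and stores bytes
  const vectors = await cachedEmbeddings.embedDocuments(texts); // hit: read from the store
  console.log(vectors[0].length);
}

main();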
2. Add a persistent cache using LangChain's local file store. This gives you deterministic reuse across process restarts, which is what you want for indexing pipelines and background workers.
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache";
import { SQLiteCache } from "@langchain/community/caches/sqlite";
const underlying = new OpenAIEmbeddings({
model: "text-embedding-3-small",
});
const cache = new SQLiteCache("./embeddings-cache.sqlite");
const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(underlying, cache);
async function main() {
const docs = ["customer policy summary", "customer policy summary"];
const vectors = await cachedEmbeddings.embedDocuments(docs);
console.log(vectors[0].length);
}
main();
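As a quick sanity check, you can time the same call twice in one process; the second call should come back in milliseconds because it never leaves the disk cache. A rough sketch, not a benchmark:
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { LocalFileStore } from "langchain/storage/file_system";

async function main() {
  const underlying = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const cache = await LocalFileStore.fromPath("./embeddings-cache");
  const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(underlying, cache);

  const texts = ["customer policy summary"];

  let start = Date.now();
  await cachedEmbeddings.embedDocuments(texts);
  console.log(`cold: ${Date.now() - start}ms`); // network round trip

  start = Date.now();
  await cachedEmbeddings.embedDocuments(texts);
  console.log(`warm: ${Date.now() - start}ms`); // served from disk
}

main();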
3. Use a namespace so different models or embedding configurations do not collide. If you change models later and keep the same cache key space, you'll silently reuse incompatible vectors.
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache";
import { SQLiteCache } from "@langchain/community/caches/sqlite";
const underlying = new OpenAIEmbeddings({
model: "text-embedding-3-small",
});
const cache = new SQLiteCache("./embeddings-cache.sqlite");
const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(underlying, cache, {
namespace: "openai:text-embedding-3-small:v1",
});
async function main() {
const texts = [
"claims processing workflow",
"claims processing workflow",
"fraud investigation notes",
];
const vectors = await cachedEmbeddings.embedDocuments(texts);
console.log({ count: vectors.length });
}
main();
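To make the collision risk concrete, here is a sketch showing that two wrappers over the same store but with different namespaces keep separate entries, so bumping the namespace version effectively invalidates everything cached under the old one:
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { LocalFileStore } from "langchain/storage/file_system";

async function main() {
  const underlying = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const cache = await LocalFileStore.fromPath("./embeddings-cache");

  const v1 = CacheBackedEmbeddings.fromBytesStore(underlying, cache, {
    namespace: "openai:text-embedding-3-small:v1",
  });
  const v2 = CacheBackedEmbeddings.fromBytesStore(underlying, cache, {
    namespace: "openai:text-embedding-3-small:v2",
  });

  // Same text, different namespaces: v2 does not see v1's entry,
  // so the second line triggers a fresh API call and a separate record.
  await v1.embedDocuments(["claims processing workflow"]);
  await v2.embedDocuments(["claims processing workflow"]);
}

main();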
4. Plug the cached embedder into your retrieval pipeline. In practice, this is where the savings show up because document ingestion often reprocesses unchanged chunks.
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache";
import { SQLiteCache } from "@langchain/community/caches/sqlite";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const underlying = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
const cache = new SQLiteCache("./embeddings-cache.sqlite");
const embeddings = CacheBackedEmbeddings.fromBytesStore(underlying, cache, {
namespace: "openai:text-embedding-3-small:v1",
});
async function main() {
const store = await MemoryVectorStore.fromTexts(
["policy renewal date", "billing dispute escalation"],
[{ id: "a" }, { id: "b" }],
embeddings
);
const results = await store.similaritySearch("billing escalation", 1);
console.log(results[0].pageContent);
}
main();
5. For production, keep the cache outside your app container and version your namespace when anything material changes. That includes the model name, chunking strategy, normalization rules, or even major prompt changes if you derive embeddings from structured text.
type EmbeddingCacheConfig = {
  model: string;
  chunkerVersion: string;
};

// Bump chunkerVersion (or swap the model) and the namespace changes,
// which cleanly invalidates every previously cached vector.
function buildNamespace(config: EmbeddingCacheConfig) {
  return `emb:${config.model}:${config.chunkerVersion}`;
}

console.log(
  buildNamespace({
    model: "text-embedding-3-small",
    chunkerVersion: "v2",
  })
);
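For the shared, outside-the-container setup, LangChain's community package ships a Redis-backed byte store that drops into the same fromBytesStore call. A minimal sketch, assuming a reachable Redis instance, the ioredis package installed, and a REDIS_URL variable in .env:
import "dotenv/config";
import { Redis } from "ioredis";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { RedisByteStore } from "@langchain/community/storage/ioredis";

// REDIS_URL is an assumed env var, e.g. redis://cache.internal:6379
const client = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const store = new RedisByteStore({ client });

const underlying = new OpenAIEmbeddings({ model: "text-embedding-3-small" });

const embeddings = CacheBackedEmbeddings.fromBytesStore(underlying, store, {
  // Matches the output of buildNamespace above.
  namespace: "emb:text-embedding-3-small:v2",
});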
Testing It
Run the script twice and watch the second execution complete faster with fewer outbound embedding calls. If you're using an API usage dashboard, the token or request count should stop increasing for repeated inputs.
To test correctness, compare similarity search results before and after enabling caching. They should match, because caching only changes where vectors come from, not how they are used.
If you want a stronger check, delete the cache directory and rerun once to populate it, then rerun again without deleting it. The second run should reuse previously stored vectors for identical text and namespace values.
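You can also assert this in code: embed the same text twice through the cached embedder and check the vectors match exactly, since the second read comes straight from the store. A small sketch:
import "dotenv/config";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { LocalFileStore } from "langchain/storage/file_system";

async function main() {
  const underlying = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const cache = await LocalFileStore.fromPath("./embeddings-cache");
  const cached = CacheBackedEmbeddings.fromBytesStore(underlying, cache, {
    namespace: "openai:text-embedding-3-small:v1",
  });

  const [first] = await cached.embedDocuments(["claims processing workflow"]);
  const [second] = await cached.embedDocuments(["claims processing workflow"]);

  // A cache hit returns the stored bytes, so the vectors match exactly.
  const identical = first.every((value, i) => value === second[i]);
  console.log({ identical });
}

main();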
Next Steps
- Move the cache from the local file store to Redis (as sketched above) if you need shared caching across multiple workers.
- Add document hashing so only changed chunks get re-embedded; a sketch follows this list.
- Combine cached embeddings with a persistent vector store like pgvector or Pinecone for full ingestion pipelines.
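For the hashing idea, here is a minimal sketch using node:crypto. The manifest file and its shape are assumptions for illustration, not LangChain APIs; real pipelines usually track hashes wherever they track document metadata.
import { createHash } from "node:crypto";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical manifest mapping content hash -> true once embedded.
const MANIFEST = "./embedded-chunks.json";

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Returns only chunks whose hash has not been recorded yet, so an
// ingestion run can skip unchanged text before embedding at all.
function selectChangedChunks(chunks: string[]): string[] {
  const seen: Record<string, boolean> = existsSync(MANIFEST)
    ? JSON.parse(readFileSync(MANIFEST, "utf8"))
    : {};
  const changed = chunks.filter((chunk) => !seen[sha256(chunk)]);
  for (const chunk of changed) {
    seen[sha256(chunk)] = true;
  }
  writeFileSync(MANIFEST, JSON.stringify(seen, null, 2));
  return changed;
}

// First run prints both chunks; rerunning prints an empty array.
console.log(selectChangedChunks(["policy renewal date", "billing dispute escalation"]));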
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.