LangChain Tutorial (Python): caching embeddings for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to cache embedding calls in a LangChain Python app so repeated text does not get re-embedded every time. You need this when you are indexing documents, running local experiments, or paying for embeddings by the token and want to stop wasting money on duplicate requests.

What You'll Need

  • Python 3.10+
  • A working OpenAI API key
  • langchain
  • langchain-openai
  • langchain-community
  • faiss-cpu
  • python-dotenv if you want to keep secrets in a .env file

Install the packages:

pip install langchain langchain-openai langchain-community faiss-cpu python-dotenv
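
If you keep the key in a .env file, a minimal file at the project root is enough. The value below is a placeholder, not a real key:

OPENAI_API_KEY=sk-...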

Step-by-Step

  1. Start with a plain embedding model and a tiny document set. The goal is to prove that the same text does not trigger a second embedding request once caching is enabled.
import os

from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings

load_dotenv()  # optional: pulls OPENAI_API_KEY from a .env file

if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY in your environment or a .env file")

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "LangChain helps build LLM applications.",
    "Caching embeddings avoids repeated API calls.",
]
  2. Wrap the embedding model with LangChain's cache layer. This stores each embedding as a file in a local folder, so the same input text comes back from disk instead of hitting the API again.
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

store = LocalFileStore("./embedding_cache")  # embeddings are written as files under this folder
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,
    document_embedding_cache=store,
    namespace="openai-text-embedding-3-small",  # keeps keys from different models separate
)
  3. Embed your texts through the cached wrapper. The first run stores vectors in the cache, and later runs reuse them automatically as long as the text and namespace stay the same.
# First pass: each text is embedded via the API and written to ./embedding_cache.
vectors = cached_embeddings.embed_documents(texts)

print(len(vectors))     # number of embedded texts
print(len(vectors[0]))  # embedding dimension
print(vectors[0][:5])   # first few values of the first vector
  4. Run the exact same call again and confirm it still works. You should see identical vectors, but this time they come from the cache instead of a fresh API call.
# Second pass: identical inputs, so the vectors are read from the cache.
second_run_vectors = cached_embeddings.embed_documents(texts)

print(second_run_vectors[0][:5])
print(vectors[0] == second_run_vectors[0])  # True: same text, same cached vector
  5. Use the cached embeddings inside a vector store pipeline. This is where caching matters most, because document ingestion often reprocesses overlapping content during development or scheduled jobs.
from langchain_community.vectorstores import FAISS

docs = [
    "Caching is useful when documents do not change often.",
    "FAISS stores vectors for similarity search.",
    "LangChain can wrap embedding models with a cache.",
]

db = FAISS.from_texts(docs, embedding=cached_embeddings)
results = db.similarity_search("Why use embedding caching?", k=2)

for doc in results:
    print(doc.page_content)
  6. If you want to persist the cache between program runs, keep using the same folder path. That gives you repeatable behavior across restarts without rebuilding embeddings every time.
from pathlib import Path

cache_path = Path("./embedding_cache")
cache_path.mkdir(exist_ok=True)

more_texts = [
    "Repeated document chunks should not be embedded twice.",
    "Persistent caches save money and time.",
]

more_vectors = cached_embeddings.embed_documents(more_texts)
print(f"Cached files stored in: {cache_path.resolve()}")
print(f"Embedded {len(more_vectors)} new texts")

Testing It

Run the script once, then run it again without deleting ./embedding_cache. On the second run, LangChain should reuse cached embeddings for any text it has already seen under the same namespace.

A practical check is to add logging around your OpenAI usage or watch network calls while rerunning embed_documents(). If caching is working, duplicate texts should stop generating fresh embedding requests.
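
Another low-effort check is timing. The sketch below assumes the cached_embeddings object from step 2 and uses a made-up sample sentence; exact numbers will vary, but the repeat call should be noticeably faster because it never leaves your disk:

import time

timing_texts = ["A sentence that has not been embedded before."]

start = time.perf_counter()
cached_embeddings.embed_documents(timing_texts)  # first call hits the API and fills the cache
cold = time.perf_counter() - start

start = time.perf_counter()
cached_embeddings.embed_documents(timing_texts)  # repeat call reads from ./embedding_cache
warm = time.perf_counter() - start

print(f"cold call: {cold:.3f}s, cached call: {warm:.3f}s")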

If you change even one character in a text string, it becomes a different cache key and gets embedded again. That is expected and useful: embedding caches should only return a hit on an exact match of the input text.
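
You can watch that behavior directly by counting cache entries before and after embedding a lightly edited string. A small sketch, again assuming the store and cached_embeddings objects from the steps above; the edited sentence is just an example:

before = sum(1 for _ in store.yield_keys())

# Same sentence as step 1, but with a trailing exclamation mark.
cached_embeddings.embed_documents(["Caching embeddings avoids repeated API calls!"])

after = sum(1 for _ in store.yield_keys())
print(f"New cache entries: {after - before}")  # expect 1, because the edited text is a new key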

Next Steps

  • Learn how to combine embedding caches with document chunking so only new chunks get processed.
  • Add a Redis-backed cache if you need sharing across multiple app instances (see the first sketch after this list).
  • Pair this with vector store persistence so both embeddings and indexes survive restarts (see the second sketch after this list).
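
For the Redis option, CacheBackedEmbeddings does not care which byte store it wraps, so the switch is mostly a one-line change. A minimal sketch, assuming a Redis server is reachable at the URL shown and the redis Python package is installed; the URL and namespace are placeholders you should adapt:

from langchain.embeddings import CacheBackedEmbeddings
from langchain_community.storage import RedisStore
from langchain_openai import OpenAIEmbeddings

# Every app instance that points at the same Redis sees the same cached vectors.
redis_store = RedisStore(redis_url="redis://localhost:6379")

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
shared_cache = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,
    document_embedding_cache=redis_store,
    namespace="openai-text-embedding-3-small",
)

For vector store persistence, FAISS indexes can be saved to and reloaded from disk. A short sketch, assuming the db object from step 5; the folder name is arbitrary, and recent versions require opting in to pickle deserialization on load:

db.save_local("faiss_index")

# Reload later with the same (cached) embedding object.
restored = FAISS.load_local(
    "faiss_index",
    cached_embeddings,
    allow_dangerous_deserialization=True,
)
print(restored.similarity_search("Why use embedding caching?", k=1)[0].page_content)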

By Cyprian Aarons, AI Consultant at Topiax.
