LangChain Tutorial (Python): caching embeddings for beginners
This tutorial shows you how to cache embedding calls in a LangChain Python app so repeated text does not get re-embedded every time. You need this when you are indexing documents, running local experiments, or paying for embeddings by the token and want to stop wasting money on duplicate requests.
What You'll Need
- Python 3.10+
- A working OpenAI API key
- langchain
- langchain-openai
- langchain-community
- faiss-cpu
- python-dotenv, if you want to keep secrets in a .env file
Install the packages:
```shell
pip install langchain langchain-openai langchain-community faiss-cpu python-dotenv
```
Step-by-Step
- Start with a plain embedding model and a tiny document set. The goal is to prove that the same text does not trigger a second embedding request once caching is enabled.

```python
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings

load_dotenv()  # reads OPENAI_API_KEY from a .env file, if present

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "LangChain helps build LLM applications.",
    "Caching embeddings avoids repeated API calls.",
]
```
- Wrap the embedding model with LangChain's cache layer. `LocalFileStore` persists each embedding as a file on disk, so the same input text is read back from local storage instead of hitting the API again.

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

store = LocalFileStore("./embedding_cache")

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,
    document_embedding_cache=store,
    namespace="openai-text-embedding-3-small",  # keeps keys distinct per model
)
```
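Conceptually, the wrapper derives a cache key from the namespace plus a hash of the input text. Here is a pure-Python sketch of that idea (the hash function and key layout are simplifications for illustration, not LangChain's exact internals):

```python
import hashlib


def cache_key(namespace: str, text: str) -> str:
    # The namespace prefix keeps embeddings from different models apart,
    # even when the input text is identical.
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return f"{namespace}{digest}"


same_model_a = cache_key("openai-text-embedding-3-small", "hello")
same_model_b = cache_key("openai-text-embedding-3-small", "hello")
other_model = cache_key("some-other-model", "hello")

print(same_model_a == same_model_b)  # True: same namespace and text
print(same_model_a == other_model)   # False: the namespace changes the key
```

This is why the same text embedded under two different namespaces is cached twice rather than shared.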
- Embed your texts through the cached wrapper. The first run stores vectors in the cache, and later runs reuse them automatically as long as the text and namespace stay the same.

```python
vectors = cached_embeddings.embed_documents(texts)

print(len(vectors))
print(len(vectors[0]))
print(vectors[0][:5])
```
- Run the exact same call again and confirm it still works. You should see identical vectors, but this time they come from the cache instead of a fresh API call.

```python
second_run_vectors = cached_embeddings.embed_documents(texts)

print(second_run_vectors[0][:5])
print(vectors[0] == second_run_vectors[0])
```
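To see the cache-hit behavior without spending API calls, here is a minimal stand-in for what the cached wrapper does, using a fake embedder with a call counter (`FakeEmbedder` and `DictCachedEmbedder` are invented names for this sketch, not LangChain classes):

```python
class FakeEmbedder:
    """Stand-in for an API-backed model: counts how many texts it embeds."""

    def __init__(self):
        self.calls = 0

    def embed_documents(self, texts):
        self.calls += len(texts)
        return [[float(len(t))] * 3 for t in texts]


class DictCachedEmbedder:
    """Simplified cache-backed wrapper: an in-memory dict keyed by text."""

    def __init__(self, underlying):
        self.underlying = underlying
        self.cache = {}

    def embed_documents(self, texts):
        missing = [t for t in texts if t not in self.cache]
        if missing:
            for t, vec in zip(missing, self.underlying.embed_documents(missing)):
                self.cache[t] = vec
        return [self.cache[t] for t in texts]


fake = FakeEmbedder()
cached = DictCachedEmbedder(fake)

cached.embed_documents(["alpha", "beta"])  # both miss: 2 underlying calls
cached.embed_documents(["alpha", "beta"])  # both hit: 0 new calls

print(fake.calls)  # 2
```

The counter stays at 2 after the second call, which is exactly the behavior you are verifying against the real API.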
- Use the cached embeddings inside a vector store pipeline. This is where caching matters most, because document ingestion often reprocesses overlapping content during development or scheduled jobs.

```python
from langchain_community.vectorstores import FAISS

docs = [
    "Caching is useful when documents do not change often.",
    "FAISS stores vectors for similarity search.",
    "LangChain can wrap embedding models with a cache.",
]

db = FAISS.from_texts(docs, embedding=cached_embeddings)
results = db.similarity_search("Why use embedding caching?", k=2)

for doc in results:
    print(doc.page_content)
```
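Under the hood, similarity search ranks stored vectors by closeness to the query vector. Cosine similarity is one common measure, sketched here in plain Python (the two-dimensional toy vectors are invented for illustration):

```python
import math


def cosine(a, b):
    # Cosine similarity: dot product scaled by the vectors' lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


doc_vectors = {
    "caching doc": [1.0, 0.0],
    "faiss doc": [0.0, 1.0],
}
query = [0.9, 0.1]  # much closer in direction to the "caching doc" vector

best = max(doc_vectors, key=lambda name: cosine(query, doc_vectors[name]))
print(best)  # caching doc
```

Caching pays off here because ingestion computes these vectors once per unique text, while search only reads them.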
- To persist the cache between program runs, keep using the same folder path. That gives you repeatable behavior across restarts without rebuilding embeddings every time.

```python
from pathlib import Path

cache_path = Path("./embedding_cache")
cache_path.mkdir(exist_ok=True)  # no-op if the folder already exists

more_texts = [
    "Repeated document chunks should not be embedded twice.",
    "Persistent caches save money and time.",
]

more_vectors = cached_embeddings.embed_documents(more_texts)

print(f"Cached files stored in: {cache_path.resolve()}")
print(f"Embedded {len(more_vectors)} new texts")
```
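Persistence works because the store is just files on disk. Here is a stripped-down sketch of a file-backed byte store in that spirit (`FileStore`, `mset`, and `mget` are illustrative stand-ins, assuming keys are filesystem-safe):

```python
import json
import tempfile
from pathlib import Path


class FileStore:
    """Toy byte store: each key becomes one file under the root folder."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def mset(self, key_value_pairs):
        for key, value in key_value_pairs:
            (self.root / key).write_bytes(value)

    def mget(self, keys):
        return [
            (self.root / key).read_bytes() if (self.root / key).exists() else None
            for key in keys
        ]


root = tempfile.mkdtemp()
FileStore(root).mset([("vec1", json.dumps([0.1, 0.2]).encode())])

# A second store object (think: a new process) sees the same data.
restored = FileStore(root).mget(["vec1", "missing"])
print(json.loads(restored[0]))  # [0.1, 0.2]
print(restored[1])              # None
```

Because the values outlive any single process, reusing the same folder path is all it takes to keep the cache warm across restarts.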
Testing It
Run the script once, then run it again without deleting ./embedding_cache. On the second run, LangChain should reuse cached embeddings for any text it has already seen under the same namespace.
A practical check is to add logging around your OpenAI usage or watch network calls while rerunning embed_documents(). If caching is working, duplicate texts should stop generating fresh embedding requests.
If you change even one character in a text string, it becomes a different cache key and gets embedded again. That is expected: embedding caches are exact-match by design.
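You can see why with a quick hash comparison. Since cache keys include a hash of the text, any edit produces a different digest (SHA-1 here just illustrates the point; the exact hash LangChain uses is an internal detail):

```python
import hashlib

original = "Caching embeddings avoids repeated API calls."
edited = "Caching embeddings avoids repeated API calls!"  # one character changed

key_original = hashlib.sha1(original.encode("utf-8")).hexdigest()
key_edited = hashlib.sha1(edited.encode("utf-8")).hexdigest()

print(key_original == key_edited)  # False: the edited text is a cache miss
```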
Next Steps
- Learn how to combine embedding caches with document chunking so only new chunks get processed.
- Add a Redis-backed cache if you need sharing across multiple app instances.
- Pair this with vector store persistence so both embeddings and indexes survive restarts.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.