CrewAI Tutorial (Python): caching embeddings for advanced developers
This tutorial shows you how to cache embeddings in a CrewAI workflow so repeated retrieval and tool calls stop burning tokens and adding latency. You need this when your agents keep re-processing the same documents, especially in regulated environments where knowledge bases are large, stable, and queried often.
What You'll Need
- Python 3.10+
- crewai
- crewai-tools
- chromadb
- openai
- An OpenAI API key set as OPENAI_API_KEY
- A folder of source documents for your knowledge base
- Basic CrewAI familiarity: agents, tasks, crews, and tools
Step-by-Step
- Install the dependencies and set up your environment.
We’ll use ChromaDB as the local vector store because it gives you persistent embedding storage without extra infrastructure.
pip install crewai crewai-tools chromadb openai
export OPENAI_API_KEY="your-key-here"
- Create a persistent vector store for embeddings.
This is the cache layer. The first run embeds your documents; later runs reuse the stored vectors instead of recomputing them.
import os

from chromadb import PersistentClient
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Embeddings are written to ./chroma_cache on disk and survive restarts.
client = PersistentClient(path="./chroma_cache")

embedding_fn = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

collection = client.get_or_create_collection(
    name="policy_docs",
    embedding_function=embedding_fn,
)
- Load documents once and write them into the collection with stable IDs.
Stable IDs are the important part: if a document's ID is already in the collection, we skip it entirely, so unchanged content is never re-embedded.
from pathlib import Path

docs_dir = Path("./docs")
documents = []
ids = []

for file_path in sorted(docs_dir.glob("*.txt")):
    documents.append(file_path.read_text(encoding="utf-8"))
    ids.append(file_path.stem)  # the file stem is the stable ID

# Only embed documents whose IDs are not already in the collection.
existing = collection.get(ids=ids)
missing_ids = [doc_id for doc_id in ids if doc_id not in existing["ids"]]

for doc_id in missing_ids:
    idx = ids.index(doc_id)
    collection.add(
        ids=[doc_id],
        documents=[documents[idx]],
        metadatas=[{"source": f"{doc_id}.txt"}],
    )
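To confirm the cache is populated, a quick sanity check (reusing the collection created above) is to count what is stored:

# Should match the number of .txt files after the first run;
# on later runs the count stays the same and nothing is re-embedded.
print(f"Cached documents: {collection.count()}")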
- Wrap retrieval in a CrewAI tool so your agent queries cached embeddings instead of raw files.
The agent now gets a fast semantic lookup path, which is what you want for repeated policy or claims questions.
from typing import Type

from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class SearchInput(BaseModel):
    query: str = Field(..., description="The search query")


class CachedSearchTool(BaseTool):
    name: str = "cached_search"
    description: str = (
        "Search cached policy embeddings and return the most relevant passages."
    )
    # Tool inputs belong on an args_schema model, not on the tool itself;
    # otherwise the tool cannot be instantiated without a query.
    args_schema: Type[BaseModel] = SearchInput

    def _run(self, query: str) -> str:
        # Query the persistent collection; stored vectors are read from disk,
        # so nothing is re-embedded here except the query itself.
        result = collection.query(
            query_texts=[query],
            n_results=3,
        )
        docs = result["documents"][0]
        sources = result["metadatas"][0]
        return "\n\n".join(
            f"SOURCE: {meta['source']}\nTEXT: {doc}"
            for doc, meta in zip(docs, sources)
        )
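Before wiring the tool into a crew, you can smoke-test it directly. Calling _run bypasses the agent entirely and only exercises the cached lookup (the query string here is just an example):

tool = CachedSearchTool()
print(tool._run("document retention"))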
- Build the CrewAI agent and task around that tool.
This keeps the LLM focused on reasoning over retrieved context rather than re-reading every document on each request.
from crewai import Agent, Task, Crew, Process

search_tool = CachedSearchTool()

agent = Agent(
    role="Policy Analyst",
    goal="Answer questions using cached policy knowledge",
    backstory="You retrieve relevant passages before answering.",
    tools=[search_tool],
    verbose=True,
)

task = Task(
    description="Answer: What does the policy say about document retention?",
    expected_output="A concise answer with cited source passages.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff()
print(result)
- Add an update path so changed files refresh their embeddings only when needed.
In production, you do not want to rebuild everything on every deploy. Compare content hashes and update only modified documents.
import hashlib


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


for file_path in sorted(docs_dir.glob("*.txt")):
    text = file_path.read_text(encoding="utf-8")
    doc_id = file_path.stem
    content_hash = sha256_text(text)

    existing_doc = collection.get(ids=[doc_id], include=["metadatas"])
    current_hash = (
        existing_doc["metadatas"][0].get("content_hash")
        if existing_doc["ids"]
        else None
    )

    # Re-embed only when the stored hash differs. Documents ingested in the
    # earlier step without a content_hash will be upserted once here.
    if current_hash != content_hash:
        collection.upsert(
            ids=[doc_id],
            documents=[text],
            metadatas=[{
                "source": f"{doc_id}.txt",
                "content_hash": content_hash,
            }],
        )
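One gap the update loop does not cover: files deleted from ./docs keep their embeddings in the cache. A minimal cleanup sketch, assuming file stems are the only ID scheme in the collection:

# Drop cached embeddings whose source files no longer exist on disk.
disk_ids = {p.stem for p in docs_dir.glob("*.txt")}
stale_ids = [i for i in collection.get()["ids"] if i not in disk_ids]
if stale_ids:
    collection.delete(ids=stale_ids)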
Testing It
Run the script twice against the same documents. On the first run, Chroma embeds and stores every document; on the second run, the ingestion step should be close to instant because every ID is already in the collection and nothing is re-embedded.
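If you want numbers rather than a feel for it, a rough check (reusing the ids list from the ingestion step) is to count how many documents still need embedding; it should be every document on the first run and zero on the second:

import time

t0 = time.perf_counter()
existing = collection.get(ids=ids)
missing = [doc_id for doc_id in ids if doc_id not in existing["ids"]]
print(f"{len(missing)} documents need embedding "
      f"(checked in {time.perf_counter() - t0:.3f}s)")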
Change one .txt file and run it again. Only that document should be upserted, which tells you your cache invalidation logic is working.
If you want a quick sanity check, ask two semantically similar questions like “What is retention policy?” and “How long do we keep records?” You should see retrieval hit the same cached passages.
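As a sketch of that check, using the two example questions above:

for q in ("What is retention policy?", "How long do we keep records?"):
    hits = collection.query(query_texts=[q], n_results=3)
    print(q, "->", [m["source"] for m in hits["metadatas"][0]])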
Next Steps
- Add chunking before insertion so large policy files embed at paragraph or section level (see the sketch after this list).
- Replace local Chroma with a managed vector database if you need multi-node access or stricter operational controls.
- Add observability around cache hit rate, embedding cost per run, and document refresh frequency so you can prove savings in production.
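Here is the chunking sketch referenced above: a minimal paragraph-level splitter with stable chunk IDs so caching keeps working at chunk granularity. The max_chars limit and blank-line splitting rule are assumptions to tune for your documents:

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    # Greedily pack paragraphs into chunks of at most max_chars characters.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

for file_path in sorted(docs_dir.glob("*.txt")):
    text = file_path.read_text(encoding="utf-8")
    for i, chunk in enumerate(chunk_text(text)):
        # "<file-stem>-<index>" keeps chunk IDs stable across runs.
        collection.upsert(
            ids=[f"{file_path.stem}-{i}"],
            documents=[chunk],
            metadatas=[{"source": file_path.name, "chunk": i}],
        )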
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.