CrewAI Tutorial (Python): caching embeddings for intermediate developers
This tutorial shows how to cache embeddings in a CrewAI-powered Python workflow so repeated runs stop paying the embedding cost for the same text. You need this when your agents keep re-processing the same documents, prompts, or knowledge chunks and you want faster runs with lower API spend.
What You'll Need
- Python 3.10+
- crewai
- crewai-tools
- openai
- chromadb
- An OpenAI API key
- A small set of text files or documents to embed
- Basic familiarity with CrewAI agents, tasks, and tools
Install the packages:
pip install crewai crewai-tools openai chromadb
Set your API key:
export OPENAI_API_KEY="your-key-here"
Step-by-Step
- Start by creating a persistent ChromaDB collection for embeddings.
The key idea is persistence: if the collection survives process restarts, you can reuse vectors instead of recomputing them.
import chromadb

PERSIST_DIR = "./chroma_cache"

# PersistentClient writes the collection to disk, so vectors survive
# process restarts instead of living only in memory.
chroma_client = chromadb.PersistentClient(path=PERSIST_DIR)

collection = chroma_client.get_or_create_collection(
    name="crew_embeddings",
    metadata={"hnsw:space": "cosine"}
)
print("Collection ready:", collection.name)
- Add a small cache layer that hashes text before embedding it.
This lets you check whether a chunk already exists in the vector store before calling the embedding model again.
import hashlib

from openai import OpenAI

# A distinct name keeps this client from shadowing the Chroma client above.
openai_client = OpenAI()

def text_id(text: str) -> str:
    # SHA-256 of the raw text gives a stable, deterministic document ID.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def get_embedding(text: str):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cache_embedding(text: str):
    doc_id = text_id(text)
    # Chroma's get() omits embeddings by default, so request them explicitly.
    existing = collection.get(ids=[doc_id], include=["embeddings"])
    if existing["ids"]:
        return existing["embeddings"][0], True
    embedding = get_embedding(text)
    collection.add(
        ids=[doc_id],
        documents=[text],
        embeddings=[embedding]
    )
    return embedding, False
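To see the cache pay off across a batch, here is a minimal sketch; the chunks are made-up examples, and the duplicate entry should register as a hit.

# Hypothetical chunk list with one duplicate to exercise the cache.
chunks = [
    "Refund requests must be filed within 30 days.",
    "Claims above the approval limit require a second reviewer.",
    "Refund requests must be filed within 30 days.",
]

hits = 0
for chunk in chunks:
    _, hit = cache_embedding(chunk)
    hits += int(hit)
print(f"{hits} hit(s) out of {len(chunks)} chunks")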
- Wire the cache into a CrewAI tool so agents can retrieve embeddings through a normal tool call.
This keeps the caching logic outside the agent prompt and makes it reusable across tasks.
from crewai_tools import tool

@tool("cached_embedding")
def cached_embedding_tool(text: str) -> str:
    """Return whether an embedding was reused from cache or newly created."""
    _, hit = cache_embedding(text)
    return "cache_hit" if hit else "cache_miss"
- Create an agent and a task that use the tool during execution.
The agent does not need to know how caching works internally; it only needs access to the tool.
from crewai import Agent, Task, Crew, Process

agent = Agent(
    role="Document analyst",
    goal="Process text while minimizing redundant embedding calls",
    backstory="You work on document pipelines where repeated chunks are common.",
    tools=[cached_embedding_tool],
    verbose=True,
)

task = Task(
    # The {paragraph} placeholder is filled from the kickoff inputs below;
    # without it, the input would never reach the task description.
    description=(
        "Take this paragraph: {paragraph}\n"
        "Call the cached_embedding tool on it twice, then report whether "
        "each call was a cache hit or miss."
    ),
    expected_output="A short report showing one miss followed by one hit.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={
    "paragraph": "CrewAI agents often see repeated policy clauses across many documents."
})
print(result)
- Run the same text twice and confirm only the first call generates a new vector.
On the second run, Chroma should return the stored embedding immediately because the SHA-256 ID matches an existing record.
sample_text = "CrewAI agents often see repeated policy clauses across many documents."

first_embedding, first_hit = cache_embedding(sample_text)
second_embedding, second_hit = cache_embedding(sample_text)

print("First call cache hit:", first_hit)    # expected: False on a cold cache
print("Second call cache hit:", second_hit)  # expected: True

# Chroma may return the stored vector as a NumPy array, so normalize both
# sides to plain lists before comparing.
print("Embeddings equal:", list(first_embedding) == list(second_embedding))
Testing It
Run the script once and watch for a cache_miss on the first pass. Then run it again with the same text and confirm you get cache_hit without another embedding request going out.
If you want to be strict, delete your ./chroma_cache directory and rerun to verify cold-start behavior, then rerun again to verify persistence. You should also inspect your OpenAI usage dashboard; repeated runs over identical text should stop increasing embedding calls after the first insert.
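If you prefer to script that reset, here is a minimal sketch (destructive, so point it only at the cache directory):

import shutil

import chromadb

# Destructive: wipe the on-disk cache so the next lookup starts cold.
shutil.rmtree("./chroma_cache", ignore_errors=True)

# Recreate the client and collection; the count should now be 0.
chroma_client = chromadb.PersistentClient(path="./chroma_cache")
collection = chroma_client.get_or_create_collection(name="crew_embeddings")
print("Records after reset:", collection.count())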
A good production check is timing: cached lookups should be much faster than fresh embeddings, especially when you’re processing long document chunks.
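Here is a minimal timing sketch, assuming the cache helpers above are in scope:

import time

text = "A long policy clause that appears across many documents. " * 20

start = time.perf_counter()
cache_embedding(text)  # network round trip (or a hit if you ran this before)
cold = time.perf_counter() - start

start = time.perf_counter()
cache_embedding(text)  # local Chroma lookup only
warm = time.perf_counter() - start

print(f"cold: {cold:.3f}s  warm: {warm:.3f}s")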
Next Steps
- Add normalization before hashing so trivial whitespace changes do not create duplicate embeddings (see the sketch after this list).
- Extend this pattern to cache retrieved document chunks alongside embeddings for full RAG pipelines.
- Swap ChromaDB for Postgres + pgvector if you need centralized caching across multiple services or workers.
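A minimal normalization sketch, assuming whitespace runs and letter case are the only variations you want to collapse:

import hashlib

def normalized_text_id(text: str) -> str:
    # Collapse whitespace runs and lowercase before hashing, so
    # "Hello  world" and "hello world" share one cache entry.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

Note that switching ID schemes invalidates entries stored under the old hashes, so do it before the cache fills up, or rebuild the collection.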
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.