CrewAI Tutorial (Python): caching embeddings for intermediate developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows how to cache embeddings in a CrewAI-powered Python workflow so repeated runs stop paying the embedding cost for the same text. You need this when your agents keep re-processing the same documents, prompts, or knowledge chunks and you want faster runs with lower API spend.

What You'll Need

  • Python 3.10+
  • crewai
  • crewai-tools
  • openai
  • chromadb
  • An OpenAI API key
  • A small set of text files or documents to embed
  • Basic familiarity with CrewAI agents, tasks, and tools

Install the packages:

pip install crewai crewai-tools openai chromadb

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by creating a persistent ChromaDB collection for embeddings.
    The key idea is persistence: if the collection survives process restarts, you can reuse vectors instead of recomputing them.
import os
import chromadb
from chromadb.config import Settings

PERSIST_DIR = "./chroma_cache"

chroma_client = chromadb.PersistentClient(path=PERSIST_DIR)
collection = chroma_client.get_or_create_collection(
    name="crew_embeddings",
    metadata={"hnsw:space": "cosine"}
)

print("Collection ready:", collection.name)
  2. Add a small cache layer that hashes text before embedding it.
    This lets you check whether a chunk already exists in the vector store before calling the embedding model again.
import hashlib
from openai import OpenAI

# Keep the OpenAI client separate from the Chroma client created above
openai_client = OpenAI()

def text_id(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def get_embedding(text: str):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cache_embedding(text: str):
    doc_id = text_id(text)
    # Ask Chroma to return stored vectors; get() does not include embeddings by default
    existing = collection.get(ids=[doc_id], include=["embeddings"])

    if existing["ids"]:
        return list(existing["embeddings"][0]), True

    embedding = get_embedding(text)
    collection.add(
        ids=[doc_id],
        documents=[text],
        embeddings=[embedding]
    )
    return embedding, False
  3. Wire the cache into a CrewAI tool so agents can retrieve embeddings through a normal tool call.
    This keeps the caching logic outside the agent prompt and makes it reusable across tasks.
from crewai_tools import tool

@tool("cached_embedding")
def cached_embedding_tool(text: str) -> str:
    """
    Return whether an embedding was reused from cache or newly created.
    """
    _, hit = cache_embedding(text)
    return "cache_hit" if hit else "cache_miss"
  4. Create an agent and a task that use the tool during execution.
    The agent does not need to know how caching works internally; it only needs access to the tool.
from crewai import Agent, Task, Crew, Process

agent = Agent(
    role="Document analyst",
    goal="Process text while minimizing redundant embedding calls",
    backstory="You work on document pipelines where repeated chunks are common.",
    tools=[cached_embedding_tool],
    verbose=True,
)

task = Task(
    description=(
        "Call the cached_embedding tool twice on this paragraph: {paragraph} "
        "Report whether each call was a cache hit or a cache miss."
    ),
    expected_output="A short report showing one miss followed by one hit.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={
    "paragraph": "CrewAI agents often see repeated policy clauses across many documents."
})

print(result)
  5. Run the same text twice and confirm only the first call generates a new vector.
    On a cold cache, the second call should return the stored embedding immediately because the SHA-256 ID matches an existing record. If the crew in the previous step already embedded this exact paragraph, both calls will report a hit.
sample_text = "CrewAI agents often see repeated policy clauses across many documents."

first_embedding, first_hit = cache_embedding(sample_text)
second_embedding, second_hit = cache_embedding(sample_text)

print("First call cache hit:", first_hit)
print("Second call cache hit:", second_hit)
# Chroma may hand the cached vector back as float32, so compare with a small tolerance
print("Embeddings match:", all(abs(a - b) < 1e-6 for a, b in zip(first_embedding, second_embedding)))

Testing It

Run the script once and watch for a cache_miss on the first pass. Then run it again with the same text and confirm you get cache_hit without another embedding request going out.

If you want to be strict, delete your ./chroma_cache directory and rerun to verify cold-start behavior, then rerun again to verify persistence. You should also inspect your OpenAI usage dashboard; repeated runs over identical text should stop increasing embedding calls after the first insert.
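
If you would rather reset programmatically than delete the directory by hand, a minimal sketch looks like this; it assumes the chroma_client, collection, and cache_embedding definitions from the steps above and that nothing else is using the collection:

# Drop and recreate the collection, then confirm a miss followed by a hit
chroma_client.delete_collection(name="crew_embeddings")
collection = chroma_client.get_or_create_collection(
    name="crew_embeddings",
    metadata={"hnsw:space": "cosine"}
)

text = "CrewAI agents often see repeated policy clauses across many documents."
_, hit_after_reset = cache_embedding(text)   # expected: False (cache_miss)
_, hit_warm = cache_embedding(text)          # expected: True (cache_hit)
print("Cold-start hit:", hit_after_reset, "| Warm hit:", hit_warm)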

A good production check is timing: cached lookups should be much faster than fresh embeddings, especially when you’re processing long document chunks.
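
As a rough way to see that, the sketch below times the two paths with time.perf_counter, assuming the cache_embedding function from step 2; exact numbers will vary, but the cached lookup skips the network round trip entirely:

import time

timing_text = "A paragraph long enough to make the embedding call measurable."

start = time.perf_counter()
cache_embedding(timing_text)            # likely a fresh embedding request (cold cache)
fresh_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
cache_embedding(timing_text)            # served straight from the Chroma store
cached_ms = (time.perf_counter() - start) * 1000

print(f"Fresh: {fresh_ms:.1f} ms | Cached: {cached_ms:.1f} ms")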

Next Steps

  • Add normalization before hashing so trivial whitespace changes do not create duplicate embeddings (see the sketch after this list).
  • Extend this pattern to cache retrieved document chunks alongside embeddings for full RAG pipelines.
  • Swap ChromaDB for Postgres + pgvector if you need centralized caching across multiple services or workers.
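
As a starting point for the normalization idea, here is a small sketch that replaces the text_id function from step 2; treating lowercasing and whitespace collapsing as "trivial" differences is an assumption, so adjust the rules to your documents:

import hashlib
import re

def normalize(text: str) -> str:
    # Collapse runs of whitespace, trim the edges, and lowercase
    return re.sub(r"\s+", " ", text).strip().lower()

def text_id(text: str) -> str:
    # Hash the normalized form so "foo  bar" and "foo bar" share one cache entry
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()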

By Cyprian Aarons, AI Consultant at Topiax.