CrewAI Tutorial (Python): caching embeddings for intermediate developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows how to cache embeddings in a CrewAI-powered Python workflow so repeated runs stop paying the embedding cost for the same text. You need this when your agents keep re-processing the same documents, prompts, or knowledge chunks and you want faster runs with lower API spend.

What You'll Need

  • Python 3.10+
  • crewai
  • crewai-tools
  • openai
  • chromadb
  • An OpenAI API key
  • A small set of text files or documents to embed
  • Basic familiarity with CrewAI agents, tasks, and tools

Install the packages:

pip install crewai crewai-tools openai chromadb

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by creating a persistent ChromaDB collection for embeddings.
    The key idea is persistence: if the collection survives process restarts, you can reuse vectors instead of recomputing them.
import os
import chromadb
from chromadb.config import Settings

PERSIST_DIR = "./chroma_cache"

chroma_client = chromadb.PersistentClient(path=PERSIST_DIR)
collection = chroma_client.get_or_create_collection(
    name="crew_embeddings",
    metadata={"hnsw:space": "cosine"}
)

print("Collection ready:", collection.name)
  2. Add a small cache layer that hashes text before embedding it.
    This lets you check whether a chunk already exists in the vector store before calling the embedding model again.
import hashlib
from openai import OpenAI

# Keep the OpenAI client separate from the Chroma client created above
openai_client = OpenAI()

def text_id(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def get_embedding(text: str):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cache_embedding(text: str):
    doc_id = text_id(text)
    # Ask Chroma to return stored vectors; get() does not include embeddings by default
    existing = collection.get(ids=[doc_id], include=["embeddings"])

    if existing["ids"]:
        return list(existing["embeddings"][0]), True

    embedding = get_embedding(text)
    collection.add(
        ids=[doc_id],
        documents=[text],
        embeddings=[embedding]
    )
    return embedding, False
  3. Wire the cache into a CrewAI tool so agents can retrieve embeddings through a normal tool call.
    This keeps the caching logic outside the agent prompt and makes it reusable across tasks.
from crewai_tools import tool

@tool("cached_embedding")
def cached_embedding_tool(text: str) -> str:
    """
    Return whether an embedding was reused from cache or newly created.
    """
    _, hit = cache_embedding(text)
    return "cache_hit" if hit else "cache_miss"
  4. Create an agent and a task that use the tool during execution.
    The agent does not need to know how caching works internally; it only needs access to the tool.
from crewai import Agent, Task, Crew, Process

agent = Agent(
    role="Document analyst",
    goal="Process text while minimizing redundant embedding calls",
    backstory="You work on document pipelines where repeated chunks are common.",
    tools=[cached_embedding_tool],
    verbose=True,
)

task = Task(
    description=(
        "Call the cached_embedding tool twice on this paragraph: {paragraph} "
        "Report whether each call was a cache hit or a cache miss."
    ),
    expected_output="A short report showing one miss followed by one hit.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={
    "paragraph": "CrewAI agents often see repeated policy clauses across many documents."
})

print(result)
  5. Run the same text twice and confirm only the first call generates a new vector.
    On a cold cache, the second call should return the stored embedding immediately because the SHA-256 ID matches an existing record. If the crew in the previous step already embedded this exact paragraph, both calls will report a hit.
sample_text = "CrewAI agents often see repeated policy clauses across many documents."

first_embedding, first_hit = cache_embedding(sample_text)
second_embedding, second_hit = cache_embedding(sample_text)

print("First call cache hit:", first_hit)
print("Second call cache hit:", second_hit)
# Chroma may hand the cached vector back as float32, so compare with a small tolerance
print("Embeddings match:", all(abs(a - b) < 1e-6 for a, b in zip(first_embedding, second_embedding)))

Testing It

Run the script once and watch for a cache_miss on the first pass. Then run it again with the same text and confirm you get cache_hit without another embedding request going out.

If you want to be strict, delete your ./chroma_cache directory and rerun to verify cold-start behavior, then rerun again to verify persistence. You should also inspect your OpenAI usage dashboard; repeated runs over identical text should stop increasing embedding calls after the first insert.
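
If you would rather reset programmatically than delete the directory by hand, a minimal sketch looks like this; it assumes the chroma_client, collection, and cache_embedding definitions from the steps above and that nothing else is using the collection:

# Drop and recreate the collection, then confirm a miss followed by a hit
chroma_client.delete_collection(name="crew_embeddings")
collection = chroma_client.get_or_create_collection(
    name="crew_embeddings",
    metadata={"hnsw:space": "cosine"}
)

text = "CrewAI agents often see repeated policy clauses across many documents."
_, hit_after_reset = cache_embedding(text)   # expected: False (cache_miss)
_, hit_warm = cache_embedding(text)          # expected: True (cache_hit)
print("Cold-start hit:", hit_after_reset, "| Warm hit:", hit_warm)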

A good production check is timing: cached lookups should be much faster than fresh embeddings, especially when you’re processing long document chunks.
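
As a rough way to see that, the sketch below times the two paths with time.perf_counter, assuming the cache_embedding function from step 2; exact numbers will vary, but the cached lookup skips the network round trip entirely:

import time

timing_text = "A paragraph long enough to make the embedding call measurable."

start = time.perf_counter()
cache_embedding(timing_text)            # likely a fresh embedding request (cold cache)
fresh_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
cache_embedding(timing_text)            # served straight from the Chroma store
cached_ms = (time.perf_counter() - start) * 1000

print(f"Fresh: {fresh_ms:.1f} ms | Cached: {cached_ms:.1f} ms")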

Next Steps

  • Add normalization before hashing so trivial whitespace changes do not create duplicate embeddings (see the sketch after this list).
  • Extend this pattern to cache retrieved document chunks alongside embeddings for full RAG pipelines.
  • Swap ChromaDB for Postgres + pgvector if you need centralized caching across multiple services or workers.
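
As a starting point for the normalization idea, here is a small sketch that replaces the text_id function from step 2; treating lowercasing and whitespace collapsing as "trivial" differences is an assumption, so adjust the rules to your documents:

import hashlib
import re

def normalize(text: str) -> str:
    # Collapse runs of whitespace, trim the edges, and lowercase
    return re.sub(r"\s+", " ", text).strip().lower()

def text_id(text: str) -> str:
    # Hash the normalized form so "foo  bar" and "foo bar" share one cache entry
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()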

By Cyprian Aarons, AI Consultant at Topiax.