CrewAI Tutorial (Python): caching embeddings for beginners
This tutorial shows you how to cache embeddings in a CrewAI-based Python workflow so repeated runs stop re-embedding the same text. You need this when your agents keep reading the same knowledge base, because repeated embedding calls quickly add latency and cost.
What You'll Need
- Python 3.10+
- crewai
- crewai-tools
- openai
- An OpenAI API key set as OPENAI_API_KEY
- A local folder for cached files
- Basic familiarity with CrewAI agents, tasks, and tools
Install the packages:
pip install crewai crewai-tools openai
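Before writing any code, confirm the key is actually visible to Python; a missing OPENAI_API_KEY is the most common first-run failure:

import os

# Fail fast with a clear message instead of a confusing mid-run API error
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running"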
Step-by-Step
- Create a small cache layer that stores embeddings on disk.
This is the simplest production-friendly pattern for beginners: hash the input text, store the vector in a JSON file, and reuse it on later runs. It avoids recomputing embeddings for unchanged content.
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("embedding_cache")
CACHE_DIR.mkdir(exist_ok=True)
CACHE_FILE = CACHE_DIR / "cache.json"

def load_cache():
    if CACHE_FILE.exists():
        return json.loads(CACHE_FILE.read_text())
    return {}

def save_cache(cache):
    CACHE_FILE.write_text(json.dumps(cache))

def cache_key(text: str, model: str) -> str:
    # Hash the model name together with the text so that switching
    # models never reuses a stale vector
    raw = f"{model}:{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
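As a quick sanity check of the cache layer on its own, before any API calls, you can round-trip a stand-in vector through the JSON file using only the three helpers above:

cache = load_cache()
key = cache_key("hello world", "text-embedding-3-small")
cache[key] = [0.1, 0.2, 0.3]  # Stand-in vector; real ones come from the API
save_cache(cache)

# A fresh load must return the same vector for the same text and model
assert load_cache()[key] == [0.1, 0.2, 0.3]

Delete embedding_cache/cache.json afterwards so the stand-in entry doesn't linger.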
- Add a cached embedding function using the OpenAI client.
This function checks disk first, then calls the embedding API only when needed. The return value is always a plain Python list, which makes it easy to use anywhere in your CrewAI app.
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def get_embedding(text: str, model: str = "text-embedding-3-small"):
    cache = load_cache()
    key = cache_key(text, model)
    if key in cache:
        return cache[key]  # Cache hit: no API call
    response = client.embeddings.create(
        model=model,
        input=text,
    )
    vector = response.data[0].embedding
    cache[key] = vector
    save_cache(cache)
    return vector
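To see the cache working before any agents are involved, call get_embedding() twice and time both calls. The exact numbers will vary with your network, but the second call should be near-instant because it never leaves your disk:

import time

text = "CrewAI caching embeddings saves money."

start = time.perf_counter()
get_embedding(text)  # First call: hits the OpenAI API, writes the cache
print(f"Uncached call: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
get_embedding(text)  # Second call: served from embedding_cache/cache.json
print(f"Cached call: {time.perf_counter() - start:.3f}s")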
- Wrap the cached embedding lookup in a CrewAI tool.
CrewAI agents work best when you expose reusable capabilities as tools. Here we create a simple tool that takes text and returns the embedding length plus a small preview so you can confirm caching is working without dumping huge vectors into logs.
from crewai.tools import tool

@tool("cached_embedding_tool")
def cached_embedding_tool(text: str) -> str:
    """Embed the given text, using the disk cache when possible, and return a short summary of the vector."""
    vector = get_embedding(text)
    return f"Embedding size: {len(vector)} | First 5 values: {vector[:5]}"
- Create an agent and task that use the tool.
The agent does not need to know anything about caching internals. It just calls the tool, and your app handles whether the embedding came from disk or from OpenAI.
from crewai import Agent, Task, Crew, Process

agent = Agent(
    role="Research Assistant",
    goal="Inspect text using cached embeddings",
    backstory="You help verify whether content has already been embedded.",
    tools=[cached_embedding_tool],
    verbose=True,
)

task = Task(
    description="Use the cached_embedding_tool on this text: 'CrewAI caching embeddings saves money.'",
    expected_output="A short summary of the embedding details.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
)
- Run it twice and confirm the second run hits the cache.
The first run should create embedding_cache/cache.json. The second run should reuse that file and avoid another API call for the exact same text and model.
if __name__ == "__main__":
    result1 = crew.kickoff()
    print("\nFirst run:")
    print(result1)

    result2 = crew.kickoff()
    print("\nSecond run:")
    print(result2)
Testing It
Run the script once and watch for a new embedding_cache/cache.json file. Then run it again with no code changes; if your caching is working, the second execution should be faster and should not generate a fresh embedding request for the same input.
If you want to be more explicit, add a temporary print("cache hit") inside get_embedding() just before the cached value is returned, as shown below. You can also change one word in the input text and confirm that a new cache entry is created, because the hash changes.
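The temporary change replaces the cache-hit branch of get_embedding():

if key in cache:
    print("cache hit")  # Temporary: remove once you've confirmed caching works
    return cache[key]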
For a quick sanity check, delete embedding_cache/cache.json and rerun. The first call should rebuild the cache cleanly.
Next Steps
- Add TTL-based expiration so old embeddings refresh automatically (a sketch follows this list).
- Swap JSON storage for SQLite or Redis once your cache grows.
- Extend this pattern to document chunking so you cache per chunk instead of per full document.
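For the first item, here is a minimal TTL sketch that reuses the client and helpers defined earlier. It assumes you change the cache entry format from a bare vector to a dict with a stored_at timestamp, so any existing cache file should be deleted first; get_embedding_with_ttl and TTL_SECONDS are names invented for this example:

import time

TTL_SECONDS = 7 * 24 * 3600  # Refresh entries older than a week; tune to taste

def get_embedding_with_ttl(text: str, model: str = "text-embedding-3-small"):
    cache = load_cache()
    key = cache_key(text, model)
    entry = cache.get(key)
    if entry and time.time() - entry["stored_at"] < TTL_SECONDS:
        return entry["vector"]  # Still fresh: reuse it
    response = client.embeddings.create(model=model, input=text)
    vector = response.data[0].embedding
    cache[key] = {"vector": vector, "stored_at": time.time()}
    save_cache(cache)
    return vector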
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.