CrewAI Tutorial (Python): mocking LLM calls in tests for beginners
This tutorial shows you how to test a CrewAI workflow without calling a real LLM. You’ll replace network-bound model calls with mocks so your tests run fast, deterministically, and without API costs.
What You'll Need

- Python 3.10 or newer
- `crewai`
- `pytest`
- `unittest.mock` from the Python standard library
- Optional: an OpenAI API key if you want to run the agent against a real model outside tests
- A basic CrewAI setup with at least one `Agent`, one `Task`, and one `Crew`

Install the packages:

```
pip install crewai pytest
```
Step-by-Step
- Start with a small CrewAI project that uses an LLM-backed agent. Keep the task simple so the test focuses on mocking, not prompt design.

```python
# app.py
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Answer questions clearly",
    backstory="You are good at concise technical explanations.",
    verbose=False,
)

task = Task(
    description="Explain what pytest is in one sentence.",
    expected_output="A one-sentence explanation of pytest.",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
)

if __name__ == "__main__":
    result = crew.kickoff()
    print(result)
```
- In tests, mock the method that actually triggers the LLM call. For beginner-friendly unit tests, patch `Crew.kickoff` and return a fake result object or string that matches what your code expects.

```python
# test_app.py
from unittest.mock import patch

from app import crew

def test_crew_kickoff_is_mocked():
    with patch("app.Crew.kickoff", return_value="pytest is a testing framework for Python") as mocked_kickoff:
        result = crew.kickoff()
        assert result == "pytest is a testing framework for Python"
        mocked_kickoff.assert_called_once()
```
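`patch` also works as a decorator, which some people find tidier when a whole test relies on one mock. Here is a minimal self-contained sketch; `StubCrew` is a hypothetical stand-in class so the snippet runs without CrewAI installed:

```python
from unittest.mock import patch

class StubCrew:
    """Stand-in for crewai.Crew so this sketch runs without the library."""
    def kickoff(self):
        return "real LLM call"

# The decorator patches kickoff for the duration of the test and injects
# the mock as the test function's last positional argument.
@patch.object(StubCrew, "kickoff", return_value="pytest is a testing framework for Python")
def test_kickoff_decorator_style(mocked_kickoff):
    result = StubCrew().kickoff()
    assert result == "pytest is a testing framework for Python"
    mocked_kickoff.assert_called_once()

test_kickoff_decorator_style()
```

In your own test file you would patch by string path instead, e.g. `@patch("app.Crew.kickoff", return_value=...)`.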
- If your application wraps CrewAI in a function, test that wrapper instead of the Crew object directly. This keeps your tests stable when you later change agents or tasks internally.
```python
# app.py
from crewai import Agent, Task, Crew, Process

def build_crew():
    researcher = Agent(
        role="Researcher",
        goal="Answer questions clearly",
        backstory="You are good at concise technical explanations.",
        verbose=False,
    )
    task = Task(
        description="Explain what pytest is in one sentence.",
        expected_output="A one-sentence explanation of pytest.",
        agent=researcher,
    )
    return Crew(
        agents=[researcher],
        tasks=[task],
        process=Process.sequential,
    )

def run_crew():
    return build_crew().kickoff()
```
```python
# test_app.py
from unittest.mock import patch

from app import run_crew

def test_run_crew_returns_mocked_output():
    with patch("app.Crew.kickoff", return_value="mocked crew output"):
        result = run_crew()
        assert result == "mocked crew output"
```
- When you need more control, mock lower-level LLM behavior instead of the whole crew. This is useful if you want to verify prompts, task routing, or downstream parsing while still avoiding external calls.
```python
# test_prompt_flow.py
from unittest.mock import patch

from app import build_crew

def test_build_crew_and_mock_execution():
    fake_output = "CrewAI is a framework for orchestrating AI agents."
    with patch("crewai.Crew.kickoff", return_value=fake_output):
        crew = build_crew()
        result = crew.kickoff()
        assert "framework" in result
```
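Beyond checking the return value, you can assert on the mock itself to verify that your wrapper actually invoked the crew, and invoked it exactly once. A self-contained sketch, using a hypothetical `FakeCrew` stand-in and `run_pipeline` wrapper rather than the real CrewAI classes:

```python
from unittest.mock import patch

class FakeCrew:
    """Stand-in for crewai.Crew (hypothetical, for illustration)."""
    def kickoff(self):
        return "real output"

def run_pipeline(crew):
    # Downstream logic we want to exercise without a real LLM call.
    return crew.kickoff().upper()

with patch.object(FakeCrew, "kickoff", return_value="mocked output") as mocked:
    result = run_pipeline(FakeCrew())

assert result == "MOCKED OUTPUT"
mocked.assert_called_once()  # raises AssertionError if kickoff ran 0 or 2+ times
```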
- Add one assertion that checks your code handles the mocked response correctly. The point of mocking is not just avoiding API calls; it's verifying your business logic around the response.
```python
# parser.py
def summarize_answer(answer: str) -> str:
    if "framework" in answer.lower():
        return "valid"
    return "invalid"
```

```python
# test_parser.py
from unittest.mock import patch

from app import run_crew
from parser import summarize_answer

def test_summary_logic_with_mocked_llm_output():
    with patch("app.Crew.kickoff", return_value="CrewAI is a framework for agents"):
        answer = run_crew()
        assert summarize_answer(answer) == "valid"
```
Testing It
Run your tests with pytest. If everything is wired correctly, the suite should pass without any network access and without needing an API key.
The important signal here is that your tests are deterministic: the same mocked output always produces the same assertion result. If a test starts failing after this setup, it usually means your wrapper logic changed or you patched the wrong import path.
A common mistake is patching the wrong import path: patch where the object is used, not where it originally comes from. Note that patching a method on a shared class, as in `app.Crew.kickoff` versus `crewai.Crew.kickoff`, works from either path, because both names resolve to the same class object. The import path matters when you replace a name directly, such as a function or class your module pulled in with a `from crewai import ...` statement.
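To see why the import path matters, here is a self-contained sketch that builds two tiny in-memory modules (the names `fakelib` and `fakeapp` are hypothetical, chosen for illustration):

```python
import sys
import types
from unittest.mock import patch

# "fakelib" defines ask_llm; "fakeapp" imports it by name, taking its
# own reference to the function.
lib = types.ModuleType("fakelib")
lib.ask_llm = lambda: "real call"
sys.modules["fakelib"] = lib

app = types.ModuleType("fakeapp")
exec("from fakelib import ask_llm\ndef run():\n    return ask_llm()", app.__dict__)
sys.modules["fakeapp"] = app

# Patching the defining module does NOT touch the copy fakeapp holds.
with patch("fakelib.ask_llm", return_value="mocked"):
    print(sys.modules["fakeapp"].run())  # prints "real call"

# Patching the name where it is used does.
with patch("fakeapp.ask_llm", return_value="mocked"):
    print(sys.modules["fakeapp"].run())  # prints "mocked"
```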
Next Steps
- Learn how to mock different outputs per test case using `side_effect`
- Add integration tests that hit a real model behind an environment flag like `RUN_LIVE_LLM_TESTS=1`
- Test structured outputs by mocking JSON-like responses and validating them with Pydantic
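The `side_effect` idea from the first item can be sketched with a bare `MagicMock`, no CrewAI required. Each call consumes the next item in the list, and an exception instance is raised instead of returned, which is handy for simulating API failures:

```python
from unittest.mock import MagicMock

# A stand-in for a mocked kickoff: one simulated LLM response per call.
mock_kickoff = MagicMock(side_effect=[
    "first answer",
    "second answer",
    TimeoutError("simulated API timeout"),
])

print(mock_kickoff())  # prints "first answer"
print(mock_kickoff())  # prints "second answer"
try:
    mock_kickoff()
except TimeoutError as exc:
    print(f"caught: {exc}")
```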
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.