CrewAI Tutorial (Python): Mocking LLM Calls in Tests for Beginners

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to test a CrewAI workflow without calling a real LLM. You’ll replace network-bound model calls with mocks so your tests run fast, deterministically, and without API costs.

What You'll Need

  • Python 3.10 or newer
  • crewai
  • pytest
  • unittest.mock from the Python standard library
  • Optional: an OpenAI API key if you want to run the agent against a real model outside tests
  • A basic CrewAI setup with at least one Agent, one Task, and one Crew

Install the packages:

pip install crewai pytest

Step-by-Step

  1. Start with a small CrewAI project that uses an LLM-backed agent. Keep the task simple so the test focuses on mocking, not prompt design.
# app.py
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Answer questions clearly",
    backstory="You are good at concise technical explanations.",
    verbose=False,
)

task = Task(
    description="Explain what pytest is in one sentence.",
    expected_output="A one-sentence explanation of pytest.",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[task],
    process=Process.sequential,
)

if __name__ == "__main__":
    result = crew.kickoff()
    print(result)
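
Running python app.py executes the crew against a real model, so it needs provider credentials (for example, an OpenAI API key with CrewAI's default setup). None of the tests below need that.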
  2. In tests, mock the method that actually triggers the LLM call. For beginner-friendly unit tests, patch Crew.kickoff and return a fake result object or string that matches what your code expects.
# test_app.py
from unittest.mock import patch

from app import crew

def test_crew_kickoff_is_mocked():
    with patch("app.Crew.kickoff", return_value="pytest is a testing framework for Python") as mocked_kickoff:
        result = crew.kickoff()

    assert result == "pytest is a testing framework for Python"
    mocked_kickoff.assert_called_once()
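
One caveat: depending on your crewai version, kickoff may return a CrewOutput object rather than a plain string. If your code reads an attribute such as result.raw, have the mock return a stand-in object instead. A minimal sketch; the .raw access is an assumption about your code, not something the app above does:

# test_app_output_object.py
from types import SimpleNamespace
from unittest.mock import patch

from app import crew

def test_kickoff_with_object_like_output():
    # Stand-in for a CrewOutput-style result; adjust the attributes to
    # whatever your code actually reads.
    fake = SimpleNamespace(raw="pytest is a testing framework for Python")

    with patch("app.Crew.kickoff", return_value=fake):
        result = crew.kickoff()

    assert result.raw == "pytest is a testing framework for Python"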
  3. If your application wraps CrewAI in a function, test that wrapper instead of the Crew object directly. This keeps your tests stable when you later change agents or tasks internally.
# app.py
from crewai import Agent, Task, Crew, Process

def build_crew():
    researcher = Agent(
        role="Researcher",
        goal="Answer questions clearly",
        backstory="You are good at concise technical explanations.",
        verbose=False,
    )

    task = Task(
        description="Explain what pytest is in one sentence.",
        expected_output="A one-sentence explanation of pytest.",
        agent=researcher,
    )

    return Crew(
        agents=[researcher],
        tasks=[task],
        process=Process.sequential,
    )

def run_crew():
    return build_crew().kickoff()
# test_app.py
from unittest.mock import patch
from app import run_crew

def test_run_crew_returns_mocked_output():
    with patch("app.Crew.kickoff", return_value="mocked crew output"):
        result = run_crew()

    assert result == "mocked crew output"
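
If you prefer pytest idioms, the built-in monkeypatch fixture does the same job as unittest.mock.patch and automatically undoes the change when the test finishes. A minimal sketch of the same test:

# test_app_monkeypatch.py
from crewai import Crew

from app import run_crew

def test_run_crew_with_monkeypatch(monkeypatch):
    # Replace the method on the class; monkeypatch restores it afterward.
    monkeypatch.setattr(Crew, "kickoff", lambda self: "mocked crew output")

    assert run_crew() == "mocked crew output"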
  4. When you need more control, you can change where you patch. The example below still patches Crew.kickoff, but on the crewai package itself; because app.py imports the same Crew class object, patching the attribute there affects it too. For genuinely lower-level mocking (verifying prompts or task routing while still avoiding external calls), you would patch the agent's underlying LLM call instead; the exact target depends on your crewai version, so check the internals of the release you use.
# test_prompt_flow.py
from unittest.mock import patch
from app import build_crew

def test_build_crew_and_mock_execution():
    fake_output = "CrewAI is a framework for orchestrating AI agents."

    with patch("crewai.Crew.kickoff", return_value=fake_output):
        crew = build_crew()
        result = crew.kickoff()

    assert "framework" in result
  5. Add one assertion that checks your code handles the mocked response correctly. The point of mocking is not just avoiding API calls; it’s verifying your business logic around the response.
# parser.py
def summarize_answer(answer: str) -> str:
    if "framework" in answer.lower():
        return "valid"
    return "invalid"
# test_parser.py
from unittest.mock import patch

from app import run_crew
from parser import summarize_answer

def test_summary_logic_with_mocked_llm_output():
    with patch("app.Crew.kickoff", return_value="CrewAI is a framework for agents"):
        answer = run_crew()

    assert summarize_answer(answer) == "valid"
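
It is worth covering the failure path too, so a parser regression is caught even when the mocked output changes:

# test_parser_negative.py
from unittest.mock import patch

from app import run_crew
from parser import summarize_answer

def test_summary_logic_rejects_unexpected_output():
    # No "framework" in this fake answer, so the parser should reject it.
    with patch("app.Crew.kickoff", return_value="CrewAI schedules agents"):
        answer = run_crew()

    assert summarize_answer(answer) == "invalid"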

Testing It

Run your tests with pytest. If everything is wired correctly, the suite should pass without any network access and without needing an API key.
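
For example, from the project root:

pytest -q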

The important signal here is that your tests are deterministic: the same mocked output always produces the same assertion result. If a test starts failing after this setup, it usually means your wrapper logic changed or you patched the wrong import path.

A common mistake is patching the wrong kind of target. Patching a method attribute such as Crew.kickoff works through either import path, because app.Crew and crewai.Crew are the same class object. Patching the name Crew itself (so that code constructing a crew gets a fake class) is different: there you must patch where the object is used, not where it originally comes from.
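
The distinction, as a minimal sketch against the app module above:

# test_patch_targets.py
from unittest.mock import patch

from app import build_crew

def test_patch_the_name_where_it_is_used():
    # build_crew() constructs Crew at call time, so replacing the name
    # "Crew" inside the app module hands it a fake class.
    with patch("app.Crew") as FakeCrew:
        build_crew()
    FakeCrew.assert_called_once()

def test_patch_the_attribute_from_either_path():
    # app.Crew and crewai.Crew are the same class object, so patching
    # the kickoff attribute on either reference affects both.
    with patch("crewai.Crew.kickoff", return_value="ok"):
        assert build_crew().kickoff() == "ok"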

Next Steps

  • Learn how to mock different outputs per test case using side_effect (see the sketch after this list)
  • Add integration tests that hit a real model behind an environment flag like RUN_LIVE_LLM_TESTS=1 (also sketched below)
  • Test structured outputs by mocking JSON-like responses and validating them with Pydantic
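
A minimal sketch of the first two ideas; RUN_LIVE_LLM_TESTS is just this tutorial's naming convention, not a crewai feature:

# test_next_steps.py
import os
from unittest.mock import patch

import pytest

from app import run_crew

def test_varied_outputs_with_side_effect():
    # side_effect yields a different return value on each call.
    with patch("app.Crew.kickoff", side_effect=["first answer", "second answer"]):
        assert run_crew() == "first answer"
        assert run_crew() == "second answer"

@pytest.mark.skipif(
    os.environ.get("RUN_LIVE_LLM_TESTS") != "1",
    reason="set RUN_LIVE_LLM_TESTS=1 to hit a real model",
)
def test_live_model_smoke():
    # No patch here: this talks to the real LLM and needs an API key.
    result = run_crew()
    assert result  # just a smoke check that something came back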

By Cyprian Aarons, AI Consultant at Topiax.