CrewAI Tutorial (Python): Mocking LLM Calls in Tests for Intermediate Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to test CrewAI code without calling a real LLM. You’ll mock the model boundary so your unit tests stay fast, deterministic, and cheap.

What You'll Need

  • Python 3.10+
  • crewai
  • pytest
  • unittest.mock from the Python standard library
  • An OpenAI API key only if you want to run the agent against a real model outside tests
  • A basic CrewAI project with at least one Agent, one Task, and one Crew

Step-by-Step

  1. Start with a small CrewAI module that is easy to test. Keep the agent, task, and crew construction in one file so your test can import it directly.
# crew_app.py
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Summarize customer complaints",
    backstory="You analyze support tickets for banking teams.",
    verbose=False,
)

complaint_task = Task(
    description="Summarize the main complaint in one sentence: {ticket}",
    expected_output="A single sentence summary",
    agent=researcher,
)

def build_crew(ticket: str) -> Crew:
    # The ticket text is not baked in here; it fills the {ticket}
    # placeholder in the task description when passed to kickoff via inputs.
    return Crew(
        agents=[researcher],
        tasks=[complaint_task],
        process=Process.sequential,
        verbose=False,
    )
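For contrast, this is how you would run the crew for real outside the test suite. It makes a live LLM call and assumes model credentials (for example OPENAI_API_KEY) are configured:
# run_crew.py (illustrative; this makes a real LLM call)
from crew_app import build_crew

crew = build_crew("Card was charged twice")
result = crew.kickoff(inputs={"ticket": "Card was charged twice"})
print(result)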
  2. Test the behavior at the boundary where CrewAI would normally call an LLM. The cleanest approach is to patch the underlying kickoff call or the agent’s execution path, depending on what you want to isolate.
# test_crew_app.py
from unittest.mock import patch
from crew_app import build_crew

def test_build_crew_returns_crew():
    crew = build_crew("Card was charged twice")
    assert crew is not None

@patch("crewai.Crew.kickoff")
def test_crew_kickoff_is_mocked(mock_kickoff):
    mock_kickoff.return_value = "Customer reports duplicate charge."
    crew = build_crew("Card was charged twice")

    result = crew.kickoff(inputs={"ticket": "Card was charged twice"})

    assert result == "Customer reports duplicate charge."
    mock_kickoff.assert_called_once()
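The same patch also lets you assert on what the crew was asked to do, not just on what it returned. A small sketch:
# test_kickoff_inputs.py
from unittest.mock import patch
from crew_app import build_crew

@patch("crewai.Crew.kickoff")
def test_kickoff_receives_ticket(mock_kickoff):
    mock_kickoff.return_value = "stub summary"
    crew = build_crew("Card was charged twice")
    crew.kickoff(inputs={"ticket": "Card was charged twice"})

    # verifies both the call count and the forwarded inputs mapping
    mock_kickoff.assert_called_once_with(
        inputs={"ticket": "Card was charged twice"}
    )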
  3. If you want a more realistic unit test, patch the model client instead of the whole crew. This keeps your task wiring intact while preventing any network call.
# test_agent_boundary.py
from unittest.mock import patch
from crew_app import build_crew

@patch("crewai.llm.LLM.call")
def test_llm_call_is_mocked(mock_call):
    # LLM.call returns plain text, so the mock should too. Depending on
    # your CrewAI version, the agent's parser may also require a
    # "Final Answer:" marker before it treats the text as terminal.
    mock_call.return_value = (
        "Final Answer: The customer says they were charged twice."
    )

    crew = build_crew("Card was charged twice")
    result = crew.kickoff(inputs={"ticket": "Card was charged twice"})

    assert "charged twice" in str(result).lower()
  4. For logic that depends on your own wrapper functions, mock the wrapper rather than CrewAI internals. That gives you stable tests even if CrewAI changes implementation details later. Note the patch target below: you patch the name where it is looked up (app_service.build_crew), not where it is defined (crew_app.build_crew).
# app_service.py
from crew_app import build_crew

def summarize_ticket(ticket: str) -> str:
    crew = build_crew(ticket)
    return str(crew.kickoff(inputs={"ticket": ticket}))

# test_app_service.py
from unittest.mock import patch
from app_service import summarize_ticket

@patch("app_service.build_crew")
def test_summarize_ticket(mock_build_crew):
    fake_crew = mock_build_crew.return_value
    fake_crew.kickoff.return_value = "Duplicate charge detected."

    result = summarize_ticket("Card was charged twice")

    assert result == "Duplicate charge detected."
  5. Use fixtures when you need multiple mocked responses across several tests. This keeps your suite readable once you start testing retries, branching logic, or multiple tasks.
# test_fixtures.py
import pytest
from unittest.mock import Mock

@pytest.fixture
def fake_crew():
    crew = Mock()
    crew.kickoff.return_value = "Refund requested."
    return crew

def test_refund_flow(fake_crew):
    result = fake_crew.kickoff(inputs={"ticket": "Please refund me"})
    assert result == "Refund requested."
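When you need a sequence of responses, for example a retry that fails once and then succeeds, give the mock a side_effect list instead of a single return_value. A sketch with a plain Mock standing in for the crew:
# test_retry_sequence.py
from unittest.mock import Mock

def test_retry_sequence():
    crew = Mock()
    # each call consumes the next item; exception instances are raised
    crew.kickoff.side_effect = [
        RuntimeError("transient failure"),
        "Refund requested.",
    ]

    try:
        result = crew.kickoff(inputs={"ticket": "Please refund me"})
    except RuntimeError:
        result = crew.kickoff(inputs={"ticket": "Please refund me"})

    assert result == "Refund requested."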

Testing It

Run your tests with pytest -q. If everything is wired correctly, the suite should pass without requiring an API key or making outbound requests.

The main thing to verify is that no test hits a real model endpoint. If a test becomes slow or flaky, you probably missed a patch target and are still calling into CrewAI’s execution path.

A good sanity check is to temporarily disconnect your network or unset your LLM credentials and rerun the suite. Properly mocked tests will still pass.
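To make that check permanent, you can block sockets for the whole suite. This is a minimal sketch using pytest's monkeypatch; it catches clients that go through socket.create_connection, which covers most HTTP libraries:
# conftest.py
import socket

import pytest

@pytest.fixture(autouse=True)
def block_network(monkeypatch):
    # any test that slips past a mock and opens a connection fails fast
    def guard(*args, **kwargs):
        raise RuntimeError("Network access blocked during unit tests")

    monkeypatch.setattr(socket, "create_connection", guard)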

Next Steps

  • Add assertions on task inputs and output formatting so you validate behavior, not just return values.
  • Learn how to use pytest fixtures and parametrization for testing multiple ticket types; see the sketch after this list.
  • Move from unit tests to contract tests for your CrewAI wrappers so refactors don’t break production behavior unexpectedly.
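Here is a sketch of the parametrization idea from the list above, running the wrapper test over several ticket types. The expected summaries are invented stand-ins, not real model output:
# test_ticket_types.py
import pytest
from unittest.mock import patch

from app_service import summarize_ticket

@pytest.mark.parametrize(
    "ticket,summary",
    [
        ("Card was charged twice", "Duplicate charge detected."),
        ("Please refund me", "Refund requested."),
        ("My login is locked", "Account lockout reported."),
    ],
)
@patch("app_service.build_crew")
def test_ticket_types(mock_build_crew, ticket, summary):
    # @patch is closest to the function, so the mock is the first argument
    mock_build_crew.return_value.kickoff.return_value = summary
    assert summarize_ticket(ticket) == summary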

By Cyprian Aarons, AI Consultant at Topiax.
