CrewAI Tutorial (Python): Mocking LLM Calls in Tests for Advanced Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to make CrewAI-based Python tests deterministic by mocking LLM calls at the boundary where agents would normally hit the model provider. You need this when your agent logic is correct but your tests are flaky, slow, or expensive because they depend on live model responses.

What You'll Need

  • Python 3.10+
  • crewai
  • pytest
  • unittest.mock from the standard library
  • An API key only if you plan to run the real LLM path locally
  • A small CrewAI project with at least one Agent, Task, and Crew

Step-by-Step

  1. Start with a minimal CrewAI setup that uses a real LLM in production code, but keep the model behind a constructor argument so tests can swap it out. The key is not to mock your business logic; mock the LLM boundary.
from crewai import Agent, Task, Crew, Process
from crewai.llm import LLM


def build_crew(llm: LLM | None = None) -> Crew:
    # Default to a real model in production; tests inject a fake instead.
    llm = llm or LLM(model="gpt-4o-mini")

    analyst = Agent(
        role="Financial Analyst",
        goal="Summarize customer account activity",
        backstory="You review transaction patterns for banking ops.",
        llm=llm,
        verbose=False,
    )

    task = Task(
        description="Summarize the account activity in one paragraph.",
        expected_output="A concise summary of activity.",
        agent=analyst,
    )

    return Crew(agents=[analyst], tasks=[task], process=Process.sequential)
  2. Create a fake LLM that returns a fixed response. This keeps your tests fast and avoids network calls while still exercising CrewAI’s orchestration code. (A variant that returns a different queued response per call is sketched after the step list.)
from crewai.llm import LLM


class FakeLLM(LLM):
    """Stand-in for a real model: records calls and returns a canned response."""

    def __init__(self, response: str):
        super().__init__(model="fake-model")
        self._response = response
        self.calls = []

    def call(self, messages=None, tools=None, callbacks=None, available_functions=None, **kwargs):
        # **kwargs absorbs any extra keyword arguments newer CrewAI versions pass.
        self.calls.append(messages)
        return self._response
  3. Write the unit test by injecting the fake into your crew builder. Assert both the output and that the model boundary was actually hit.
from app import build_crew
from tests.fakes import FakeLLM


def test_crew_uses_mocked_llm():
    fake_llm = FakeLLM("Customer activity is stable with no unusual withdrawals.")
    crew = build_crew(llm=fake_llm)

    result = crew.kickoff()

    assert "stable" in str(result)
    assert len(fake_llm.calls) > 0
  4. If your existing code constructs agents internally and you cannot inject dependencies yet, patch the LLM class where that module imports it. This is less clean than dependency injection, but it works for legacy code.
from unittest.mock import patch

from app import build_crew


def test_crew_with_patched_llm():
    # Patch the LLM name in app's namespace, where build_crew looks it up.
    with patch("app.LLM") as mock_llm_cls:
        mock_llm = mock_llm_cls.return_value
        # MagicMock auto-creates any other attributes CrewAI touches;
        # only the response that matters is configured explicitly.
        mock_llm.call.return_value = "Mocked summary for testing."

        crew = build_crew()
        result = crew.kickoff()

        assert "Mocked summary" in str(result)
        mock_llm.call.assert_called()
  5. For higher confidence, split your tests into unit and integration layers. Unit tests use the fake or patching; integration tests run against a real provider behind an environment flag, so you can catch prompt regressions without making every CI run expensive.
import os
import pytest

from app import build_crew


@pytest.mark.integration
def test_real_llm_path():
    if os.getenv("RUN_LIVE_LLM_TESTS") != "1":
        pytest.skip("Live LLM tests disabled")

    crew = build_crew()
    result = crew.kickoff()

    assert result is not None
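
If your crew runs more than one task or agent, a single fixed response makes every step look identical. Here is a minimal sketch of a variant of the fake from step 2 that hands out queued responses in order; the class name and queue behavior are illustrative choices, not anything CrewAI prescribes.
from collections import deque

from crewai.llm import LLM


class QueuedFakeLLM(LLM):
    """Fake LLM that returns one canned response per call, in order."""

    def __init__(self, responses: list[str]):
        super().__init__(model="fake-model")
        self._responses = deque(responses)
        self.calls = []

    def call(self, messages=None, tools=None, callbacks=None, available_functions=None, **kwargs):
        self.calls.append(messages)
        # Keep returning the last response if the crew makes more calls than
        # expected, so the test fails on assertions rather than on an empty queue.
        if len(self._responses) > 1:
            return self._responses.popleft()
        return self._responses[0]

Build it with QueuedFakeLLM(["First task output.", "Second task output."]) and assert on its calls list to check what each task actually sent to the model boundary.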

Testing It

Run your unit test suite first with live LLM tests disabled. You should see deterministic output every time, no API usage spikes, and no random failures caused by provider latency or rate limits.
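
The integration marker used in step 5 is not one of pytest's built-ins, so pytest will warn about an unknown marker unless you register it. A minimal sketch, assuming a conftest.py at the project root:
# conftest.py
def pytest_configure(config):
    # Register the custom "integration" marker used in step 5.
    config.addinivalue_line(
        "markers", "integration: slow tests that hit a real LLM provider"
    )

With the marker registered, running pytest -m "not integration" keeps the default CI pass limited to the fast, mocked tests.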

Then run the patched test and confirm that the mocked call() method is actually invoked. Whether you inject the fake or patch the class, neither path should need environment variables or secret keys; if a test fails without an API key, something is still constructing a real LLM.

Finally, enable the live integration test only when needed by setting RUN_LIVE_LLM_TESTS=1. That gives you a clean separation between fast CI coverage and occasional end-to-end validation.

Next Steps

  • Add assertions on structured outputs by forcing JSON schema-like responses from your fake LLM.
  • Mock tool calls separately from model calls so you can isolate agent reasoning from external side effects.
  • Wrap this pattern into reusable pytest fixtures for all agent crews in your codebase (see the sketch after this list).
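
As a starting point, here is a minimal sketch combining the first and third ideas: a shared fixture that yields a JSON-returning FakeLLM so any test can inject it and assert on parsed fields. The fixture name, file paths, and JSON shape are illustrative, and the test assumes, as the earlier tests do, that str(result) carries the final task's raw text.
# tests/conftest.py
import json

import pytest

from tests.fakes import FakeLLM


@pytest.fixture
def json_fake_llm():
    # Canned response shaped like the structured output the agent is asked for.
    return FakeLLM(json.dumps({"summary": "Activity is stable.", "risk_level": "low"}))

# tests/test_structured_output.py
import json

from app import build_crew


def test_summary_fields(json_fake_llm):
    crew = build_crew(llm=json_fake_llm)
    result = crew.kickoff()

    data = json.loads(str(result))
    assert data["risk_level"] == "low"
    assert len(json_fake_llm.calls) > 0

Any test in the suite can now request json_fake_llm instead of constructing a fake by hand.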

By Cyprian Aarons, AI Consultant at Topiax.
