CrewAI Tutorial (Python): testing agents locally for advanced developers
This tutorial shows you how to run CrewAI agents locally, swap live LLM calls for deterministic test doubles, and verify agent behavior without burning API credits. You need this when you want fast feedback on prompts, tools, and task wiring before pushing anything into a shared environment.
What You'll Need
- Python 3.10+
- crewai
- pytest
- python-dotenv
- An OpenAI-compatible API key if you want to run the same crew against a real model later
- A local project folder with write access
- Basic familiarity with Agent, Task, and Crew
Install the dependencies:
pip install crewai pytest python-dotenv
Step-by-Step
- Create a small project layout and keep your agent code isolated from your tests. That makes it easy to swap real model calls for local fakes without touching production code.
crew-local-test/
├── app.py
├── test_app.py
└── .env
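The .env file only holds the key used by the optional live run later in the tutorial; the value shown here is a placeholder:

# .env -- placeholder value, not a real credential
OPENAI_API_KEY=sk-your-key-here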
- Define your crew in a way that accepts an injected LLM. For local testing, use a stubbed LLM that returns predictable output; for real runs, replace it with your provider config.
# app.py
from crewai import Agent, Task, Crew, Process

def build_crew(llm):
    analyst = Agent(
        role="Claims Analyst",
        goal="Summarize claim notes clearly",
        backstory="You review insurance claims and produce concise summaries.",
        llm=llm,
        verbose=False,
    )
    task = Task(
        description="Summarize the following claim note in 2 bullet points: Customer reported water damage.",
        expected_output="Two bullet points summarizing the note.",
        agent=analyst,
    )
    return Crew(
        agents=[analyst],
        tasks=[task],
        process=Process.sequential,
        verbose=False,
    )
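The claim note is hard-coded in the task description above. If you want to vary it per run without editing the crew, CrewAI task descriptions can carry placeholders that are filled from kickoff(inputs=...). A small optional variation, not required by the tests that follow:

# app.py (optional variation): parametrize the claim note
def build_crew_with_inputs(llm):
    analyst = Agent(
        role="Claims Analyst",
        goal="Summarize claim notes clearly",
        backstory="You review insurance claims and produce concise summaries.",
        llm=llm,
        verbose=False,
    )
    task = Task(
        # {claim_note} is filled from the inputs dict passed to kickoff()
        description="Summarize the following claim note in 2 bullet points: {claim_note}",
        expected_output="Two bullet points summarizing the note.",
        agent=analyst,
    )
    return Crew(agents=[analyst], tasks=[task], process=Process.sequential)

# usage:
# crew = build_crew_with_inputs(llm)
# crew.kickoff(inputs={"claim_note": "Customer reported water damage."})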
- Build a local fake LLM for deterministic tests. In many setups, CrewAI only needs an object with a callable interface, so we can return fixed text and validate the downstream behavior.
# test_app.py
from app import build_crew

class FakeLLM:
    def __call__(self, prompt, **kwargs):
        return (
            "- Water damage was reported by the customer.\n"
            "- The claim requires assessment for cause and scope."
        )

def test_crew_runs_locally():
    crew = build_crew(FakeLLM())
    result = crew.kickoff()
    text = str(result)
    assert "Water damage" in text
    assert "assessment" in text
- Add a real-model path for manual local verification. This lets you compare the fake output against an actual provider when you want to test prompt quality or tool behavior.
# app.py (continued)
import os

from dotenv import load_dotenv

load_dotenv()

if __name__ == "__main__":
    from crewai import LLM

    llm = LLM(
        model="gpt-4o-mini",
        api_key=os.getenv("OPENAI_API_KEY"),
    )
    crew = build_crew(llm)
    result = crew.kickoff()
    print(result)
- Run the tests locally and keep them fast. The point is to catch broken task definitions, prompt regressions, and bad assumptions before you hit the network.
pytest -q
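If you later add tests that intentionally hit a real provider, one way to keep the default run fast is to gate them behind an environment check so they skip unless a key is present. A minimal sketch, assuming OPENAI_API_KEY from .env is the switch and reusing the LLM config from the live-run step:

# test_app_live.py -- optional; skipped unless a real key is configured
import os

import pytest

from app import build_crew


@pytest.mark.skipif(
    not os.getenv("OPENAI_API_KEY"),
    reason="live-model test; set OPENAI_API_KEY to enable",
)
def test_crew_against_real_model():
    from crewai import LLM

    llm = LLM(model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"))
    result = str(build_crew(llm).kickoff())
    assert result.strip()  # loose check: the live run produced some output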
- If you want stronger checks, assert on structure instead of just substrings. For advanced teams, that usually means validating formatting, length limits, or JSON-shaped output from tasks.
# test_app.py
from app import build_crew

class FakeLLM:
    def __call__(self, prompt, **kwargs):
        return "- Water damage was reported.\n- Needs adjuster review."

def test_output_has_two_bullets():
    crew = build_crew(FakeLLM())
    result = str(crew.kickoff())
    lines = [line for line in result.splitlines() if line.strip()]
    assert len(lines) == 2
    assert all(line.startswith("- ") for line in lines)
Testing It
Start with pytest -q and make sure the fake LLM test passes consistently. Then run python app.py with a valid OPENAI_API_KEY in .env to confirm the same crew works against a live model.
If the test passes but the live run fails, the issue is usually in provider config or prompt expectations, not your Python wiring. If both pass but output quality is poor, tighten the task description and add assertions around format or content.
For local debugging, print repr(result) or str(result) so you can inspect exactly what CrewAI returned. That matters because many failures are not exceptions; they're low-quality outputs from a run that completes without errors.
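A throwaway helper for that kind of inspection, guarding version-dependent attributes with hasattr since the exact shape of the result object varies across CrewAI releases:

# debug_run.py -- local-only helper; reuses the FakeLLM from test_app.py
from app import build_crew
from test_app import FakeLLM

result = build_crew(FakeLLM()).kickoff()
print(repr(result))  # the result object itself
print(str(result))   # what the test assertions see
# Some CrewAI versions expose structured fields on the result object;
# the attribute names differ by release, so check before accessing.
if hasattr(result, "raw"):
    print(result.raw)
if hasattr(result, "tasks_output"):
    print(result.tasks_output)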
Next Steps
- Add tool testing with mocked HTTP clients so your agents can call internal services without hitting real endpoints.
- Move from string assertions to schema validation using Pydantic or JSON parsing for structured outputs (see the sketch below).
- Split crews into reusable fixtures so multiple tests can reuse the same agent setup with different fake responses.
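That schema-validation step might look like the sketch below, assuming you rewrite the task to demand JSON and parse it with Pydantic (which crewai already pulls in); the ClaimSummary model and the JSON-returning fake are illustrative, not part of the files above:

# test_app_schema.py -- hypothetical structured-output check (Pydantic v2)
import json

from pydantic import BaseModel

from app import build_crew


class ClaimSummary(BaseModel):
    bullets: list[str]
    severity: str


class FakeJSONLLM:
    # Pretends the task description was rewritten to demand JSON output.
    def __call__(self, prompt, **kwargs):
        return json.dumps(
            {"bullets": ["Water damage reported", "Needs adjuster review"],
             "severity": "medium"}
        )


def test_output_matches_schema():
    result = str(build_crew(FakeJSONLLM()).kickoff())
    summary = ClaimSummary.model_validate_json(result)
    assert len(summary.bullets) == 2
    assert summary.severity in {"low", "medium", "high"}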
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.