AutoGen Tutorial (Python): mocking LLM calls in tests for intermediate developers
This tutorial shows how to replace real LLM calls with deterministic mocks in AutoGen tests, so your unit tests run fast, offline, and without flaky model output. You need this when you want to test agent logic, message flow, and tool execution without paying for API calls or depending on network availability.
What You'll Need
- Python 3.10+
- pyautogen installed
- pytest installed
- Optional: an OpenAI API key if you want to run the same agent outside tests
- A basic AutoGen setup with AssistantAgent and UserProxyAgent
Install the packages:
pip install pyautogen pytest
Step-by-Step
- Start with a small agent setup that you can exercise in a test. The key is to keep the agent code separate from the mock so you can swap the model client during testing.
# app.py
from autogen import AssistantAgent, UserProxyAgent


def build_agents(llm_config):
    assistant = AssistantAgent(
        name="assistant",
        llm_config=llm_config,
    )
    user = UserProxyAgent(
        name="user",
        human_input_mode="NEVER",
        code_execution_config=False,
    )
    return assistant, user
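Outside tests, the same builder takes your real config. A minimal sketch, assuming your key lives in the OPENAI_API_KEY environment variable (the run_app.py file name is just for illustration):
# run_app.py (illustrative)
import os

from app import build_agents

llm_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}],
    "temperature": 0,
}
assistant, user = build_agents(llm_config)
result = user.initiate_chat(assistant, message="Say hello", max_turns=1)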
- Create a fake model client that returns a fixed response. In pyautogen 0.2, a custom model client implements a small protocol: create, message_retrieval, cost, and get_usage. AutoGen calls this object instead of making a real LLM request, which gives you stable outputs for assertions.
# fake_client.py
from types import SimpleNamespace


class FakeChatCompletionClient:
    # Implements AutoGen's ModelClient protocol:
    # create, message_retrieval, cost, get_usage.
    def __init__(self, config, **kwargs):
        self.config = config  # the matching config_list entry, passed in by AutoGen

    def create(self, params):
        message = SimpleNamespace(role="assistant", content="Mocked response from test", function_call=None)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)], model="fake")

    def message_retrieval(self, response):
        return [choice.message.content for choice in response.choices]

    def cost(self, response):
        return 0

    @staticmethod
    def get_usage(response):
        return {}
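Before wiring it into AutoGen, you can sanity-check the fake in isolation; this quick snippet (plain Python, no AutoGen involved) confirms create and message_retrieval agree:
client = FakeChatCompletionClient(config={"model": "fake"})
response = client.create({"messages": [{"role": "user", "content": "hi"}]})
assert client.message_retrieval(response) == ["Mocked response from test"]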
- Wire the fake client into your agents through llm_config: name the class in config_list under model_client_cls, then register it on the agent with register_model_client. In tests, AutoGen instantiates the fake instead of an OpenAI client; in production, use your normal OpenAI config.
# test_agent.py
from app import build_agents
from fake_client import FakeChatCompletionClient


def test_assistant_uses_mocked_llm():
    llm_config = {
        "config_list": [{"model": "fake", "model_client_cls": "FakeChatCompletionClient"}],
    }
    assistant, user = build_agents(llm_config)
    assistant.register_model_client(model_client_cls=FakeChatCompletionClient)
    # max_turns=1 stops the chat after the assistant's first (mocked) reply
    result = user.initiate_chat(assistant, message="Say hello", max_turns=1)
    assert result.chat_history[-1]["content"] == "Mocked response from test"
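If several tests need this wiring, a pytest fixture keeps them terse. A minimal sketch for conftest.py (the fixture name mocked_agents is an assumption for illustration):
# conftest.py (illustrative)
import pytest

from app import build_agents
from fake_client import FakeChatCompletionClient


@pytest.fixture
def mocked_agents():
    llm_config = {
        "config_list": [{"model": "fake", "model_client_cls": "FakeChatCompletionClient"}],
    }
    assistant, user = build_agents(llm_config)
    assistant.register_model_client(model_client_cls=FakeChatCompletionClient)
    return assistant, user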
- If you need to mock multiple turns, return different responses based on call count or input content. This is useful when testing branching logic like retries, tool use, or follow-up prompts. A content-based variant is sketched after the code below.
# fake_client.py
from types import SimpleNamespace


class FakeChatCompletionClient:
    calls = 0  # class-level, so tests can read it without holding the instance

    def __init__(self, config, **kwargs):
        self.config = config

    def create(self, params):
        FakeChatCompletionClient.calls += 1
        if FakeChatCompletionClient.calls == 1:
            content = "I need more information."
        else:
            content = "Final answer after clarification."
        message = SimpleNamespace(role="assistant", content=content, function_call=None)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)], model="fake")

    def message_retrieval(self, response):
        return [choice.message.content for choice in response.choices]

    def cost(self, response):
        return 0

    @staticmethod
    def get_usage(response):
        return {}
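If you would rather branch on input content than call count, key off the last message in params. A sketch of just the create method, with a hypothetical trigger phrase:
    def create(self, params):
        # params["messages"] carries the conversation so far in OpenAI chat format
        last = (params["messages"][-1].get("content") or "").lower()
        if "clarification" in last:  # hypothetical trigger phrase
            content = "Final answer after clarification."
        else:
            content = "I need more information."
        message = SimpleNamespace(role="assistant", content=content, function_call=None)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)], model="fake")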
- Add one more assertion around call count so you verify the agent used the mock exactly as expected. That catches accidental real API calls and makes your tests stricter. Because the counter lives on the class, the test can read it without holding the instance AutoGen creates.
# test_agent.py
from app import build_agents
from fake_client import FakeChatCompletionClient


def test_mock_is_used_for_all_llm_calls():
    FakeChatCompletionClient.calls = 0  # reset the class-level counter
    llm_config = {
        "config_list": [{"model": "fake", "model_client_cls": "FakeChatCompletionClient"}],
    }
    assistant, user = build_agents(llm_config)
    assistant.register_model_client(model_client_cls=FakeChatCompletionClient)
    result = user.initiate_chat(assistant, message="Need help", max_turns=2)
    assert FakeChatCompletionClient.calls >= 1
    assert result.chat_history[-1]["content"] in {
        "I need more information.",
        "Final answer after clarification.",
    }
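Because the counter is class-level, state can leak between tests; an autouse fixture resets it automatically. A small sketch for conftest.py (the fixture name is an assumption):
# conftest.py (illustrative)
import pytest

from fake_client import FakeChatCompletionClient


@pytest.fixture(autouse=True)
def reset_fake_client_calls():
    FakeChatCompletionClient.calls = 0
    yield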
- If you prefer patching to dependency injection, monkeypatch AutoGen's LLM reply hook in your test suite. ConversableAgent.generate_oai_reply is the reply function that performs the model call, and replacing it is handy when you cannot easily pass a custom client through existing application code. Patch before constructing agents so the fake hook is what gets registered.
# test_patch.py
from autogen import ConversableAgent

from app import build_agents


def fake_generate_oai_reply(self, messages=None, sender=None, config=None):
    # Same signature as generate_oai_reply; True marks the reply as final.
    return True, "Patched response"


def test_with_monkeypatch(monkeypatch):
    # Patch first, then build the agents, so they register the fake hook.
    monkeypatch.setattr(ConversableAgent, "generate_oai_reply", fake_generate_oai_reply)
    llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "sk-fake"}]}
    assistant, user = build_agents(llm_config)
    result = user.initiate_chat(assistant, message="Say hello", max_turns=1)
    assert result.chat_history[-1]["content"] == "Patched response"
Testing It
Run pytest -q and confirm your tests pass without setting any API key. If a test hangs or tries to hit the network, your mock is not wired correctly and you are still using a real model path somewhere.
For extra confidence, temporarily unset OPENAI_API_KEY before running tests. A good mocked setup should still pass because it never depends on live inference.
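You can bake that check into the suite itself with pytest's monkeypatch, so no test can silently fall back to live inference (the fixture name is an assumption):
# conftest.py (illustrative)
import pytest


@pytest.fixture(autouse=True)
def no_openai_key(monkeypatch):
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)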
If you want to inspect behavior manually, print result.chat_history inside a local script and verify the last assistant message matches your fake response exactly.
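That manual check can be a throwaway script mirroring the first test (the file name is illustrative):
# inspect_chat.py (illustrative)
from app import build_agents
from fake_client import FakeChatCompletionClient

llm_config = {
    "config_list": [{"model": "fake", "model_client_cls": "FakeChatCompletionClient"}],
}
assistant, user = build_agents(llm_config)
assistant.register_model_client(model_client_cls=FakeChatCompletionClient)
result = user.initiate_chat(assistant, message="Say hello", max_turns=1)
print(result.chat_history)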
Next Steps
- Mock tool execution alongside LLM calls so you can unit test full agent workflows.
- Move from simple fakes to fixture-based responses for multi-turn conversation testing.
- Add contract tests that compare your mocked outputs against known production prompts and schemas.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit