AutoGen Tutorial (Python): mocking LLM calls in tests for advanced developers
This tutorial shows how to replace real LLM calls in AutoGen tests with deterministic fakes so your test suite stays fast, offline, and stable. You need this when you want to verify agent logic, message routing, tool execution, and retries without paying for tokens or depending on model nondeterminism.
What You'll Need
- •Python 3.10+
- •pyautogen installed
- •pytest installed
- •No OpenAI API key required for the mocked tests
- •Optional: an OpenAI API key if you want to compare mocked behavior with a real run later
Install the dependencies:
pip install pyautogen pytest
Step-by-Step
- •Create a small AutoGen setup that uses a custom model client interface.
The trick is to keep your production agent code intact and swap only the model client in tests.
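from autogen import AssistantAgent, UserProxyAgent

# Assistant with an empty config_list: no real model endpoint is configured here.
assistant = AssistantAgent(
    name="assistant",
    llm_config={
        "config_list": [],
        "timeout": 30,
        "temperature": 0,
    },
)

# User proxy that never prompts a human and never executes code.
user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config=False,
)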
- •Build a fake LLM client that returns fixed responses.
This lets you test the orchestration layer without calling any external API.
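from types import SimpleNamespace


class FakeLLMClient:
    def __init__(self, replies):
        self.replies = list(replies)  # scripted replies, consumed in order
        self.calls = []  # every request is recorded for later assertions

    def create(self, messages, **kwargs):
        self.calls.append({"messages": messages, "kwargs": kwargs})
        # Fall back to a fixed string once the scripted replies run out.
        content = self.replies.pop(0) if self.replies else "default mocked reply"
        # Mimic the response.choices[0].message.content shape of a chat completion.
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )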
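As a quick sanity check, the fake can be exercised on its own before any agent is involved. The sketch below repeats the class so it runs standalone and reads each reply through the same `choices[0].message.content` path the fake builds. The `read_content` helper is a hypothetical name introduced here for illustration.

```python
from types import SimpleNamespace


class FakeLLMClient:
    """Same fake as above, repeated so this snippet runs on its own."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.calls = []

    def create(self, messages, **kwargs):
        self.calls.append({"messages": messages, "kwargs": kwargs})
        content = self.replies.pop(0) if self.replies else "default mocked reply"
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )


def read_content(response):
    # Hypothetical helper: pull the text out of the response object.
    return response.choices[0].message.content


client = FakeLLMClient(["first", "second"])
prompt = [{"role": "user", "content": "ping"}]

# Three calls against two scripted replies: the third uses the fallback.
outputs = [read_content(client.create(messages=prompt, temperature=0)) for _ in range(3)]
print(outputs)
```

The recorded `calls` list keeps the keyword arguments too, so a test can also check settings such as `temperature`.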
- •Wire the fake client into an AutoGen agent for tests.
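In AutoGen, the agent's client attribute is what ultimately produces completions, so replacing it with the fake gives you deterministic behavior. Depending on your pyautogen version, the agent may call other methods on the client besides create(); if generate_reply fails with an AttributeError, add the missing method to the fake.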
from autogen import AssistantAgent

# FakeLLMClient is the class defined in the previous step.
fake_client = FakeLLMClient(["Mocked answer from the assistant"])

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [], "temperature": 0},
)
# Swap the agent's model client for the fake.
assistant.client = fake_client

result = assistant.generate_reply(messages=[{"role": "user", "content": "Say hello"}])
print(result)
# Inspect the last message the agent sent to the client.
print(fake_client.calls[0]["messages"][-1]["content"])
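Assigning to `assistant.client` directly leaves the fake in place for as long as the agent object lives. One possible alternative is `unittest.mock.patch.object`, which restores the original attribute when the `with` block exits. In this sketch, `StubAgent` is a hypothetical stand-in and not an AutoGen class; it assumes only that the agent exposes a `client` attribute and calls `client.create(messages=...)`.

```python
from types import SimpleNamespace
from unittest.mock import patch


class FakeLLMClient:
    def __init__(self, replies):
        self.replies = list(replies)

    def create(self, messages, **kwargs):
        content = self.replies.pop(0)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )


class StubAgent:
    """Hypothetical stand-in for an agent that owns a `client` attribute."""

    def __init__(self, client):
        self.client = client

    def generate_reply(self, messages):
        response = self.client.create(messages=messages)
        return response.choices[0].message.content


real_client = object()  # placeholder for whatever client production code built
agent = StubAgent(real_client)

# The fake is active only inside the with block.
with patch.object(agent, "client", FakeLLMClient(["Mocked answer"])):
    reply_inside = agent.generate_reply([{"role": "user", "content": "Say hello"}])

# After the block, the original client is back in place.
restored = agent.client is real_client
```

Scoping the patch this way should make it harder for a fake to leak from one test into another that shares the same agent.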
- •Use pytest to assert both the returned text and the request payload.
Good tests check not just the output but also that your agent sent the right prompt structure.
# test_agent_mock.py
from types import SimpleNamespace

from autogen import AssistantAgent


class FakeLLMClient:
    def __init__(self, replies):
        self.replies = list(replies)
        self.calls = []

    def create(self, messages, **kwargs):
        self.calls.append({"messages": messages, "kwargs": kwargs})
        # No fallback here: an unexpected extra call raises IndexError.
        content = self.replies.pop(0)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )


def test_assistant_uses_mocked_llm():
    fake_client = FakeLLMClient(["Approved"])
    assistant = AssistantAgent(name="assistant", llm_config={"config_list": []})
    assistant.client = fake_client

    reply = assistant.generate_reply(messages=[{"role": "user", "content": "Approve claim"}])

    # Check the returned text and the prompt that reached the client.
    assert reply == "Approved"
    assert fake_client.calls[0]["messages"][-1]["content"] == "Approve claim"
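The last assertion in the test checks only the final message. When the prompt structure matters, it may help to compare the role sequence and the keyword arguments as well. A minimal sketch, assuming each recorded call stores a `messages` list of role/content dictionaries as the fake above does; `summarize_call` is a hypothetical helper name and the recorded data is illustrative.

```python
def summarize_call(call):
    """Reduce a recorded call to the parts a test usually asserts on."""
    messages = call["messages"]
    return {
        "roles": [m["role"] for m in messages],
        # Most recent user message, or None if there is none.
        "last_user": next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"),
            None,
        ),
        "kwargs": dict(call.get("kwargs", {})),
    }


# Illustrative example of what FakeLLMClient.calls[0] might hold after one request.
recorded = {
    "messages": [
        {"role": "system", "content": "You review insurance claims."},
        {"role": "user", "content": "Approve claim"},
    ],
    "kwargs": {"temperature": 0},
}

summary = summarize_call(recorded)
print(summary["roles"], summary["last_user"])
```

Asserting on a small summary like this tends to produce clearer failure messages than comparing the whole payload.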
- •Mock multi-turn behavior when your agent logic depends on sequence.
This is useful for workflows like validation followed by correction or escalation.
from types import SimpleNamespace

from autogen import AssistantAgent


class FakeLLMClient:
    def __init__(self, replies):
        self.replies = list(replies)

    def create(self, messages, **kwargs):
        # Each call consumes the next scripted reply, in order.
        content = self.replies.pop(0)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )


fake_client = FakeLLMClient([
    "I need more information.",
    "Final answer: proceed.",
])

assistant = AssistantAgent(name="assistant", llm_config={"config_list": []})
assistant.client = fake_client

first = assistant.generate_reply(messages=[{"role": "user", "content": "Process request"}])
second = assistant.generate_reply(messages=[{"role": "user", "content": "Here are the details"}])

print(first)
print(second)
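The validation-then-correction workflow mentioned above can also be tested as a loop that carries the conversation history forward. This is a sketch under stated assumptions: `run_until_final` is a hypothetical driver, not part of AutoGen, and the "Final answer:" prefix is simply the convention used by the scripted replies.

```python
from types import SimpleNamespace


class FakeLLMClient:
    def __init__(self, replies):
        self.replies = list(replies)
        self.calls = []

    def create(self, messages, **kwargs):
        # Copy the list so later appends do not rewrite what was recorded.
        self.calls.append(list(messages))
        content = self.replies.pop(0)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )


def run_until_final(client, request, follow_up, max_turns=3):
    """Hypothetical driver: keep asking until the reply starts with 'Final answer:'."""
    history = [{"role": "user", "content": request}]
    for turn in range(1, max_turns + 1):
        reply = client.create(messages=history).choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        if reply.startswith("Final answer:"):
            return reply, turn
        history.append({"role": "user", "content": follow_up})
    raise RuntimeError("no final answer within max_turns")


client = FakeLLMClient(["I need more information.", "Final answer: proceed."])
final, turns = run_until_final(client, "Process request", "Here are the details")
print(final, turns)
```

Because the fake records a copy of each request, a test can confirm that the second call saw the full history and not just the latest message.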
Testing It
Run pytest -q and confirm the test passes without any network access. If your setup is correct, the assertions should verify both the exact response string and the prompt sent into create(). For a stronger check, add one test that exhausts the fake reply list and confirm your fallback behavior is explicit rather than accidental.
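One way to make the exhaustion case explicit is a stricter variant of the fake that raises a descriptive error instead of returning a default string. The class name `StrictFakeLLMClient` and the error wording are assumptions made for this sketch.

```python
from types import SimpleNamespace


class StrictFakeLLMClient:
    """Variant of the fake that fails loudly instead of inventing a reply."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.calls = []

    def create(self, messages, **kwargs):
        self.calls.append({"messages": messages, "kwargs": kwargs})
        if not self.replies:
            raise AssertionError(
                f"unexpected LLM call #{len(self.calls)}: no scripted replies left"
            )
        content = self.replies.pop(0)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=content))]
        )


def check_exhaustion():
    """Use the only scripted reply, then capture the error from the extra call."""
    client = StrictFakeLLMClient(["only reply"])
    prompt = [{"role": "user", "content": "hi"}]
    assert client.create(messages=prompt).choices[0].message.content == "only reply"
    try:
        client.create(messages=prompt)
    except AssertionError as exc:
        return str(exc)
    raise RuntimeError("second call should have failed")


message = check_exhaustion()
print(message)
```

With this variant, an agent that makes more model calls than the test scripted shows up as a clear failure and not as a silently accepted default reply.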
If you see real API traffic during tests, your agent is still using a live config somewhere in your stack. That usually means you patched the wrong object or instantiated a second agent outside the test boundary.
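To surface accidental live calls early, a test can temporarily replace `socket.socket` so that any attempt to open a connection raises. This is a rough sketch using only the standard library; `block_network` and `NetworkBlockedError` are hypothetical names. It only catches code that looks up `socket.socket` while the guard is active, so treat it as a safety net and not as a guarantee.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to open a socket."""


@contextmanager
def block_network():
    """Hypothetical guard: make new sockets fail for the duration of the block."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise NetworkBlockedError("network access attempted during a mocked test")

    socket.socket = guarded
    try:
        yield
    finally:
        # Always restore the real class, even if the test body raised.
        socket.socket = original


real_socket = socket.socket

with block_network():
    try:
        socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        blocked = False
    except NetworkBlockedError:
        blocked = True

restored = socket.socket is real_socket
print(blocked, restored)
```

Running the mocked tests inside such a guard turns a wrongly patched object into an immediate, named failure.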
Next Steps
- •Add mocks for tool calls so you can test function-calling flows end to end.
- •Wrap this pattern in pytest fixtures so every test gets a clean fake client.
- •Compare this approach with AutoGen’s built-in caching if you also want replayable integration tests.
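As a starting point for the tool-call item above, a fake response can carry a tool call instead of text. The sketch below assumes the OpenAI-style shape in which `message.tool_calls` holds entries with a `function.name` and a JSON-encoded `function.arguments` string. The helper names, the `lookup_claim` tool, and the dispatcher are hypothetical, and the wiring into an AutoGen agent is not shown.

```python
import json
from types import SimpleNamespace


def fake_tool_call_response(name, arguments, call_id="call_1"):
    """Build an OpenAI-style response whose message requests one tool call."""
    tool_call = SimpleNamespace(
        id=call_id,
        type="function",
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments)),
    )
    message = SimpleNamespace(content=None, tool_calls=[tool_call])
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def dispatch_tool_calls(response, registry):
    """Hypothetical dispatcher: run each requested tool and collect results."""
    results = []
    for call in response.choices[0].message.tool_calls or []:
        func = registry[call.function.name]
        kwargs = json.loads(call.function.arguments)
        results.append({"tool_call_id": call.id, "result": func(**kwargs)})
    return results


def lookup_claim(claim_id):
    # Illustrative tool; a real one would query a claims system.
    return {"claim_id": claim_id, "status": "open"}


response = fake_tool_call_response("lookup_claim", {"claim_id": "C-42"})
results = dispatch_tool_calls(response, {"lookup_claim": lookup_claim})
print(results)
```

A fake client could return this response first and a plain text reply second, which would let a test cover the full request, tool execution, and final answer sequence.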
By Cyprian Aarons, AI Consultant at Topiax.