AutoGen Tutorial (Python): mocking LLM calls in tests for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to replace real LLM calls with deterministic mocks in AutoGen tests. You need this when you want fast, repeatable unit tests that do not burn API credits, do not fail on network hiccups, and do not change behavior just because the model answered differently this run.

What You'll Need

  • Python 3.10+
  • autogen-agentchat and autogen-ext[openai] installed (the second provides the OpenAI client used below)
  • pytest and pytest-asyncio installed (the tests are async)
  • Optional: an OpenAI API key if you want to compare mocked tests with real calls later
  • Basic familiarity with AutoGen agents and messages

Install the packages:

pip install autogen-agentchat "autogen-ext[openai]" pytest pytest-asyncio

Step-by-Step

  1. Start with a small agent setup that uses a real model client in production code. Keep the agent construction separate from the test so you can swap the client later without rewriting your app.
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

def build_agent() -> AssistantAgent:
    client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",  # reads OPENAI_API_KEY from the environment
    )
    return AssistantAgent(
        name="support_agent",
        model_client=client,
        system_message="You are a concise support assistant.",
    )
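
If you want that seam to be explicit, a small variant (a sketch; build_agent_with is a name made up for this tutorial, not an AutoGen API) accepts any model client:

from autogen_core.models import ChatCompletionClient

def build_agent_with(client: ChatCompletionClient) -> AssistantAgent:
    # Same wiring as build_agent, but the caller supplies the model client,
    # so a test can pass in a mock without patching anything.
    return AssistantAgent(
        name="support_agent",
        model_client=client,
        system_message="You are a concise support assistant.",
    )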
  2. Put your business logic behind a function that accepts an agent as a dependency. This is the part you test, not the model provider itself.
import asyncio

from autogen_core import CancellationToken
from autogen_agentchat.messages import TextMessage

async def get_reply(agent: AssistantAgent, user_text: str) -> str:
    result = await agent.on_messages(
        [TextMessage(content=user_text, source="user")],
        cancellation_token=CancellationToken(),
    )
    return result.chat_message.content

if __name__ == "__main__":
    agent = build_agent()
    print(asyncio.run(get_reply(agent, "What is our refund policy?")))
  3. Create a fake model client for tests by subclassing ChatCompletionClient, the abstract interface from autogen_core.models that every real model client implements. The key idea is simple: record the incoming messages and return fixed text instead of calling an API, and give the remaining abstract members harmless stub implementations.
from typing import Any, Sequence

from autogen_core.models import (
    ChatCompletionClient,
    CreateResult,
    LLMMessage,
    ModelInfo,
    RequestUsage,
)

class MockChatCompletionClient(ChatCompletionClient):
    def __init__(self) -> None:
        self.calls: list[Sequence[LLMMessage]] = []

    @property
    def model_info(self) -> ModelInfo:
        return ModelInfo(
            vision=False,
            function_calling=False,
            json_output=False,
            family="mock",
            structured_output=False,
        )

    @property
    def capabilities(self) -> ModelInfo:
        # Deprecated alias that older versions of the base class still declare.
        return self.model_info

    async def create(self, messages: Sequence[LLMMessage], **kwargs: Any) -> CreateResult:
        self.calls.append(list(messages))
        return CreateResult(
            finish_reason="stop",
            content="Mocked response: refund policy is 30 days.",
            usage=RequestUsage(prompt_tokens=12, completion_tokens=8),
            cached=False,
        )

    async def create_stream(self, messages: Sequence[LLMMessage], **kwargs: Any):
        # Minimal streaming support: yield the full result as one chunk.
        yield await self.create(messages)

    async def close(self) -> None:
        return None

    # The remaining abstract methods only matter for token accounting,
    # so zero values are fine in unit tests.
    def actual_usage(self) -> RequestUsage:
        return RequestUsage(prompt_tokens=0, completion_tokens=0)

    def total_usage(self) -> RequestUsage:
        return RequestUsage(prompt_tokens=0, completion_tokens=0)

    def count_tokens(self, messages: Sequence[LLMMessage], **kwargs: Any) -> int:
        return 0

    def remaining_tokens(self, messages: Sequence[LLMMessage], **kwargs: Any) -> int:
        return 0
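
If you would rather not maintain this by hand, recent versions of autogen-ext ship a ready-made replay client for exactly this job (check that your installed version includes it):

from autogen_ext.models.replay import ReplayChatCompletionClient

# Returns the queued strings one per create() call and raises once exhausted.
replay_client = ReplayChatCompletionClient(
    ["Mocked response: refund policy is 30 days."]
)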
  4. Write a test that injects the mock client into the agent. This gives you a deterministic assertion on the returned text and also lets you verify whether the agent called the model at all.
import pytest

@pytest.mark.asyncio
async def test_get_reply_uses_mocked_llm():
    mock_client = MockChatCompletionClient()
    agent = AssistantAgent(
        name="support_agent",
        model_client=mock_client,
        system_message="You are a concise support assistant.",
    )

    reply = await get_reply(agent, "What is our refund policy?")

    assert reply == "Mocked response: refund policy is 30 days."
    assert len(mock_client.calls) == 1
  5. If you prefer not to mock at the client layer, patch your own wrapper function instead of AutoGen internals. This is cleaner when your code already isolates LLM access behind one function.
import sys
from unittest.mock import AsyncMock

async def generate_answer(agent: AssistantAgent, question: str) -> str:
    return await get_reply(agent, question)

@pytest.mark.asyncio
async def test_generate_answer_with_patch(monkeypatch):
    fake_get_reply = AsyncMock(return_value="patched answer")

    # Patch the module-global name that generate_answer resolves at call time.
    monkeypatch.setattr(sys.modules[__name__], "get_reply", fake_get_reply)

    dummy_agent = object()  # never used, since get_reply is patched
    answer = await generate_answer(dummy_agent, "Hello")

    assert answer == "patched answer"
    fake_get_reply.assert_awaited_once()
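
AsyncMock records its await arguments too, so you can tighten the last assertion to check exactly what generate_answer forwarded:

    # Stricter variant: verify both the agent and the question passed through.
    fake_get_reply.assert_awaited_once_with(dummy_agent, "Hello")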

Testing It

Run your tests with pytest -q. The tests are coroutines, so the @pytest.mark.asyncio marker relies on the pytest-asyncio plugin you installed earlier. The important signal is that they pass without any network access and without an API key in your test environment.

If you want to confirm isolation, temporarily disconnect from the internet or unset your OpenAI credentials and rerun the suite. A good mocked test still passes because nothing reaches out to a real model.
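
One way to enforce that isolation permanently is an autouse fixture in conftest.py (a sketch using pytest's built-in monkeypatch fixture; the name no_real_credentials is arbitrary) that strips the credential before every test:

# conftest.py
import pytest

@pytest.fixture(autouse=True)
def no_real_credentials(monkeypatch):
    # Any code path that accidentally reaches the real OpenAI client
    # now fails fast instead of silently spending credits.
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)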

Also keep your assertions about behavior, not exact token-by-token prompts, unless that prompt shape is part of your contract. In practice, assert on the returned content and the call count first.
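
For example, extending the test from step 4, you can check that the user's question actually reached the model without pinning the full prompt (a sketch; the exact message types recorded in mock_client.calls can vary by AutoGen version):

    # Add to test_get_reply_uses_mocked_llm, after the existing asserts:
    sent = mock_client.calls[0]
    assert any("refund policy" in str(getattr(m, "content", "")) for m in sent)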

Next Steps

  • Add multiple mock responses so you can test branching logic like retries, refusals, and tool calls; a sketch follows this list.
  • Move from unit tests to contract tests that validate your prompt format before sending anything to production.
  • Learn how to mock streamed responses next if your app uses token streaming in AutoGen.
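
As a starting point for the first item, here is a sketch that builds on the mock from step 3 (QueuedMockClient is a name invented here):

from collections import deque

class QueuedMockClient(MockChatCompletionClient):
    def __init__(self, replies: list[str]):
        super().__init__()
        self._replies = deque(replies)

    async def create(self, messages, **kwargs) -> CreateResult:
        # Pop the next canned reply so each model call in a multi-turn
        # scenario gets a different, predetermined answer.
        self.calls.append(list(messages))
        return CreateResult(
            finish_reason="stop",
            content=self._replies.popleft(),
            usage=RequestUsage(prompt_tokens=0, completion_tokens=0),
            cached=False,
        )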

By Cyprian Aarons, AI Consultant at Topiax.
