AutoGen Tutorial (Python): testing agents locally for intermediate developers
This tutorial shows you how to run and test AutoGen agents locally in Python without wiring up a full production backend. You’ll build a small multi-agent setup, mock the LLM during tests, and verify agent behavior with deterministic local checks.
What You'll Need
- Python 3.10+
- autogen-agentchat
- autogen-ext
- pytest and pytest-asyncio (the test examples use async test functions)
- An OpenAI-compatible API key if you want to run against a real model
- Optional: python-dotenv for local environment variables
Install the packages:
pip install autogen-agentchat autogen-ext pytest pytest-asyncio python-dotenv
Set your API key if you plan to use a live model:
export OPENAI_API_KEY="your-key-here"
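If you'd rather not export the key in every shell, python-dotenv can load it from a local file. A minimal sketch, assuming a .env file in the working directory that contains OPENAI_API_KEY=...:

# Load variables from .env into os.environ so OpenAIChatCompletionClient
# can pick up OPENAI_API_KEY without a manual export.
from dotenv import load_dotenv

load_dotenv()

Call load_dotenv() once at the top of your script, before constructing the model client.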
Step-by-Step
- Start with a minimal agent setup that can run locally. For testing, keep the agent configuration small and explicit so failures are easy to trace.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    # Reads OPENAI_API_KEY from the environment.
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
    )
    agent = AssistantAgent(
        name="support_agent",
        model_client=model_client,
        system_message="You are a concise support assistant.",
    )
    result = await agent.run(task="Write one sentence explaining what AutoGen is.")
    print(result.messages[-1].content)


if __name__ == "__main__":
    asyncio.run(main())
- Add a second agent so you can test multi-agent coordination locally. In practice, this is where most integration bugs show up: handoffs, role confusion, and unexpected message formats.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    planner = AssistantAgent(
        name="planner",
        model_client=client,
        system_message="Create short plans only.",
    )
    writer = AssistantAgent(
        name="writer",
        model_client=client,
        system_message="Turn plans into clear customer-facing text.",
    )
    # Manual handoff: feed the planner's last message to the writer.
    plan_result = await planner.run(task="Plan a response for a user asking about card charges.")
    draft = plan_result.messages[-1].content
    write_result = await writer.run(task=f"Rewrite this for the customer: {draft}")
    print(write_result.messages[-1].content)


if __name__ == "__main__":
    asyncio.run(main())
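If you'd rather have the framework manage turn-taking instead of wiring the handoff by hand, AutoGen also ships team abstractions. A minimal sketch using RoundRobinGroupChat with a message-count termination condition, assuming the autogen-agentchat 0.4+ team API (adjust imports if your installed version differs):

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    planner = AssistantAgent(
        name="planner",
        model_client=client,
        system_message="Create short plans only.",
    )
    writer = AssistantAgent(
        name="writer",
        model_client=client,
        system_message="Turn plans into clear customer-facing text.",
    )
    # Agents speak in round-robin order; stop after four messages total.
    team = RoundRobinGroupChat(
        [planner, writer],
        termination_condition=MaxMessageTermination(max_messages=4),
    )
    result = await team.run(task="Respond to a user asking about card charges.")
    print(result.messages[-1].content)


if __name__ == "__main__":
    asyncio.run(main())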
- For local testing, don’t depend on live model calls. Use a fake model client so your tests stay deterministic and fast.
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.replay import ReplayChatCompletionClient


async def main() -> None:
    # Returns the scripted responses in order instead of calling a real model.
    client = ReplayChatCompletionClient(
        [
            "AutoGen is a framework for building LLM-powered agents.",
            "Here is the customer-facing version of that explanation.",
        ]
    )
    agent = AssistantAgent(
        name="test_agent",
        model_client=client,
        system_message="Answer briefly.",
    )
    result = await agent.run(task="Explain AutoGen.")
    for message in result.messages:
        print(type(message).__name__, "=>", getattr(message, "content", ""))


if __name__ == "__main__":
    asyncio.run(main())
- Wrap that behavior in a pytest test. This gives you a repeatable check that your agent still produces the expected output after prompt or code changes. The @pytest.mark.asyncio marker requires the pytest-asyncio plugin you installed earlier.
import pytest

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.replay import ReplayChatCompletionClient


@pytest.mark.asyncio
async def test_assistant_agent_returns_expected_text() -> None:
    client = ReplayChatCompletionClient(["AutoGen helps build agents."])
    agent = AssistantAgent(
        name="test_agent",
        model_client=client,
        system_message="Be direct.",
    )
    result = await agent.run(task="What does AutoGen do?")
    assert "AutoGen helps build agents." in result.messages[-1].content
- If you want stronger local validation, test message flow instead of just final text. That catches regressions where an agent still answers, but stops using the right intermediate format.
import pytest

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.replay import ReplayChatCompletionClient


@pytest.mark.asyncio
async def test_two_step_flow() -> None:
    client = ReplayChatCompletionClient(
        [
            "Step 1: identify the issue.",
            "Step 2: ask for transaction date and amount.",
        ]
    )
    agent = AssistantAgent(
        name="support_agent",
        model_client=client,
        system_message="Return short support guidance.",
    )
    result = await agent.run(task="Help with an unknown card charge.")
    # Coerce to str: some message types carry structured (non-string) content.
    contents = [str(getattr(msg, "content", "")) for msg in result.messages]
    assert any("identify the issue" in c.lower() for c in contents)
Testing It
Run your tests with pytest -q. Make sure pytest-asyncio is installed so the async tests are collected and run. If you used ReplayChatCompletionClient, the tests should pass without calling any external API, which makes them suitable for CI and local development.
If you want to verify the live path, switch back to OpenAIChatCompletionClient and run the script directly. Check that your API key is set and that the model name matches what your account can access.
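A small helper keeps both paths behind one function, so tests and live runs share the same wiring. A sketch, using an environment variable of our own invention (USE_LIVE_MODEL) to switch clients:

import os

from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.models.replay import ReplayChatCompletionClient


def build_client(replay_responses: list[str]):
    # USE_LIVE_MODEL=1 is our own convention, not an AutoGen setting.
    if os.environ.get("USE_LIVE_MODEL") == "1":
        return OpenAIChatCompletionClient(model="gpt-4o-mini")
    return ReplayChatCompletionClient(replay_responses)

Tests call build_client with scripted responses; setting USE_LIVE_MODEL=1 runs the exact same code path against the real model.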
A good sanity check is to intentionally change one expected string in a test and confirm it fails. That tells you your assertions are actually protecting behavior instead of just executing code.
Next Steps
- Add tool calls to your agents and write tests around tool output, not just natural language responses (see the sketch after this list).
- Move from single-agent tests to group chat tests so you can validate routing and handoff behavior.
- Store replay fixtures from real conversations and use them as regression cases when prompts change.
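For the first item, here is a minimal sketch of an agent with a Python function tool, assuming the tools parameter of AssistantAgent from autogen-agentchat 0.4+; lookup_charge is a hypothetical stand-in for your own tool:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


def lookup_charge(transaction_id: str) -> str:
    """Return details for a card charge. Stand-in for a real data source."""
    return f"Charge {transaction_id}: $42.10 at Example Coffee on 2024-05-01."


async def main() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="billing_agent",
        model_client=client,
        system_message="Use tools to answer billing questions.",
        tools=[lookup_charge],  # plain functions are wrapped as tools
    )
    result = await agent.run(task="What was charge txn_123?")
    print(result.messages[-1].content)


if __name__ == "__main__":
    asyncio.run(main())

Because the tool is plain Python, you can unit test lookup_charge directly with no model in the loop, then cover the conversational layer with replay fixtures.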
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.