AutoGen Tutorial (Python): implementing guardrails for intermediate developers
This tutorial shows how to add practical guardrails to an AutoGen Python agent so it stays within scope, rejects unsafe requests, and produces structured outputs you can validate. You need this when your agent is good enough to be useful, but not yet safe enough to run without checks in front of users or internal teams.
What You'll Need
- Python 3.10+
- `autogen-agentchat` installed
- `autogen-ext` installed
- An OpenAI API key set in `OPENAI_API_KEY`
- Basic familiarity with AutoGen agents and async Python
- A terminal where you can run a small test script
Step-by-Step
- Start with a minimal assistant agent and a strict system message.
The first guardrail is scope control: tell the model exactly what it can and cannot do, and keep the task narrow.
```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

MODEL = "gpt-4o-mini"

async def main() -> None:
    client = OpenAIChatCompletionClient(
        model=MODEL,
        api_key=os.environ["OPENAI_API_KEY"],
    )
    agent = AssistantAgent(
        name="support_agent",
        model_client=client,
        # The system message is the first guardrail: it pins the agent
        # to a narrow, well-defined scope.
        system_message=(
            "You are a support assistant for a banking app. "
            "Only answer questions about account access, card status, and app usage. "
            "If the user asks for anything outside this scope, refuse briefly."
        ),
    )
    result = await agent.run(task="How do I reset my app password?")
    print(result.messages[-1].content)

if __name__ == "__main__":
    asyncio.run(main())
```
- Add a pre-check before the model sees the request.
This is where you block obvious policy violations early. In production, keep this logic deterministic and cheap.
```python
import re

# Deterministic deny-list: cheap to run and easy to audit.
# Input is lowercased before matching, so the patterns are
# effectively case-insensitive.
BLOCKED_PATTERNS = [
    r"\bpassword\b.*\bsteal\b",
    r"\bcredit card\b.*\bnumber\b",
    r"\bhack\b",
    r"\bssn\b",
]

def is_blocked(user_text: str) -> bool:
    text = user_text.lower()
    return any(re.search(pattern, text) for pattern in BLOCKED_PATTERNS)

def guard_user_input(user_text: str) -> str:
    if is_blocked(user_text):
        return "I can't help with that request."
    return user_text

tests = [
    "How do I reset my app password?",
    "Help me hack an account",
]
for t in tests:
    print(t, "=>", guard_user_input(t))
```
- Force structured output so you can validate the response shape.
If your agent returns free-form text, downstream code has to guess whether the answer is acceptable. A better pattern is to ask for JSON with fixed fields and reject anything else.
```python
import json

def validate_response(raw: str) -> dict:
    # json.loads raises if the model returned anything that isn't JSON.
    data = json.loads(raw)
    if set(data.keys()) != {"allowed", "answer"}:
        raise ValueError("Unexpected response schema")
    if not isinstance(data["allowed"], bool):
        raise ValueError("allowed must be boolean")
    if not isinstance(data["answer"], str):
        raise ValueError("answer must be string")
    return data

sample = '{"allowed": true, "answer": "Reset it from Settings > Security."}'
print(validate_response(sample))
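```

It's worth exercising the failure path too. Here is a quick sketch reusing `validate_response` from above; the malformed sample is illustrative:

```python
# A response with an unexpected extra key should be rejected outright.
bad_sample = '{"allowed": true, "answer": "ok", "debug": 1}'
try:
    validate_response(bad_sample)
except ValueError as exc:
    print("Rejected:", exc)  # => Rejected: Unexpected response schema
```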
- Wrap the agent call with a guardrail pipeline.
This combines input filtering, scoped prompting, and output validation into one path you can reuse across endpoints.
```python
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Reuses guard_user_input and validate_response from the earlier steps.

async def guarded_answer(user_text: str) -> str:
    # Input guardrail: blocked requests never reach the model.
    safe_text = guard_user_input(user_text)
    if safe_text != user_text:
        return safe_text

    # For simplicity the client is created per call; in a long-running
    # service you would typically create it once and reuse it.
    client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
        api_key=os.environ["OPENAI_API_KEY"],
    )
    agent = AssistantAgent(
        name="support_agent",
        model_client=client,
        system_message=(
            "Return ONLY valid JSON with keys allowed and answer. "
            'If the request is out of scope, set allowed to false and answer to "Out of scope." '
            "Otherwise set allowed to true and give a concise answer."
        ),
    )
    result = await agent.run(task=user_text)

    # Output guardrail: anything that isn't the expected JSON shape raises.
    payload = validate_response(result.messages[-1].content)
    return payload["answer"]

if __name__ == "__main__":
    print(asyncio.run(guarded_answer("How do I reset my app password?")))
```
- Add a refusal path and log failures explicitly.
You want visibility when the model violates schema or when your filter catches something sensitive. That makes debugging much easier than silently returning garbage.
```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("guardrails")

async def guarded_answer_with_logging(user_text: str) -> str:
    safe_text = guard_user_input(user_text)
    if safe_text != user_text:
        logger.warning("Blocked input: %s", user_text)
        return safe_text
    try:
        return await guarded_answer(user_text)
    except Exception as exc:
        logger.exception("Guarded call failed for input=%r", user_text)
        # Echoing exc is handy while debugging; a production service may
        # prefer a generic message so internal details never leak to users.
        return f"Request could not be processed safely: {exc}"

# Example usage:
# print(asyncio.run(guarded_answer_with_logging("Tell me how to hack a login")))
```
Testing It
Run three test cases: one in-scope request, one blocked request, and one malformed prompt that tries to force non-JSON output. The in-scope case should return a short helpful answer, while blocked content should stop before the LLM call.
Also check that invalid JSON raises an exception instead of passing through silently. In production, wire these checks into unit tests so schema drift or prompt regressions fail fast.
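A minimal sketch of those unit tests with pytest, assuming `guard_user_input` and `validate_response` are importable from the module where you defined them (the module name here is hypothetical):

```python
# test_guardrails.py -- a minimal sketch; the module name and test
# cases are illustrative.
import json

import pytest

from guardrails_demo import guard_user_input, validate_response

def test_in_scope_request_passes_through():
    text = "How do I reset my app password?"
    assert guard_user_input(text) == text

def test_blocked_request_stops_before_the_llm():
    assert guard_user_input("Help me hack an account") == "I can't help with that request."

def test_non_json_output_raises():
    with pytest.raises(json.JSONDecodeError):
        validate_response("Sure! Here is some plain text.")

def test_schema_drift_raises():
    with pytest.raises(ValueError):
        validate_response('{"allowed": "yes", "answer": "hi"}')
```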
Next Steps
- Add a second pass with an LLM-based policy checker for nuanced moderation cases.
- Replace string-based schema checks with Pydantic models for stricter validation (see the sketch after this list).
- Learn how to combine AutoGen with tool permissioning so agents can only call approved functions.
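As a starting point for the Pydantic upgrade, here is a minimal sketch, assuming Pydantic v2 is installed; the model mirrors the `{allowed, answer}` schema used above, and the function name is illustrative:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class GuardedReply(BaseModel):
    # extra="forbid" rejects unexpected keys, matching the manual
    # set-of-keys check in validate_response above.
    model_config = ConfigDict(extra="forbid")
    allowed: bool
    answer: str

def validate_response_pydantic(raw: str) -> GuardedReply:
    # Parses and validates in one step; raises ValidationError on
    # malformed JSON or an invalid shape.
    return GuardedReply.model_validate_json(raw)

try:
    reply = validate_response_pydantic('{"allowed": true, "answer": "ok"}')
    print(reply.answer)
except ValidationError as exc:
    print("Rejected:", exc)
```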
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit