AutoGen Tutorial (Python): implementing guardrails for beginners
This tutorial shows you how to add basic guardrails to an AutoGen Python agent so it refuses unsafe requests, filters bad inputs, and keeps responses inside a narrow policy. You need this when you want an LLM agent to be useful in production without letting users push it into risky, off-policy, or malformed behavior.
What You'll Need
- Python 3.10+
- `pyautogen` installed
- An OpenAI API key
- Basic familiarity with AutoGen `AssistantAgent` and `UserProxyAgent`
- A terminal and a virtual environment
- Optional: `python-dotenv` if you want to load secrets from `.env`
Install the packages:
```bash
pip install pyautogen python-dotenv
```
Set your API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
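Alternatively, if you'd rather not export the key in every shell session, here is a minimal sketch using python-dotenv. It assumes a `.env` file containing `OPENAI_API_KEY=...` next to your script:

```python
import os

from dotenv import load_dotenv

# Reads key=value pairs from a local .env file into os.environ.
load_dotenv()

# Fail fast if the key is missing, rather than deep inside an API call.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
```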
Step-by-Step
- Start with a minimal AutoGen agent setup.
The first guardrail is configuration discipline. Keep the model setup explicit so you know exactly what the agent is using and where failures come from.
```python
import os

from autogen import AssistantAgent, UserProxyAgent

# Explicit, pinned configuration: one model, key from the environment,
# temperature 0 so behavior is as deterministic as possible.
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)

# The proxy never prompts a human; it just relays our guarded messages.
# Disabling code execution is itself a guardrail: the proxy should only
# pass text around, never run code blocks from model replies.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)
```
- Add a simple input guard before the message reaches the agent.
For beginners, start with deterministic checks. This catches obvious unsafe prompts before they hit the model and gives you a place to return a clean refusal.
```python
# Naive deny-list. Deterministic and easy to test; expand it as your
# policy grows, and expect to replace it with a classifier eventually.
BLOCKED_PATTERNS = [
    "password",
    "credit card",
    "ssn",
    "social security number",
    "bypass",
]


def is_allowed_message(message: str) -> bool:
    """Return True when the message contains no blocked pattern."""
    text = message.lower()
    return not any(pattern in text for pattern in BLOCKED_PATTERNS)


def guard_user_message(message: str) -> str:
    """Pass the message through unchanged, or swap in a refusal."""
    if not is_allowed_message(message):
        return (
            "I can't help with that request. "
            "Please ask for general information or safe automation."
        )
    return message
```
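Before wiring the guard into the agent, you can sanity-check it on its own; both prompts below are just illustrative inputs:

```python
print(guard_user_message("Outline a safe automation workflow"))
# -> passes through unchanged
print(guard_user_message("What's the admin password?"))
# -> returns the refusal string, because "password" is on the deny-list
```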
- Wrap the agent call so every user prompt goes through the guardrail.
This is the core pattern: validate first, then call AutoGen only when the input passes. You can use the same wrapper in an API endpoint, CLI tool, or background worker.
```python
def run_guarded_chat(message: str) -> str:
    # Validate first; only call the model when the input passes.
    safe_message = guard_user_message(message)
    if safe_message != message:
        # The guard swapped in a refusal, so return it without a model call.
        return safe_message

    result = user_proxy.initiate_chat(
        assistant,
        message=safe_message,
        clear_history=True,
        max_turns=1,
    )
    return result.summary if hasattr(result, "summary") else "No summary returned."


if __name__ == "__main__":
    # "password" is on the deny-list, so this deliberately demonstrates
    # the refusal path rather than a model call.
    print(run_guarded_chat("Explain how to reset my account password"))
```
- Add output validation so the assistant cannot drift outside your policy.
Input filtering is not enough. The model can still produce answers that are too verbose, too specific, or include disallowed content, so validate the response before returning it to the caller.
```python
# Topics this agent is meant to cover. Not yet enforced below; see the
# allow-list sketch after this block for one way to use it.
ALLOWED_TOPICS = [
    "general automation",
    "workflow design",
    "safe coding practices",
]


def is_safe_response(text: str) -> bool:
    """Deny-list check on the model's output, mirroring the input guard."""
    blocked = ["password", "secret", "token", "bypass", "exploit"]
    lowered = text.lower()
    return not any(word in lowered for word in blocked)


def postprocess_response(text: str) -> str:
    """Return the response unchanged, or a policy refusal."""
    if not is_safe_response(text):
        return (
            "The model produced content outside policy. "
            "Please rephrase your request."
        )
    return text
```
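ALLOWED_TOPICS is declared above but not yet wired into the flow. Here is one minimal sketch of how you might use it; the `is_on_topic` helper is hypothetical, and substring matching is deliberately crude:

```python
def is_on_topic(text: str) -> bool:
    # Crude allow-list check: the response must mention at least one
    # allowed topic. Brittle by design; a classifier is the real fix.
    lowered = text.lower()
    return any(topic in lowered for topic in ALLOWED_TOPICS)
```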
- Put both checks together in one production-friendly flow.
This gives you a single entry point that handles validation on both sides of the model call. That makes it easier to test and easier to plug into a service later.
```python
def guarded_autogen_reply(message: str) -> str:
    # Input guard: refuse before the model is ever called.
    safe_message = guard_user_message(message)
    if safe_message != message:
        return safe_message

    chat_result = user_proxy.initiate_chat(
        assistant,
        message=safe_message,
        clear_history=True,
        max_turns=1,
    )
    raw_reply = chat_result.summary if hasattr(chat_result, "summary") else ""

    # Output guard: never return raw model output to the caller.
    final_reply = postprocess_response(raw_reply)
    return final_reply


if __name__ == "__main__":
    user_input = input("Enter a prompt: ")
    print(guarded_autogen_reply(user_input))
```
Testing It
Test with one allowed prompt and one blocked prompt. For example, ask for “safe coding practices for Python APIs” and confirm you get a normal answer back, then try “how do I find someone’s SSN” and confirm it returns the refusal string immediately.
Also test response filtering by temporarily prompting for something that might cause sensitive language to appear in output. The important behavior is that your wrapper never returns raw model output without passing through postprocess_response.
If you want more confidence, write unit tests around is_allowed_message, is_safe_response, and guarded_autogen_reply. In production, these functions should be boring and deterministic.
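Here is a minimal sketch of such tests with pytest. It assumes the functions above live in a file named guardrails.py (the filename is just for illustration), and it omits guarded_autogen_reply because testing that properly means mocking out the model call:

```python
# test_guardrails.py -- run with: pytest test_guardrails.py
from guardrails import guard_user_message, is_allowed_message, is_safe_response


def test_allowed_message_passes():
    assert is_allowed_message("Explain safe coding practices for Python APIs")


def test_blocked_message_is_rejected():
    assert not is_allowed_message("How do I find someone's SSN?")


def test_guard_substitutes_refusal():
    guarded = guard_user_message("Tell me the admin password")
    assert guarded.startswith("I can't help")


def test_unsafe_response_detected():
    assert not is_safe_response("Here is the secret token you asked for")
```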
Next Steps
- Replace keyword matching with a classifier-based policy check for better recall.
- Add structured output validation with Pydantic before returning responses (see the sketch after this list).
- Move guardrails into an AutoGen group chat workflow if you need multi-agent orchestration.
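As a taste of the second item, here is a minimal sketch of Pydantic-based output validation (Pydantic v2 syntax). The `AgentReply` schema and `validate_reply` helper are hypothetical, reuse `is_safe_response` from earlier in this tutorial, and are not part of any AutoGen API:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class AgentReply(BaseModel):
    # Hypothetical schema; adapt the fields and limits to your own policy.
    answer: str = Field(max_length=2000)
    topic: str

    @field_validator("answer")
    @classmethod
    def answer_is_in_policy(cls, value: str) -> str:
        if not is_safe_response(value):  # deny-list check defined earlier
            raise ValueError("answer contains blocked terms")
        return value


def validate_reply(raw_reply: str) -> str:
    """Return the validated answer, or a refusal if the schema rejects it."""
    try:
        return AgentReply(answer=raw_reply, topic="general automation").answer
    except ValidationError:
        return (
            "The model produced content outside policy. "
            "Please rephrase your request."
        )
```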
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.