CrewAI Tutorial (Python): filtering toxic output for beginners

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows how to build a CrewAI pipeline in Python that detects and filters toxic output before it reaches your users. You need this when you’re generating customer-facing text, support replies, or internal agent responses and want a simple safety gate without wiring in a full moderation stack.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • crewai
  • openai
  • An OpenAI API key set as OPENAI_API_KEY
  • Basic familiarity with CrewAI agents, tasks, and crews

Step-by-Step

  1. Start by installing the packages and setting up your environment. Keep this isolated so you can test moderation logic without polluting your global Python install.
python -m venv .venv
source .venv/bin/activate
pip install crewai openai
export OPENAI_API_KEY="your-api-key-here"
  2. Create a small moderation helper that checks text for toxic language using a simple rule-based filter. This is not a replacement for a full safety model, but it gives you an executable baseline that works immediately.
TOXIC_PATTERNS = [
    "idiot",
    "stupid",
    "hate you",
    "kill yourself",
    "moron",
]

def is_toxic(text: str) -> bool:
    lowered = text.lower()
    return any(pattern in lowered for pattern in TOXIC_PATTERNS)

def filter_toxic_output(text: str) -> str:
    if is_toxic(text):
        return "[BLOCKED: toxic content detected]"
    return text
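
A quick sanity check, before any CrewAI wiring, confirms the filter behaves as expected. Both strings below are arbitrary test inputs:

print(filter_toxic_output("Happy to help with your refund."))  # passes through unchanged
print(filter_toxic_output("You are a MORON"))  # returns the blocked placeholder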
  3. Build a CrewAI agent that generates a response, then run the output through your filter before returning it. The key pattern here is to keep generation and moderation separate so you can swap the filter later without changing the agent setup.
from crewai import Agent, Task, Crew, Process, LLM

# Uses CrewAI's built-in LLM wrapper, so no extra langchain package is needed
# beyond the crewai and openai installs from step 1.
llm = LLM(model="gpt-4o-mini", temperature=0.2)

writer = Agent(
    role="Customer Support Writer",
    goal="Write short, helpful replies to user messages.",
    backstory="You write concise support responses for a banking app.",
    llm=llm,
    verbose=False,
)

task = Task(
    description="Reply politely to: 'The app is broken and your team is useless.'",
    expected_output="A short support reply.",
    agent=writer,
)

crew = Crew(
    agents=[writer],
    tasks=[task],
    process=Process.sequential,
)

result = crew.kickoff()
safe_output = filter_toxic_output(str(result))
print(safe_output)
  4. Add a second pass that checks both user input and model output. Give the task a {user_text} placeholder so each incoming message is interpolated into the description at kickoff; in production, you usually want to block toxic prompts before they reach the agent and block toxic completions before they leave your service.
support_task = Task(
    description="Reply politely to this user message: {user_text}",
    expected_output="A short support reply.",
    agent=writer,
)
support_crew = Crew(agents=[writer], tasks=[support_task], process=Process.sequential)

def handle_message(user_text: str) -> str:
    if is_toxic(user_text):
        return "[BLOCKED: unsafe user input]"

    # kickoff interpolates {user_text} into the task description
    reply = support_crew.kickoff(inputs={"user_text": user_text})
    return filter_toxic_output(str(reply))

print(handle_message("Can you help me reset my password?"))
print(handle_message("You are stupid"))
  5. If you want stricter control, make the agent produce structured output and validate it before display. That gives you one more place to enforce policy without guessing based on free-form text alone.
from pydantic import BaseModel

class SupportReply(BaseModel):
    message: str

def validate_reply(text: str) -> str:
    if is_toxic(text):
        return "[BLOCKED: toxic content detected]"
    return SupportReply(message=text).message

raw_reply = "Please reset your password from the settings menu."
print(validate_reply(raw_reply))
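
If you also want the agent itself to emit structured output, CrewAI tasks accept an output_pydantic model. The sketch below reuses the writer agent from step 3 and the SupportReply model above; it assumes a recent CrewAI version where the kickoff result exposes a .pydantic attribute when parsing succeeds.

structured_task = Task(
    description="Reply politely to: 'The app is broken and your team is useless.'",
    expected_output="A short support reply with a single 'message' field.",
    agent=writer,
    output_pydantic=SupportReply,
)
structured_crew = Crew(agents=[writer], tasks=[structured_task], process=Process.sequential)

structured_result = structured_crew.kickoff()
# .pydantic holds a SupportReply instance when the model's JSON parses
print(validate_reply(structured_result.pydantic.message))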

Testing It

Run the script with a safe prompt and confirm you get the normal response back. Then try obvious toxic inputs like “you are stupid” or “kill yourself” and verify they get replaced with the blocked message.

Also test edge cases like mixed-case insults, extra punctuation, or toxic phrases embedded inside longer sentences. If you want to go one step further, log blocked inputs so you can review false positives and tune the pattern list over time.
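
A few assertions make those edge cases repeatable. The blocked_log list below is a hypothetical in-memory stand-in for whatever logging you actually use:

assert is_toxic("You are STUPID!!!")  # mixed case plus extra punctuation
assert is_toxic("Honestly, I hate you sometimes")  # phrase inside a longer sentence
assert not is_toxic("Please help me reset my password")

blocked_log: list[str] = []  # hypothetical store for later false-positive review

def filter_and_log(text: str) -> str:
    if is_toxic(text):
        blocked_log.append(text)  # keep the original text for review
        return "[BLOCKED: toxic content detected]"
    return text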

Next Steps

  • Replace the rule-based filter with an LLM-based moderation task using a separate CrewAI agent (see the sketch after this list).
  • Add JSON schema validation so every agent response must pass both structure checks and toxicity checks.
  • Store moderation events in your audit logs so support teams can review blocked content later.
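
For the first bullet, a minimal sketch of an LLM-based moderation pass might look like the following. It reuses the llm from step 3; the moderator role, the one-word TOXIC/SAFE verdict format, and the llm_filter helper are illustrative assumptions, not a CrewAI convention:

moderator = Agent(
    role="Content Moderator",
    goal="Classify text as TOXIC or SAFE.",
    backstory="You review customer-facing text for abusive or harmful language.",
    llm=llm,
    verbose=False,
)

moderation_task = Task(
    description=(
        "Classify the following text. Respond with exactly one word, "
        "TOXIC or SAFE: {candidate_text}"
    ),
    expected_output="Exactly one word: TOXIC or SAFE.",
    agent=moderator,
)
moderation_crew = Crew(agents=[moderator], tasks=[moderation_task], process=Process.sequential)

def llm_filter(text: str) -> str:
    # Fail closed: block whenever the moderator's verdict mentions TOXIC.
    verdict = str(moderation_crew.kickoff(inputs={"candidate_text": text}))
    if "TOXIC" in verdict.upper():
        return "[BLOCKED: toxic content detected]"
    return text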
