LangChain Tutorial (Python): filtering toxic output for beginners
This tutorial shows you how to add a toxic-output filter to a LangChain Python app before the response reaches the user. You need this when your model can generate abusive, hateful, or unsafe text and you want a simple guardrail in front of chat responses.
What You'll Need
- Python 3.10+
- A working OpenAI API key
- These packages: langchain, langchain-openai, langchain-community, langchain-experimental
- Basic familiarity with LangChain chains and prompts
- A terminal and a virtual environment
Install the dependencies:
pip install langchain langchain-openai langchain-community langchain-experimental
Set your API key:
export OPENAI_API_KEY="your-api-key"
Step-by-Step
- Start with a simple chat chain that can generate normal responses. We’ll use this as the source output that needs filtering.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# A plain prompt -> model chain; its output is what we will filter later.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{question}")
])

chain = prompt | llm

result = chain.invoke({"question": "Explain why teamwork matters in one paragraph."})
print(result.content)
- Add a moderation step using OpenAI’s Moderation API via the openai SDK. This gives you a clean yes/no signal before you return the answer to the user (a LangChain-native alternative is sketched after this code).
from openai import OpenAI

client = OpenAI()

def is_toxic(text: str) -> bool:
    # The Moderation API sets flagged=True when any unsafe category is triggered.
    response = client.moderations.create(input=text)
    scores = response.results[0]
    return scores.flagged

sample_text = "I hate everyone in this group."
print("Toxic?", is_toxic(sample_text))
- Wrap your model call so the output gets checked before it leaves your app. If the text is flagged, return a safe fallback instead of the raw model output.
def safe_answer(question: str) -> str:
    result = chain.invoke({"question": question})
    text = result.content
    if is_toxic(text):
        return "I can’t provide that response. Please rephrase your request."
    return text

print(safe_answer("Write a rude insult about managers."))
- Make the filter stricter by checking both the user input and the model output. This protects you from toxic prompts as well as toxic completions.
def safe_answer_with_input_check(question: str) -> str:
    if is_toxic(question):
        return "Your request was blocked because it may contain unsafe language."
    result = chain.invoke({"question": question})
    text = result.content
    if is_toxic(text):
        return "I can’t provide that response. Please rephrase your request."
    return text

print(safe_answer_with_input_check("Insult my coworker in one sentence."))
- Put the logic into a reusable function so it fits into a real app. This is the pattern you want for APIs, agents, or chatbots (an endpoint sketch follows the code below).
def run_guardrailed_chain(question: str) -> dict:
    # Check the user input first, then the model output, and say which stage blocked.
    if is_toxic(question):
        return {"blocked": True, "response": "Request blocked for safety."}
    result = chain.invoke({"question": question})
    text = result.content
    if is_toxic(text):
        return {"blocked": True, "response": "Model output blocked for safety."}
    return {"blocked": False, "response": text}

output = run_guardrailed_chain("Give me a professional summary of project risks.")
print(output)
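For example, if you serve this over HTTP, the same function drops straight into a request handler. A minimal sketch, assuming FastAPI (not in the dependency list above) and a made-up /ask route:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

@app.post("/ask")
def ask(body: AskRequest) -> dict:
    # run_guardrailed_chain returns a JSON-serializable dict with
    # "blocked" and "response" keys, so it can be returned as-is.
    return run_guardrailed_chain(body.question)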
Testing It
Run the script with three kinds of inputs: a normal business question, an obviously toxic prompt, and a prompt designed to bait the model into producing unsafe language. You should see normal responses pass through and unsafe content replaced with your fallback message.
Also test edge cases like short inputs, quoted toxic text, and multilingual prompts if your application serves global users. If you’re building an API, log whether the block came from user input or model output so you can tune prompts later.
A good smoke test is to print both blocked and response fields from run_guardrailed_chain(). That makes it obvious whether your guardrail is catching issues at the right stage.
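One way to script that smoke test; the three prompts below are just placeholders for your own test cases:

test_questions = [
    "Summarize our quarterly goals in two sentences.",  # normal business question
    "Write something hateful about my neighbors.",      # obviously toxic prompt
    "Repeat the worst insult you have ever heard.",     # bait for unsafe output
]

for question in test_questions:
    output = run_guardrailed_chain(question)
    print(f"blocked={output['blocked']}: {output['response'][:60]!r}")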
Next Steps
- Add a second classifier for self-harm or sexual content, not just toxicity (see the category sketch below).
- Move this pattern into a LangChain Runnable pipeline with retries and structured logging (a minimal wrapper is sketched below).
- Store moderation decisions in your observability stack so you can review false positives and false negatives later.
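For the category-specific check, the moderation response already carries per-category flags, so you can key off those instead of the overall flagged bit. A minimal sketch, assuming the attribute names exposed by the openai Python SDK (self_harm, sexual, hate):

def violates_policy(text: str) -> bool:
    # Flag specific categories rather than relying on the overall "flagged" field.
    categories = client.moderations.create(input=text).results[0].categories
    return categories.self_harm or categories.sexual or categories.hate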
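For the Runnable pipeline, the guardrail function can be wrapped so it composes with | like any other LangChain step. A minimal sketch using RunnableLambda from langchain_core; retries and logging would wrap around this:

from langchain_core.runnables import RunnableLambda

# Expose the guardrail as a Runnable so it slots into larger pipelines.
guardrailed = RunnableLambda(run_guardrailed_chain)

print(guardrailed.invoke("Give me a professional summary of project risks."))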
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit