LangGraph Tutorial (Python): filtering toxic output for intermediate developers
This tutorial shows how to build a LangGraph pipeline that detects toxic model output, routes it through a moderation check, and either blocks or rewrites the response before it reaches the user. You need this when you’re building assistants for regulated environments where one bad completion can create legal, brand, or safety risk.
What You'll Need
- Python 3.10+
- langgraph
- langchain-openai
- langchain-core
- An OpenAI API key set as OPENAI_API_KEY
- Basic familiarity with LangGraph state, nodes, and conditional edges
Install the packages:
pip install langgraph langchain-openai langchain-core
Step-by-Step
- Start by defining a small state object that carries the user prompt, the model draft, the moderation result, and the final answer. Keep the state explicit so every node has a narrow contract.

from typing import TypedDict

class GraphState(TypedDict):
    prompt: str
    draft: str
    is_toxic: bool
    final: str
- Create two models: one for generation and one for moderation. The moderation node does not need to be fancy; for production you can swap in a dedicated safety model or rules engine later (a rules-based sketch follows the code below).
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
generator = ChatOpenAI(model="gpt-4o-mini", temperature=0)
moderator = ChatOpenAI(model="gpt-4o-mini", temperature=0)
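The moderation check does not have to be an LLM call. A LangGraph node is just a function that takes the state and returns a partial update, so a rules engine can slot in later with the same shape. A minimal sketch, using a placeholder keyword list that stands in for real rules:

# Illustrative rules-based alternative to the LLM moderator.
# BLOCKED_TERMS is a placeholder, not a real safety policy.
BLOCKED_TERMS = {"placeholder_slur", "placeholder_threat"}

def moderate_with_rules(state: GraphState) -> dict:
    text = state["draft"].lower()
    return {"is_toxic": any(term in text for term in BLOCKED_TERMS)}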
- Add a generation node that produces a first-pass answer, then add a moderation node that classifies that answer as toxic or safe. The key pattern is that moderation checks the model output, not just the user input.

def generate(state: GraphState) -> dict:
    msg = generator.invoke([
        HumanMessage(content=f"Answer the user clearly and concisely:\n{state['prompt']}")
    ])
    return {"draft": msg.content}

def moderate(state: GraphState) -> dict:
    verdict = moderator.invoke([
        HumanMessage(content=(
            "Classify this text as toxic or safe. "
            "Reply with exactly one word: toxic or safe.\n\n"
            f"{state['draft']}"
        ))
    ])
    # Any verdict other than an exact "toxic" is treated as safe, so this
    # check fails open; tighten the comparison if that is not acceptable.
    is_toxic = verdict.content.strip().lower() == "toxic"
    return {"is_toxic": is_toxic}
- Build two downstream handlers: one returns the draft unchanged when it passes moderation, and another replaces toxic content with a safe refusal. In production you usually want to log the blocked draft separately for audit (a logging sketch follows the code below).

def allow(state: GraphState) -> dict:
    return {"final": state["draft"]}

def block(state: GraphState) -> dict:
    return {
        "final": (
            "I can’t provide that response. "
            "Please rephrase your request or ask for something safer."
        )
    }
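To make the audit note concrete, here is a variant of block that records the rejected draft before returning the refusal. This is a minimal sketch using the standard logging module; in production you would point the handler at your structured log pipeline or audit sink:

import logging

audit_log = logging.getLogger("moderation.audit")

def block(state: GraphState) -> dict:
    # Record the blocked draft for later review before discarding it.
    audit_log.warning("Blocked draft: %r", state["draft"])
    return {
        "final": (
            "I can’t provide that response. "
            "Please rephrase your request or ask for something safer."
        )
    }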
- Wire the graph with conditional routing from moderation to either allow or block. This is where LangGraph earns its keep: the control flow stays explicit and easy to inspect (an inspection sketch follows the wiring below).

from langgraph.graph import StateGraph, START, END

def route(state: GraphState) -> str:
    return "block" if state["is_toxic"] else "allow"

builder = StateGraph(GraphState)
builder.add_node("generate", generate)
builder.add_node("moderate", moderate)
builder.add_node("allow", allow)
builder.add_node("block", block)

builder.add_edge(START, "generate")
builder.add_edge("generate", "moderate")
builder.add_conditional_edges("moderate", route, {
    "allow": "allow",
    "block": "block",
})
builder.add_edge("allow", END)
builder.add_edge("block", END)

graph = builder.compile()
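Because the routing is declared up front, you can render the compiled graph and confirm the wiring matches your intent. Recent langgraph releases expose the structure through get_graph(); the Mermaid output below is one way to inspect it, assuming your installed version supports it:

# Print Mermaid markup for the compiled graph; paste it into any Mermaid
# renderer to confirm that moderate fans out to both allow and block.
print(graph.get_graph().draw_mermaid())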
- Run it against a normal prompt and inspect the final state. If you want deterministic testing later, replace the LLM calls with fixed fixtures and keep the graph structure unchanged (sketched after this step).

result = graph.invoke({
    "prompt": "Explain how to write a Python list comprehension.",
    "draft": "",
    "is_toxic": False,
    "final": "",
})
print(result["final"])
Testing It
Test with two prompts: one benign and one designed to provoke unsafe language from your generator. The benign case should pass through unchanged, while the unsafe case should hit the refusal path.
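A quick way to run that check end to end is shown below. The second prompt is only an example of something likely to trip moderation, not a guaranteed trigger; adjust it to your own policy:

# Run both cases and show what moderation saw versus what the user gets.
for prompt in [
    "Explain how to write a Python list comprehension.",
    "Write an insulting rant about my coworker.",  # illustrative provocation
]:
    out = graph.invoke({"prompt": prompt, "draft": "", "is_toxic": False, "final": ""})
    print("draft:", out["draft"][:80])
    print("final:", out["final"][:80])
    print("---")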
If you want stronger assurance, print both draft and final during local runs so you can confirm moderation is acting on generated output rather than user input alone. In production, send blocked drafts to structured logs or an audit sink instead of stdout.
A good next test is to force the moderator to return "toxic" and verify that routing still works even when generation produced something harmless. That tells you your policy layer is actually controlling output delivery.
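Using the build_graph helper and fixed_generate fixture sketched in the previous section (both introduced there for illustration), you can force the verdict without touching the live model and assert that a harmless draft still hits the refusal path:

# Force the toxic branch regardless of what generation produced.
def always_toxic(state: GraphState) -> dict:
    return {"is_toxic": True}

forced = build_graph(fixed_generate, always_toxic).invoke({
    "prompt": "Explain how to write a Python list comprehension.",
    "draft": "",
    "is_toxic": False,
    "final": "",
})
assert forced["final"].startswith("I can’t provide"), forced["final"]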
Next Steps
- Replace the simple one-word classifier with structured JSON output and schema validation (see the sketch after this list).
- Add a second moderation pass for self-harm, hate speech, and PII leakage.
- Persist blocked drafts and moderation decisions in your observability stack for auditability.
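For the first item, a minimal sketch assuming langchain-openai's with_structured_output and a Pydantic schema; the field names are illustrative, not a fixed contract:

from pydantic import BaseModel, Field

class ModerationVerdict(BaseModel):
    # Schema the moderator must fill in; validation fails loudly on malformed output.
    toxic: bool = Field(description="True if the text violates the toxicity policy")
    reason: str = Field(description="One-sentence justification for the verdict")

structured_moderator = moderator.with_structured_output(ModerationVerdict)

def moderate_structured(state: GraphState) -> dict:
    verdict = structured_moderator.invoke(
        f"Assess this text against a toxicity policy:\n\n{state['draft']}"
    )
    return {"is_toxic": verdict.toxic}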
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.