LangGraph Tutorial (Python): filtering toxic output for beginners

By Cyprian Aarons · Updated 2026-04-22
Tags: langgraph, filtering-toxic-output-for-beginners, python

This tutorial shows you how to build a LangGraph workflow in Python that detects toxic model output and blocks it before the user sees it. You need this when you want a chat agent to stay within policy, especially in customer-facing apps where unsafe language can create support, legal, or brand problems.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-core
  • langchain-openai
  • An OpenAI API key set as OPENAI_API_KEY
  • Basic familiarity with LangGraph nodes, edges, and state

Install the packages:

pip install langgraph langchain-core langchain-openai
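
If the key isn't already set in your environment, export it before running the script (the value below is a placeholder, not a real key):

export OPENAI_API_KEY="sk-..."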

Step-by-Step

  1. Start by defining a tiny graph state. We only need the user input, the model output, and a safety flag that tells us whether to block the response.
from typing import TypedDict

class State(TypedDict):
    user_input: str    # the prompt supplied by the caller
    draft_output: str  # model output before (and possibly after) the safety check
    is_toxic: bool     # set by the detector node; drives the conditional routing
  2. Next, create a generation node and a toxicity check node. For beginner-friendly filtering, this tutorial uses a simple keyword-based detector so you can focus on the LangGraph pattern first.
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Naive blocklist; fine for learning the pattern, not for production use.
TOXIC_WORDS = {"idiot", "stupid", "hate", "kill"}

def generate_response(state: State) -> dict:
    # Draft an answer to the user's prompt.
    result = llm.invoke([HumanMessage(content=state["user_input"])])
    return {"draft_output": result.content}

def detect_toxicity(state: State) -> dict:
    # Flag the draft if it contains any blocklisted word (substring match).
    text = state["draft_output"].lower()
    is_toxic = any(word in text for word in TOXIC_WORDS)
    return {"is_toxic": is_toxic}
  3. Add a safe fallback node that replaces toxic output with a neutral refusal. This is the part that actually filters the response before it leaves your graph.
def safe_fallback(state: State) -> dict:
    # Overwrite the toxic draft with a neutral refusal before it leaves the graph.
    return {
        "draft_output": (
            "I can't help with harmful or abusive content. "
            "If you want, I can rephrase this in a respectful way."
        )
    }
  4. Now wire the nodes together with conditional routing. The graph generates text first, checks it for toxicity, and then either returns it or replaces it with the fallback.
from langgraph.graph import StateGraph, START, END

def route_after_check(state: State) -> str:
    # Send flagged drafts to the fallback; clean drafts go straight to END.
    return "safe_fallback" if state["is_toxic"] else END

builder = StateGraph(State)

builder.add_node("generate_response", generate_response)
builder.add_node("detect_toxicity", detect_toxicity)
builder.add_node("safe_fallback", safe_fallback)

builder.add_edge(START, "generate_response")
builder.add_edge("generate_response", "detect_toxicity")
builder.add_conditional_edges("detect_toxicity", route_after_check)
# The fallback needs its own path to END, or compile() reports a dead-end node.
builder.add_edge("safe_fallback", END)

graph = builder.compile()
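
To confirm the wiring, you can print the compiled graph's topology as Mermaid diagram text; draw_mermaid() ships with langchain-core and needs no extra dependencies:

# Optional: inspect the topology to verify the edges.
print(graph.get_graph().draw_mermaid())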
  5. Finally, run the graph with both benign and toxic prompts. Safe output passes through unchanged, while output containing banned language is replaced by the fallback. Keep in mind that gpt-4o-mini will often refuse the insulting request on its own; the keyword filter is a backstop for unsafe wording that still slips through.
safe_result = graph.invoke({"user_input": "Explain how photosynthesis works."})
print("SAFE:", safe_result["draft_output"])

toxic_result = graph.invoke({"user_input": "Write an insulting reply to my coworker."})
print("TOXIC:", toxic_result["draft_output"])

Testing It

Run the script and check that normal prompts produce normal answers. Then try prompts that are likely to cause unsafe wording and confirm the fallback message appears instead of the raw model output.

If you want stronger verification, print is_toxic from the final state and add a few test cases with different wording. In production, you should also log blocked responses so you can tune your filter over time.
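
Here is a minimal test loop along those lines. The prompts are illustrative; model wording varies between runs, so treat the flags as observations rather than fixed expectations:

test_prompts = [
    "Explain how photosynthesis works.",
    "Summarize the plot of Hamlet.",
    "Write an insulting reply to my coworker.",
]

for prompt in test_prompts:
    final_state = graph.invoke({"user_input": prompt})
    # Print the flag next to the (possibly replaced) output.
    print(f"is_toxic={final_state['is_toxic']} | {prompt}")
    print(final_state["draft_output"][:120])
    print()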

Next Steps

  • Replace the keyword matcher with an LLM-based moderation node or a dedicated moderation API (see the first sketch below).
  • Add separate categories for harassment, self-harm, sexual content, and hate speech.
  • Extend the graph so toxic drafts are rewritten into safer alternatives instead of being fully blocked (see the second sketch below).
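
As a sketch of the first idea: the keyword detector can be swapped for OpenAI's moderation endpoint while keeping the same node signature, so the rest of the graph stays unchanged. This assumes the openai package is installed, and omni-moderation-latest is one possible model choice:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def detect_toxicity_api(state: State) -> dict:
    # Ask the moderation endpoint whether the draft violates any category.
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=state["draft_output"],
    )
    return {"is_toxic": response.results[0].flagged}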
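
And for the rewrite idea, here is a minimal sketch of a node that could stand in for safe_fallback; it reuses the llm instance from earlier and asks for a respectful rephrasing instead of a flat refusal:

def rewrite_toxic(state: State) -> dict:
    # Rephrase the flagged draft instead of blocking it outright.
    prompt = (
        "Rewrite the following text so it is respectful and non-toxic, "
        "keeping any useful information:\n\n" + state["draft_output"]
    )
    result = llm.invoke([HumanMessage(content=prompt)])
    return {"draft_output": result.content}

# When building the graph, register it in place of the fallback:
# builder.add_node("rewrite_toxic", rewrite_toxic)
# builder.add_edge("rewrite_toxic", END)
# ...and return "rewrite_toxic" from route_after_check when is_toxic is True.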

By Cyprian Aarons, AI Consultant at Topiax.