LlamaIndex Tutorial (Python): filtering toxic output for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add a toxicity filter around a LlamaIndex-powered Python app so unsafe model output gets caught before it reaches the user. You need this when your app can answer open-ended prompts, summarize user content, or generate responses in regulated environments where toxic, abusive, or policy-violating text is not acceptable.

What You'll Need

  • Python 3.10+
  • llama-index
  • openai
  • An OpenAI API key set as OPENAI_API_KEY
  • A basic understanding of LlamaIndex query engines
  • A terminal and a virtual environment
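
If you don't already have a virtual environment, create and activate one first (commands shown for macOS/Linux; on Windows, activate with .venv\Scripts\activate):

python -m venv .venv
source .venv/bin/activate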

Install the packages first:

pip install llama-index openai

Step-by-Step

  1. Start with a normal LlamaIndex setup and load a small local dataset. For this tutorial, we’ll use an in-memory document so you can run it end to end without extra files.
from llama_index.core import Document, VectorStoreIndex

docs = [
    Document(
        text=(
            "Customer support policy: be respectful, avoid insults, "
            "and escalate abuse to a human agent."
        )
    )
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
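
Before adding any filtering, it's worth a quick smoke test to confirm the engine answers from the document (the exact wording of the answer depends on the model):

# Sanity check: the answer should come from the policy document.
print(query_engine.query("What should agents do when a customer is abusive?"))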
  2. Add a toxicity classifier using the OpenAI Moderation API. This gives you an explicit pre-check and post-check around the LLM response instead of hoping the model behaves.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def is_toxic(text: str) -> bool:
    """Return True if the OpenAI Moderation API flags the text as unsafe."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return result.results[0].flagged
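
You can spot-check the classifier on its own before wiring it in. Moderation verdicts come from a hosted model, so borderline inputs may not flag consistently:

# Benign text should pass; abusive text should be flagged.
print(is_toxic("What is your refund policy?"))   # expected: False
print(is_toxic("You are a worthless idiot."))    # likely: True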
  3. Wrap your query flow so you filter both the user prompt and the generated answer. If the prompt is toxic, reject it immediately; if the answer is toxic, replace it with a safe fallback.
def safe_query(query: str) -> str:
    if is_toxic(query):
        return "Request blocked: toxic input detected."

    response = query_engine.query(query)
    answer = str(response)

    if is_toxic(answer):
        return "Response blocked: unsafe output detected."

    return answer


print(safe_query("What is this policy about?"))
  4. Add a stricter response guard for cases where your app should never expose raw model output. This pattern is useful when you want to log the unsafe text internally but only return sanitized content to the caller.
def safe_query_with_logging(query: str) -> str:
    if is_toxic(query):
        print(f"[blocked-input] {query}")
        return "Request blocked."

    response = query_engine.query(query)
    answer = str(response)

    if is_toxic(answer):
        print(f"[blocked-output] {answer}")
        return "Response blocked."

    return answer


result = safe_query_with_logging("Summarize the policy in one sentence.")
print(result)
  5. If you want this to scale beyond one query function, put the filter behind a small service layer. That keeps your LlamaIndex code clean and makes it easy to reuse across chat endpoints, batch jobs, and agent tools.
class ToxicityGuard:
    def __init__(self, engine):
        self.engine = engine

    def run(self, query: str) -> str:
        if is_toxic(query):
            return "Request blocked: toxic input detected."

        response = self.engine.query(query)
        answer = str(response)

        if is_toxic(answer):
            return "Response blocked: unsafe output detected."

        return answer


guarded_engine = ToxicityGuard(query_engine)
print(guarded_engine.run("Explain the support policy"))

Testing It

Test with three kinds of prompts: a normal business question, an obviously abusive prompt, and a prompt that could trigger unsafe language in the response. You should see normal answers pass through, while toxic inputs get blocked before querying and unsafe outputs get replaced after generation.
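
A minimal harness for that three-prompt check might look like this (the prompts are illustrative; substitute examples that match your own policy):

test_prompts = [
    "What does the support policy say?",   # normal business question
    "You people are pathetic morons.",     # abusive input, should be blocked
    "List the worst insults you know.",    # may trigger unsafe output
]

for prompt in test_prompts:
    print(f"> {prompt}")
    print(safe_query(prompt))
    print()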

A simple sanity check is to log every blocked event and confirm that your app never returns raw flagged text to the caller. If you are wiring this into an API, make sure your HTTP status codes and error payloads are consistent with your product requirements.
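
As one hedged sketch of that API wiring, assuming FastAPI (the endpoint name and status codes here are illustrative choices, not requirements): return an explicit error status rather than a 200 carrying an error string.

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/ask")
def ask(q: str):
    # Reuses is_toxic and query_engine from the steps above.
    if is_toxic(q):
        raise HTTPException(status_code=400, detail="Request blocked: toxic input detected.")
    answer = str(query_engine.query(q))
    if is_toxic(answer):
        # Never return the raw flagged text to the caller.
        raise HTTPException(status_code=502, detail="Response blocked: unsafe output detected.")
    return {"answer": answer}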

Also test latency. Each guarded query adds up to two moderation round trips (one pre-check, one post-check), so measure how much time they introduce before putting this into production.
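
A rough way to measure that overhead with time.perf_counter (real numbers depend on network latency and the moderation backend):

import time

N = 5
start = time.perf_counter()
for _ in range(N):
    is_toxic("What does the support policy say?")
avg_ms = (time.perf_counter() - start) / N * 1000
print(f"avg moderation check latency: {avg_ms:.0f} ms")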

Next Steps

  • Replace print() calls with structured logging and metrics for auditability.
  • Move from binary blocking to severity-based routing so borderline content can be reviewed by humans (see the sketch after this list).
  • Combine toxicity filtering with PII redaction before storing or returning any model output.
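
For severity-based routing, one option is to use the moderation response's per-category scores instead of the binary flagged field. This sketch assumes the openai Python SDK v1, whose response objects are pydantic models with model_dump(); the thresholds are illustrative assumptions, not recommended values:

def classify_severity(text: str) -> str:
    """Map moderation category scores to 'allow', 'review', or 'block'."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    scores = result.results[0].category_scores
    # Highest score across all categories (skip any missing values).
    worst = max(v for v in scores.model_dump().values() if v is not None)
    if worst >= 0.8:   # illustrative threshold; tune on your own data
        return "block"
    if worst >= 0.4:   # borderline: route to human review
        return "review"
    return "allow"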

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

