LlamaIndex Tutorial (Python): adding human-in-the-loop for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to insert a human approval step into a LlamaIndex workflow so an agent can pause before taking risky actions like sending emails, creating tickets, or writing to production systems. You need this when retrieval and generation are not enough and a person must review the model’s proposed action before it executes.

What You'll Need

  • Python 3.10+
  • llama-index installed
  • An OpenAI API key set as OPENAI_API_KEY
  • A small local dataset or documents to index
  • Basic familiarity with VectorStoreIndex, QueryEngine, and LlamaIndex chat/agent concepts

Install the packages:

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

Step-by-Step

  1. Start with a simple index and query engine. We’ll use this as the retrieval layer that feeds context into the decision step.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI

# Settings.llm sets the default LLM for every LlamaIndex component.
Settings.llm = OpenAI(model="gpt-4o-mini")

# Index everything under ./data and retrieve the top 3 chunks per query.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)

response = query_engine.query("What is our refund policy?")
print(response)
  2. Define a human approval gate. In production this would be a UI, Slack approval, or ticketing workflow; here we use stdin so the pattern is fully executable.
def require_human_approval(action: str, payload: str) -> bool:
    print("\n--- HUMAN REVIEW REQUIRED ---")
    print(f"Action: {action}")
    print(f"Payload: {payload}")
    answer = input("Approve? (yes/no): ").strip().lower()
    return answer in {"yes", "y"}

draft_action = "send_customer_email"
draft_payload = "Subject: Refund status\nBody: We are reviewing your request."

approved = require_human_approval(draft_action, draft_payload)
print("Approved:", approved)
  3. Wrap the model output in a structured plan before execution. This makes it easy to inspect what the agent wants to do and decide whether it should proceed.
from dataclasses import dataclass

@dataclass
class ActionPlan:
    action: str
    payload: str
    requires_approval: bool = True

def build_plan(user_request: str) -> ActionPlan:
    prompt = (
        "Convert the request into a single action plan.\n"
        f"Request: {user_request}\n"
        "Return only a concise payload."
    )
    response = Settings.llm.complete(prompt)
    return ActionPlan(
        action="draft_response",
        payload=str(response).strip(),
        requires_approval=True,
    )

plan = build_plan("Draft a reply explaining our refund policy.")
print(plan)
  4. Add the approval gate before executing the plan. If the human rejects it, stop cleanly; if approved, continue with the final action.
def execute_plan(plan: ActionPlan) -> None:
    if plan.requires_approval:
        approved = require_human_approval(plan.action, plan.payload)
        if not approved:
            print("Execution cancelled by human reviewer.")
            return

    print("\nExecuting approved action:")
    print(plan.payload)

plan = build_plan("Draft a reply explaining our refund policy.")
execute_plan(plan)
  5. Put retrieval and approval together in one flow. The agent first gathers context from your documents, then drafts an action based on that context, and finally waits for review.
def answer_with_human_in_loop(question: str) -> None:
    context = query_engine.query(question)
    prompt = f"""
You are drafting an internal response.
Question: {question}
Context: {context}

Write a short customer-facing draft.
"""
    draft = Settings.llm.complete(prompt)
    plan = ActionPlan(action="send_customer_email", payload=str(draft).strip())

    if require_human_approval(plan.action, plan.payload):
        print("\nApproved draft:")
        print(plan.payload)
    else:
        print("\nDraft rejected.")

answer_with_human_in_loop("Can we refund shipping fees for late delivery?")

Testing It

Run the script against a few real questions from your support or operations domain. Try one benign request and one risky request so you can confirm that both paths work: approval continues execution, rejection stops it.
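
If you want to exercise both paths without typing at the prompt each time, you can stub input() with unittest.mock. A minimal sketch, reusing build_plan and execute_plan from the steps above:

from unittest.mock import patch

# Simulate a reviewer who approves, then one who rejects.
with patch("builtins.input", return_value="yes"):
    execute_plan(build_plan("Draft a reply explaining our refund policy."))

with patch("builtins.input", return_value="no"):
    execute_plan(build_plan("Draft a reply explaining our refund policy."))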

Watch for three things:

  • The retriever returns relevant context from your documents.
  • The model produces a readable draft or action plan.
  • The approval prompt appears before anything “real” happens.

If you want stronger verification, log every proposed action with timestamps and reviewer decisions. That gives you an audit trail for compliance teams and helps you debug bad approvals later.
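
One way to build that trail is a small JSONL logger. This is a minimal sketch; log_decision and the approvals.jsonl path are names invented for this example, not part of LlamaIndex:

import json
from datetime import datetime, timezone

def log_decision(action: str, payload: str, approved: bool, path: str = "approvals.jsonl") -> None:
    """Append one review decision to a JSONL audit log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "payload": payload,
        "approved": approved,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

Calling log_decision(plan.action, plan.payload, approved) right after the approval check in execute_plan captures every decision, approved or rejected.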

Next Steps

  • Replace input() with a real approval channel like Slack, Teams, or an internal web app.
  • Add structured outputs using Pydantic so your agent plans are validated before review; a minimal sketch follows this list.
  • Explore LlamaIndex agents with tool calling so only specific tools require human approval while low-risk reads stay automatic.
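
For the Pydantic idea above, here is one possible shape for a validated plan. ValidatedPlan and ALLOWED_ACTIONS are illustrative names, and the sketch assumes Pydantic v2:

from pydantic import BaseModel, field_validator

# Hypothetical whitelist of actions the agent is allowed to propose.
ALLOWED_ACTIONS = {"draft_response", "send_customer_email"}

class ValidatedPlan(BaseModel):
    action: str
    payload: str
    requires_approval: bool = True

    @field_validator("action")
    @classmethod
    def action_must_be_known(cls, v: str) -> str:
        if v not in ALLOWED_ACTIONS:
            raise ValueError(f"Unknown action: {v!r}")
        return v

Validation failures then surface before a reviewer ever sees the plan, so humans only spend time on well-formed proposals.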

By Cyprian Aarons, AI Consultant at Topiax.
