LlamaIndex Tutorial (Python): adding human-in-the-loop for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to insert a human approval step into a LlamaIndex Python workflow before an agent executes a risky action. You need this when the model can draft answers or decisions, but a person must review sensitive outputs like claims decisions, payment instructions, or customer-facing compliance language.

What You'll Need

  • Python 3.10+
  • llama-index
  • An LLM API key, such as OPENAI_API_KEY
  • A terminal and virtual environment
  • A basic LlamaIndex setup with an index or query engine
  • Optional: a Slack/email/web UI for routing approvals in production

Step-by-Step

  1. Start by installing the packages and setting your API key. This example uses OpenAI through LlamaIndex, but the human-in-the-loop pattern works with any backend that returns structured output.
pip install llama-index llama-index-llms-openai pydantic
export OPENAI_API_KEY="your-key-here"
  2. Define the data structures for the model output and the approval gate. The key idea is to separate “what the model proposes” from “what a human must approve.”
from typing import Literal
from pydantic import BaseModel, Field

class ClaimDecision(BaseModel):
    decision: Literal["approve", "deny", "needs_review"]
    reason: str = Field(..., description="Short explanation for the decision")
    risk_level: Literal["low", "medium", "high"]

class ApprovalResult(BaseModel):
    approved: bool
    reviewer: str
    note: str
  3. Build a normal LlamaIndex query flow that produces a structured recommendation. Here we ask the LLM to classify a claim scenario into one of three outcomes, then we route high-risk outcomes to a human.
from llama_index.core import PromptTemplate
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")

prompt = """
You are assisting an insurance operations team.
Classify this claim scenario into approve, deny, or needs_review.
Return concise reasoning and risk level.

Scenario:
A customer submitted a claim for water damage.
The policy lapsed two days before the incident date.
"""

# structured_predict expects a PromptTemplate rather than a raw string
decision = llm.structured_predict(ClaimDecision, PromptTemplate(prompt))
print(decision)
  4. Add the human-in-the-loop gate as a plain Python function. In production, this function could call an internal review service; here it uses terminal input so you can run it end-to-end.
def request_human_approval(decision: ClaimDecision) -> ApprovalResult:
    print("\nMODEL RECOMMENDATION")
    print(f"Decision: {decision.decision}")
    print(f"Risk: {decision.risk_level}")
    print(f"Reason: {decision.reason}\n")

    answer = input("Approve this recommendation? (yes/no): ").strip().lower()
    reviewer = input("Reviewer name: ").strip()

    approved = answer == "yes"
    note = input("Reviewer note: ").strip()

    return ApprovalResult(approved=approved, reviewer=reviewer, note=note)
  5. Wire the gate into your application logic so risky actions only happen after approval. This is the part developers usually miss: the model can suggest an action, but your code owns execution.
def execute_claim_action(decision: ClaimDecision, approval: ApprovalResult) -> None:
    if not approval.approved:
        print(f"\nAction blocked by {approval.reviewer}: {approval.note}")
        return

    if decision.decision == "approve":
        print(f"\nClaim approved by {approval.reviewer}.")
    elif decision.decision == "deny":
        print(f"\nClaim denied by {approval.reviewer}.")
    else:
        print(f"\nClaim sent for further review by {approval.reviewer}.")

if decision.risk_level == "high" or decision.decision == "needs_review":
    approval = request_human_approval(decision)
else:
    approval = ApprovalResult(
        approved=True,
        reviewer="system",
        note="Auto-approved low-risk recommendation",
    )

execute_claim_action(decision, approval)
  6. If you want this pattern inside a larger RAG app, keep retrieval and generation separate from approval. The retriever can gather evidence as usual, while the final action waits behind a deterministic gate; a sketch after this snippet shows how to fold the evidence back into the gated decision.
from llama_index.core import VectorStoreIndex, Document

docs = [
    Document(text="Policy lapse within grace period requires manual review."),
    Document(text="Water damage claims require proof of active coverage."),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

evidence = query_engine.query("What should happen if a policy lapsed two days before water damage?")
print("\nEVIDENCE")
print(evidence)
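
To make the retrieved evidence actually inform the recommendation, you can fold it into the same structured-prediction call and reuse the gate unchanged. The following is a minimal sketch built from the objects defined above; the prompt wording is illustrative rather than prescribed.

# Fold the retrieved evidence into the same structured decision, then reuse the gate.
grounded_prompt = PromptTemplate(
    """\
You are assisting an insurance operations team.
Classify this claim scenario into approve, deny, or needs_review.
Use the evidence below and return concise reasoning and a risk level.

Scenario:
A customer submitted a claim for water damage.
The policy lapsed two days before the incident date.

Evidence:
{evidence}
"""
)

grounded_decision = llm.structured_predict(
    ClaimDecision, grounded_prompt, evidence=str(evidence)
)

# The deterministic gate is unchanged: retrieval changes what the model sees,
# not who gets to execute the action.
if grounded_decision.risk_level == "high" or grounded_decision.decision == "needs_review":
    grounded_approval = request_human_approval(grounded_decision)
else:
    grounded_approval = ApprovalResult(
        approved=True, reviewer="system", note="Auto-approved low-risk recommendation"
    )

execute_claim_action(grounded_decision, grounded_approval)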

Testing It

Run the script with one scenario that should be low risk and another that should be high risk. For low-risk cases, confirm your code auto-applies the recommendation without asking for approval.

For high-risk cases, confirm execution stops at the approval prompt and nothing downstream runs until you approve it. If you reject it, verify the action is blocked and only an audit-style message is printed.
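
One way to exercise both paths without editing the prompt each time is a small helper that runs any scenario text through the same gate. This is a sketch that reuses the objects defined above; the two sample scenarios are illustrative, not part of the original setup.

def run_scenario(scenario: str) -> None:
    # Test helper: runs any scenario text through the same structured
    # prediction and approval gate defined earlier.
    test_prompt = PromptTemplate(
        "You are assisting an insurance operations team.\n"
        "Classify this claim scenario into approve, deny, or needs_review.\n"
        "Return concise reasoning and risk level.\n\n"
        "Scenario:\n{scenario}\n"
    )
    d = llm.structured_predict(ClaimDecision, test_prompt, scenario=scenario)
    if d.risk_level == "high" or d.decision == "needs_review":
        a = request_human_approval(d)
    else:
        a = ApprovalResult(approved=True, reviewer="system", note="Auto-approved low-risk recommendation")
    execute_claim_action(d, a)

# Expected low-risk path: no approval prompt.
run_scenario("A customer with active coverage filed a small windshield claim with photos.")

# Expected high-risk path: execution pauses until a reviewer responds.
run_scenario("A customer filed a large fire claim one day after raising coverage limits.")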

Also check that your structured output always matches ClaimDecision. If the model starts returning malformed fields in your environment, tighten the prompt or add validation/retry logic around structured_predict.
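
For the retry piece, one option is a thin wrapper that retries structured_predict and falls back to a conservative needs_review decision when parsing keeps failing, so a human always sees the failure. The exception types caught below are an assumption; adjust them to whatever your LlamaIndex and Pydantic versions actually raise.

from pydantic import ValidationError

def predict_with_retry(prompt_text: str, max_attempts: int = 3) -> ClaimDecision:
    # Retry structured_predict a few times, then fall back to a decision
    # that forces human review.
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return llm.structured_predict(ClaimDecision, PromptTemplate(prompt_text))
        except (ValidationError, ValueError) as exc:  # adjust to the errors you observe
            last_error = exc
    return ClaimDecision(
        decision="needs_review",
        reason=f"Structured output failed after {max_attempts} attempts: {last_error}",
        risk_level="high",
    )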

Next Steps

  • Replace terminal input with a real reviewer workflow using Slack interactive messages or an internal web app.
  • Add audit logging for every model recommendation, human override, and final action (see the sketch after this list).
  • Move from simple branching to policy-based approvals using confidence thresholds and business rules.
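
As a starting point for the audit-logging item, the sketch below appends one JSON line per recommendation. It assumes Pydantic v2 (model_dump); use .dict() on v1. The file path is a placeholder.

import json
from datetime import datetime, timezone

def log_audit_event(decision: ClaimDecision, approval: ApprovalResult, path: str = "audit_log.jsonl") -> None:
    # Append-only audit trail: one JSON line per model recommendation,
    # recording who approved it and why.
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_decision": decision.model_dump(),  # .dict() on Pydantic v1
        "approval": approval.model_dump(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

# Call it right after execute_claim_action:
# log_audit_event(decision, approval)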

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

