Haystack Tutorial (Python): adding human-in-the-loop for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to insert a human approval gate into a Haystack pipeline so an agent can pause before taking a risky action. You need this when the model is confident enough to draft an answer, but a person still has to review, edit, or approve before the system responds to a customer or triggers an external workflow.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • python-dotenv
  • An OpenAI API key in OPENAI_API_KEY
  • A working Haystack setup with basic pipeline knowledge
  • Optional: terminal access for running a small local review app or script

Install the packages:

pip install haystack-ai python-dotenv

Step-by-Step

  1. Start with a normal Haystack generation flow. The point is not to replace your pipeline; it’s to add a checkpoint between model output and final action.
from dotenv import load_dotenv
from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Loads OPENAI_API_KEY from a local .env file.
load_dotenv()

# OpenAIChatGenerator expects a list of ChatMessage, so the prompt is built
# with ChatPromptBuilder rather than the plain string-based PromptBuilder.
template = [
    ChatMessage.from_user(
        """
You are a banking assistant.
Answer the user using only the provided context.

Question: {{ question }}
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Answer:
"""
    )
]

pipeline = Pipeline()
pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))

# ChatPromptBuilder emits a list of ChatMessage, matching the generator's input.
pipeline.connect("prompt_builder.prompt", "llm.messages")
  2. Add retrieval so the model has grounded context before any human review happens. In regulated workflows, you want the reviewer to see both the user request and the evidence the model used.
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Savings accounts require two business days for wire transfers."),
    Document(content="Debit card replacement takes 5 to 7 business days."),
    Document(content="Transfers above $10,000 require manual compliance review."),
]
document_store.write_documents(documents)

retriever = InMemoryBM25Retriever(document_store=document_store)
pipeline.add_component("retriever", retriever)

pipeline.connect("retriever.documents", "prompt_builder.documents")
  3. Run the model and capture its draft output as a review artifact. This is the human-in-the-loop boundary: downstream code should not act on the response until someone approves it.
question = "Can I wire $15,000 today from my savings account?"
result = pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)

draft_answer = result["llm"]["replies"][0].text
print("\n=== Draft answer ===\n")
print(draft_answer)
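
Reviewers should see the evidence alongside the draft (step 2's point). Recent haystack-ai releases let you surface intermediate component outputs by passing include_outputs_from to pipeline.run; here is a sketch that collects the retrieved documents for the review packet:

result_with_evidence = pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    },
    include_outputs_from={"retriever"},
)

# The retriever's documents now appear in the result alongside the LLM reply.
print("\n=== Evidence for the reviewer ===")
for doc in result_with_evidence["retriever"]["documents"]:
    print("-", doc.content)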
  4. Wrap that draft in an explicit approval step. In production, this could be a UI button, Slack approval, or ticketing workflow; here it’s an interactive terminal prompt that behaves like a real gate.
def get_human_approval(question: str, draft: str) -> tuple[bool, str]:
    print("\n=== Review packet ===")
    print(f"Question: {question}")
    print(f"Draft: {draft}\n")

    decision = input("Approve? (y/n): ").strip().lower()
    if decision != "y":
        edited = input("Enter revised response for customer use:\n").strip()
        return False, edited

    return True, draft


approved, final_answer = get_human_approval(question, draft_answer)
print("\n=== Final response ===\n")
print(final_answer)
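
If you want to exercise the gate without sitting at a terminal, you can stub the built-in input() in a test. A minimal sketch using the standard library's unittest.mock:

from unittest.mock import patch

# Simulate a reviewer rejecting the draft ("n") and typing a replacement.
with patch("builtins.input", side_effect=["n", "Please call us to arrange this transfer."]):
    approved, text = get_human_approval("test question", "test draft")

assert approved is False
assert text == "Please call us to arrange this transfer."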
  5. If you want this pattern to scale cleanly, isolate the human gate from your Haystack pipeline. The pipeline should produce structured artifacts; your approval layer should decide whether those artifacts can be released.
from dataclasses import dataclass

@dataclass
class ReviewResult:
    approved: bool
    text: str

def run_with_review(user_question: str) -> ReviewResult:
    result = pipeline.run(
        {
            "retriever": {"query": user_question},
            "prompt_builder": {"question": user_question},
        }
    )
    draft = result["llm"]["replies"][0].text
    approved, text = get_human_approval(user_question, draft)
    return ReviewResult(approved=approved, text=text)

review_result = run_with_review("Can I wire $15,000 today from my savings account?")
print(review_result)
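
Downstream code then branches on the structured result rather than on raw model output. dispatch_to_customer and record_human_override below are hypothetical stand-ins for your own handlers:

review = run_with_review("How long does a debit card replacement take?")

# The reviewer's text is final either way; the approved flag records whether
# the model's draft survived untouched, which is worth auditing separately.
dispatch_to_customer(review.text)      # hypothetical downstream send
if not review.approved:
    record_human_override(review)      # hypothetical audit hook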

Testing It

Run the script and confirm you see three distinct stages: retrieved context, model draft, and human approval prompt. Reject once and make sure your edited text becomes the final output instead of the raw LLM reply.

Also test with a low-risk query and approve it unchanged. For example, ask about debit card replacement times and verify that the final response matches the generated draft when you enter y.

If you’re wiring this into an API or queue worker, assert that no side effects happen before approval. The safe pattern is simple: generate first, persist the draft, then wait for explicit human action before sending email, updating CRM records, or executing transactions.
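
A minimal sketch of that generate-persist-approve sequence, assuming SQLite as the draft store and a hypothetical send_to_customer side effect:

import sqlite3
import uuid

conn = sqlite3.connect("review_queue.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS drafts "
    "(id TEXT PRIMARY KEY, question TEXT, draft TEXT, status TEXT)"
)

def persist_draft(question: str, draft: str) -> str:
    # Stage 1: generate and persist. No side effects yet.
    draft_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO drafts VALUES (?, ?, ?, 'pending')",
        (draft_id, question, draft),
    )
    conn.commit()
    return draft_id

def approve_and_release(draft_id: str, final_text: str) -> None:
    # Stage 2: only the explicit approval path triggers side effects.
    conn.execute(
        "UPDATE drafts SET draft = ?, status = 'approved' WHERE id = ?",
        (final_text, draft_id),
    )
    conn.commit()
    send_to_customer(final_text)  # hypothetical: email, CRM update, etc.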

Next Steps

  • Replace input() with a real review UI backed by FastAPI or Streamlit.
  • Store draft answers and approvals in Postgres so every decision is auditable.
  • Add policy checks before review so only high-risk requests reach a human gate.
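
For that last item, the policy check can start as a plain heuristic in front of the gate. The keywords and threshold below are illustrative, not a real compliance rule set:

import re

HIGH_RISK_TERMS = ("wire", "transfer", "close my account", "chargeback")
AMOUNT_THRESHOLD = 10_000

def needs_human_review(question: str) -> bool:
    text = question.lower()
    if any(term in text for term in HIGH_RISK_TERMS):
        return True
    # Flag any dollar amount at or above the threshold, e.g. "$15,000".
    for match in re.findall(r"\$(\d[\d,]*)", question):
        if int(match.replace(",", "")) >= AMOUNT_THRESHOLD:
            return True
    return False

# Low-risk questions skip review; everything else goes through the human gate.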

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
