How to Build a Policy Q&A Agent Using LlamaIndex in Python for Wealth Management
A policy Q&A agent for wealth management answers questions like “Can this client hold this product in this jurisdiction?” or “What is the firm’s rule for discretionary trading approvals?” It matters because advisors, operations teams, and compliance staff need fast answers grounded in current policy, not guesses from a general-purpose chatbot.
Architecture
- Policy document ingestion
  - Pulls PDFs, DOCX, HTML policy pages, and internal memos into a controlled corpus.
  - Keeps source metadata such as document name, version, owner, jurisdiction, and effective date.
- Indexing layer
  - Uses LlamaIndex to chunk and embed policy text into a vector index.
  - Supports retrieval across multiple policy domains: suitability, AML/KYC, product restrictions, disclosures.
- Retriever + response synthesizer
  - Retrieves the most relevant policy passages for each question.
  - Synthesizes an answer with citations so users can trace every claim back to source text.
- Guardrails and routing
  - Detects when a question is out of scope or needs escalation to compliance.
  - Prevents the agent from inventing policy or giving legal advice.
- Audit logging
  - Stores query text, retrieved chunks, responses, timestamps, user identity, and document versions.
  - Gives compliance teams an evidence trail for review and incident response.
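The audit logging layer can be sketched as a plain record builder. The field names below mirror the bullet above but are illustrative, not a fixed schema:

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """One row per agent interaction (illustrative schema)."""
    user_id: str
    query: str
    response: str
    retrieved_chunks: list   # chunk IDs or text returned by the retriever
    document_versions: dict  # e.g. {"product_restrictions.pdf": "2024-Q2"}
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_interaction(record: AuditRecord) -> dict:
    """Serialize a record for an append-only store."""
    return asdict(record)

row = log_interaction(AuditRecord(
    user_id="advisor-42",
    query="Can a retail client hold leveraged ETFs?",
    response="Per the product restrictions policy...",
    retrieved_chunks=["chunk-001", "chunk-017"],
    document_versions={"product_restrictions.pdf": "2024-Q2"},
))
```

Whatever schema you settle on, write it at query time, not as an afterthought: every later section of this article assumes these fields exist.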
Implementation
1) Install dependencies and load policy documents
For wealth management, keep the corpus narrow. Start with approved policy docs only; do not mix in random shared-drive content.
```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai pypdf
```
```python
from pathlib import Path

from llama_index.core import SimpleDirectoryReader

POLICY_DIR = Path("./policy_docs")

documents = SimpleDirectoryReader(
    input_dir=str(POLICY_DIR),
    recursive=True,
    required_exts=[".pdf", ".txt", ".md"],
).load_data()
print(f"Loaded {len(documents)} policy documents")
```
2) Build the index with metadata-aware chunks
Use SentenceSplitter so retrieval works on policy clauses instead of giant pages. Add metadata early; you will need it for auditability and jurisdiction filtering later.
```python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=80)

docs_with_metadata = []
for doc in documents:
    # If your source system provides these fields, map them here.
    doc.metadata = {
        "source": doc.metadata.get("file_name", "unknown"),
        "jurisdiction": "US",
        "policy_area": "wealth_management",
    }
    docs_with_metadata.append(doc)

nodes = splitter.get_nodes_from_documents(docs_with_metadata)
index = VectorStoreIndex(nodes)
```
3) Create a constrained query engine with citations
The key pattern is: retrieve only from approved policies, then force the model to answer with sources. In LlamaIndex, as_query_engine() is the shortest path to a working retrieval flow; the guardrails in the next step are what move it toward production.
```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

query_engine = index.as_query_engine(
    similarity_top_k=4,
    response_mode="compact",
)

question = (
    "Can an advisor recommend leveraged ETFs to a retail client "
    "in a discretionary account?"
)
response = query_engine.query(question)

print(response.response)
for source in response.source_nodes:
    print(source.node.metadata.get("source"), source.score)
```
If you want stronger control over the prompt, use the query engine's get_prompts() and update_prompts() hooks to inspect and replace its templates. For compliance-heavy workflows, keep temperature at zero and require citations in every answer.
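As a sketch of what a citation-forcing template might look like (the wording below is an assumption, not LlamaIndex's default; {context_str} and {query_str} are the variables its QA templates expect), you can keep the template as a plain string and wire it in via update_prompts() for your version:

```python
# Illustrative QA template that requires grounded, cited answers.
# The instructions text is an assumption; adapt to your firm's policy voice.
CITED_QA_TEMPLATE = (
    "You are a policy assistant for a wealth management firm.\n"
    "Answer ONLY from the context below. If the context does not answer\n"
    "the question, say so and recommend escalation to compliance.\n"
    "Cite the source document name for every claim.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer (with citations):"
)

# Preview how a filled-in prompt would read.
prompt = CITED_QA_TEMPLATE.format(
    context_str="[Suitability Policy v3] Leveraged ETFs require pre-approval.",
    query_str="Can a retail client hold leveraged ETFs?",
)
```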
4) Add an escalation path for low-confidence or out-of-policy questions
A wealth management agent should know when to stop. If retrieval confidence is weak or the question asks for legal interpretation, route to human review.
```python
def answer_policy_question(question: str):
    response = query_engine.query(question)

    # Simple guardrail: no supporting sources means escalate.
    if not getattr(response, "source_nodes", []):
        return {
            "answer": "I couldn't find enough approved policy text to answer this safely.",
            "action": "escalate_to_compliance",
        }

    top_score = max(node.score or 0 for node in response.source_nodes)
    if top_score < 0.75:
        return {
            "answer": response.response,
            "action": "review_required",
            "top_score": top_score,
        }

    return {
        "answer": response.response,
        "action": "approved",
        "sources": [
            {
                "source": node.node.metadata.get("source"),
                "score": node.score,
            }
            for node in response.source_nodes
        ],
    }
```
Production Considerations
- Data residency
  - Keep embeddings, indexes, and logs in-region if your firm has jurisdictional constraints.
  - For cross-border firms, separate indexes by region instead of building one global corpus.
- Auditability
  - Log the exact prompt, retrieved nodes, document versions, user ID, and final answer.
  - Store immutable records so compliance can reconstruct what the agent saw at decision time.
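One way to make those records tamper-evident (a sketch, not a prescribed design) is a hash chain, where each entry's hash commits to the previous entry:

```python
import hashlib
import json

def append_entry(log: list, record: dict) -> list:
    """Append a record whose hash covers the previous entry's hash,
    so silent edits to history become detectable."""
    prev_hash = log[-1]["entry_hash"] if log else "genesis"
    payload = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"record": record, "prev_hash": prev_hash, "entry_hash": entry_hash})
    return log

def verify_chain(log: list) -> bool:
    """Recompute every hash in order; False if any record was altered."""
    prev_hash = "genesis"
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["entry_hash"] != expected or entry["prev_hash"] != prev_hash:
            return False
        prev_hash = entry["entry_hash"]
    return True

log = []
append_entry(log, {"user": "advisor-42", "query": "ETF rule?", "doc_version": "v3"})
append_entry(log, {"user": "ops-7", "query": "KYC refresh cycle?", "doc_version": "v3"})
```

In production you would back this with write-once storage; the chain only proves tampering, it does not prevent deletion of the whole log.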
- Guardrails
  - Block requests that ask for legal advice or override policy language.
  - Add escalation rules for low retrieval confidence, conflicting sources, or missing citations.
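A pre-filter for the "no legal advice" rule can start as simple keyword routing; the phrase list below is illustrative, and production systems usually pair this with a classifier:

```python
# Illustrative markers; tune against real escalation tickets.
LEGAL_ADVICE_MARKERS = (
    "is it legal",
    "can i be sued",
    "legal opinion",
    "override the policy",
    "ignore the policy",
)

def route_question(question: str) -> str:
    """Return a routing action before the question reaches retrieval."""
    q = question.lower()
    if any(marker in q for marker in LEGAL_ADVICE_MARKERS):
        return "escalate_to_compliance"
    return "retrieve"
```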
- Monitoring
  - Track unanswered questions, fallback rate to humans, and top failing policies.
  - Review drift when policies change; stale embeddings are a real risk after quarterly updates.
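The fallback-rate metric can start as a plain counter over the routing actions returned by answer_policy_question() (the action names are taken from that sketch):

```python
from collections import Counter

class AgentMetrics:
    """Tracks how often the agent answers vs. falls back to humans."""

    def __init__(self):
        self.actions = Counter()

    def record(self, action: str) -> None:
        self.actions[action] += 1

    def fallback_rate(self) -> float:
        """Share of interactions routed to human review or compliance."""
        total = sum(self.actions.values())
        if total == 0:
            return 0.0
        fallbacks = (
            self.actions["escalate_to_compliance"]
            + self.actions["review_required"]
        )
        return fallbacks / total

metrics = AgentMetrics()
for action in ["approved", "approved", "review_required", "escalate_to_compliance"]:
    metrics.record(action)
```

A rising fallback rate after a policy release is often the first signal that the index is stale.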
Common Pitfalls
- Indexing unapproved content
  - If you ingest drafts or personal notes alongside official policies, retrieval will surface contradictions.
  - Fix it by maintaining an allowlist of approved sources and versioned document pipelines.
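The allowlist fix belongs at load time, before anything reaches the index. A sketch, assuming the approved set comes from your document governance system:

```python
from pathlib import Path

# Illustrative allowlist; in practice, sourced from document governance.
APPROVED_SOURCES = {
    "suitability_policy.pdf",
    "product_restrictions.pdf",
    "aml_kyc_manual.pdf",
}

def filter_approved(paths: list) -> list:
    """Drop any file that is not on the approved-source allowlist."""
    return [p for p in paths if Path(p).name in APPROVED_SOURCES]

candidates = [
    "policy_docs/suitability_policy.pdf",
    "policy_docs/drafts/new_rules_DRAFT.docx",
    "policy_docs/product_restrictions.pdf",
]
approved = filter_approved(candidates)
```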
- Letting the model answer without citations
  - A free-form answer is useless in a regulated environment if nobody can trace it back.
  - Fix it by requiring source nodes in every response and rejecting uncited outputs.
- Ignoring jurisdiction and client segment
  - A rule that applies in one region may be wrong elsewhere, especially for products and disclosures.
  - Fix it by attaching metadata like jurisdiction, client type, and effective date to every document chunk and filtering retrieval accordingly.
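Once jurisdiction metadata is attached (as in step 2), retrieval can be restricted per query. LlamaIndex has metadata filter support; the sketch below shows the filtering logic itself in plain Python over chunk metadata dicts, with a hypothetical "GLOBAL" tag for firm-wide rules:

```python
def filter_by_jurisdiction(chunks: list, jurisdiction: str) -> list:
    """Keep only chunks tagged for the client's jurisdiction,
    or tagged as firm-wide ("GLOBAL")."""
    return [
        c for c in chunks
        if c["metadata"].get("jurisdiction") in (jurisdiction, "GLOBAL")
    ]

chunks = [
    {"text": "Leveraged ETFs require pre-approval.", "metadata": {"jurisdiction": "US"}},
    {"text": "KIID delivery is mandatory.", "metadata": {"jurisdiction": "EU"}},
    {"text": "All advice must be documented.", "metadata": {"jurisdiction": "GLOBAL"}},
]
us_chunks = filter_by_jurisdiction(chunks, "US")
```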
- Treating stale policies as current truth
  - Wealth management policies change often: product lists update, suitability rules shift, disclosure language changes.
  - Fix it by reindexing on document publication events and expiring old versions from active retrieval paths.
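Version expiry reduces to "keep only the latest effective version per document." A sketch, assuming effective dates live in document metadata as ISO date strings (which sort correctly as text):

```python
def active_versions(documents: list) -> list:
    """For each document name, keep only the version with the latest
    effective date; older versions drop out of active retrieval."""
    latest = {}
    for doc in documents:
        name = doc["name"]
        if name not in latest or doc["effective_date"] > latest[name]["effective_date"]:
            latest[name] = doc
    return list(latest.values())

docs = [
    {"name": "product_restrictions", "version": "v1", "effective_date": "2023-01-01"},
    {"name": "product_restrictions", "version": "v2", "effective_date": "2024-04-01"},
    {"name": "suitability_policy", "version": "v5", "effective_date": "2024-02-15"},
]
current = active_versions(docs)
```

Run this selection on every publication event, then rebuild or patch the index from the surviving set.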
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.