How to Build a Policy Q&A Agent Using LangChain in Python for Wealth Management
A policy Q&A agent for wealth management answers questions about investment policy statements, suitability rules, fee schedules, account restrictions, and internal operating procedures. It matters because advisors and support teams need fast, consistent answers without digging through PDFs, and every response has to stay inside compliance boundaries with a clear audit trail.
Architecture
- Policy document store
  - Source material: IPS documents, product policy PDFs, suitability matrices, fee schedules, and compliance memos.
  - Keep versions immutable so you can prove which policy was used for a given answer.
- Retriever
  - Use `langchain_community.vectorstores` with embeddings to fetch the most relevant policy chunks.
  - Add metadata filters for jurisdiction, product type, client segment, and effective date.
- LLM response chain
  - Use `ChatPromptTemplate`, `create_stuff_documents_chain`, and `create_retrieval_chain` to ground answers in retrieved policy text.
  - Force concise answers with citations from the retrieved documents.
- Guardrails layer
  - Reject out-of-scope questions like tax advice, legal interpretation beyond policy text, or personalized recommendations.
  - Add a classifier step or rules before generation.
- Audit and observability
  - Log question, retrieved document IDs, model version, answer, and confidence signals.
  - This is non-negotiable in wealth management for compliance review and incident reconstruction.
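The audit fields above can be assembled into one record per interaction. A minimal sketch follows; the field names and the tamper-evidence hash are illustrative assumptions, not a fixed standard, so align them with your firm's compliance logging schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(question, retrieved_doc_ids, model_version, answer,
                       prompt_version="v1"):
    """Assemble one audit entry per Q&A interaction.

    Field names are illustrative; adapt them to your compliance schema.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "retrieved_doc_ids": sorted(retrieved_doc_ids),
        "model_version": model_version,
        "prompt_version": prompt_version,
        "answer": answer,
    }
    # A content hash lets reviewers detect after-the-fact tampering.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    return record

record = build_audit_record(
    question="What is the concentration limit?",
    retrieved_doc_ids=["ips_2025.pdf#p12", "limits_memo.pdf#p3"],
    model_version="gpt-4o-mini",
    answer="Concentrated positions above 20% require escalation.",
)
```

Write these records to append-only storage so incident reconstruction can replay exactly what the agent saw and said.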
Implementation
1. Install the right packages
Use LangChain’s split packages. For a basic retrieval-based agent you need core LangChain plus community integrations for loaders, embeddings, and vector stores.
pip install langchain langchain-community langchain-openai langchain-text-splitters faiss-cpu pypdf
Set your model key in the environment:
export OPENAI_API_KEY="your-key"
2. Load policy documents and build the vector index
This example loads PDFs from a local folder, splits them into chunks, embeds them with OpenAI embeddings, and stores them in FAISS.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
loader = PyPDFDirectoryLoader("./policy_docs")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
)
chunks = splitter.split_documents(docs)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4}
)
For wealth management, tag documents before indexing if you can. Add metadata like jurisdiction=UK, product=discretionary, effective_date=2025-01-01, then filter at retrieval time so an advisor in one region does not see another region’s policy.
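The tag-then-filter idea can be sketched without any vector math. The toy chunk class and filter function below are stand-ins for LangChain documents, purely for illustration; FAISS exposes the same behavior through a `filter` entry in `search_kwargs`, noted in the comment, though you should verify the exact kwarg shape against your installed version:

```python
from dataclasses import dataclass, field

@dataclass
class PolicyChunk:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)

def filter_chunks(chunks, **required):
    """Keep only chunks whose metadata matches every required key/value.

    Vector stores apply the same idea at query time, e.g.:
        retriever = vectorstore.as_retriever(
            search_kwargs={"k": 4, "filter": {"jurisdiction": "UK"}})
    """
    return [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in required.items())
    ]

chunks = [
    PolicyChunk("UK suitability rules...",
                {"jurisdiction": "UK", "product": "discretionary"}),
    PolicyChunk("US account rules...",
                {"jurisdiction": "US", "product": "brokerage"}),
]

uk_only = filter_chunks(chunks, jurisdiction="UK")
```

Filtering at retrieval time, rather than prompting the model to ignore the wrong region, is the safer design: the out-of-scope policy text never reaches the LLM at all.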
3. Build a grounded Q&A chain with citations
This is the core pattern: retrieve relevant policy chunks, pass them into a prompt, and force the model to answer only from that context.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
system_prompt = (
    "You are a policy Q&A assistant for wealth management. "
    "Answer only using the provided context. "
    "If the answer is not in the context, say you do not have enough information. "
    "Do not provide legal or tax advice. "
    "Cite the source chunks by quoting short phrases from context."
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}\n\nContext:\n{context}")
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
qa_chain = create_retrieval_chain(retriever=retriever, combine_docs_chain=document_chain)
result = qa_chain.invoke({
    "input": "Can an advisor recommend concentrated positions above 20% of portfolio value?"
})
print(result["answer"])
print(result["context"])
That pattern is production-friendly because it keeps generation grounded in retrieved text. The temperature=0 setting matters here: wealth management support flows should be as repeatable as possible, and zero temperature minimizes run-to-run variation (though it does not guarantee byte-identical outputs).
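To turn `result["context"]` into advisor-facing citations, you can read `source` and `page` from each retrieved document's metadata. The sketch below uses a plain class standing in for LangChain `Document` objects; PyPDF-based loaders typically populate `source` and a zero-based `page`, but verify the keys your loader actually emits:

```python
def format_citations(docs):
    """Build a deduplicated 'source, p.N' citation list from retrieved
    documents. Each doc must expose a .metadata dict, as LangChain
    Document objects do; page numbers are assumed zero-based."""
    seen = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page")
        label = f"{source}, p.{page + 1}" if page is not None else source
        if label not in seen:
            seen.append(label)
    return seen

class _Doc:  # stand-in for langchain_core.documents.Document
    def __init__(self, metadata):
        self.metadata = metadata

citations = format_citations([
    _Doc({"source": "ips_2025.pdf", "page": 11}),
    _Doc({"source": "ips_2025.pdf", "page": 11}),
    _Doc({"source": "fees.pdf", "page": 2}),
])
# citations == ["ips_2025.pdf, p.12", "fees.pdf, p.3"]
```

Appending these labels to the printed answer gives advisors something they can check against the source PDF, and the same labels feed straight into the audit log.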
4. Add a simple guardrail before answering
You do not want this agent answering personalized investment advice or interpreting regulation beyond stored policy. A cheap first pass is a classifier function that blocks risky prompts before they hit the LLM.
# Terms are lowercase because questions are lowercased before matching.
RISKY_TERMS = [
    "should i buy",
    "what stock",
    "best fund",
    "tax advice",
    "legal advice",
    "guaranteed return",
]

def is_allowed_question(question: str) -> bool:
    q = question.lower()
    return not any(term in q for term in RISKY_TERMS)

question = "Should I buy more of this fund?"

if not is_allowed_question(question):
    print("I can help with firm policy questions only.")
else:
    result = qa_chain.invoke({"input": question})
    print(result["answer"])
In practice you would replace this with a stricter moderation layer or an intent classifier using LangChain’s runnable pipeline. The important part is that risky questions get blocked before retrieval and generation.
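A small step up from substring matching is a rule-based intent gate that returns a refusal reason along with the verdict. The patterns below are illustrative assumptions, not a vetted blocklist; a production deployment would use a trained intent classifier or a moderation endpoint instead:

```python
import re

# Pattern -> human-readable refusal reason. Patterns are illustrative;
# replace with a trained classifier or moderation layer in production.
BLOCKED_INTENTS = [
    (re.compile(r"\bshould\s+(i|we|the client)\b.*\b(buy|sell|invest)\b", re.I),
     "personalized investment advice"),
    (re.compile(r"\b(tax|legal)\s+advice\b", re.I),
     "tax or legal advice"),
    (re.compile(r"\bguaranteed?\s+returns?\b", re.I),
     "performance guarantees"),
]

def classify_question(question: str):
    """Return (allowed, reason); reason is None when the question passes."""
    for pattern, reason in BLOCKED_INTENTS:
        if pattern.search(question):
            return False, reason
    return True, None

allowed, reason = classify_question("Should I buy more of this fund?")
```

Returning a reason lets you log why a question was refused, which helps compliance tune the gate and spot advisors who repeatedly probe its edges.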
Production Considerations
- Deployment
  - Run the retriever and LLM behind authenticated service endpoints.
  - Keep policy indexes separate by region or business unit if data residency rules require it.
  - If documents must stay on-prem or in-country, use a local vector store and an approved model endpoint.
- Monitoring
  - Log every request with user ID, timestamp, retrieved document IDs, prompt version, model version, and final answer.
  - Track “no answer” rates; if they spike after a policy update, your index may be stale.
  - Sample outputs for compliance review to catch hallucinated policy language early.
- Guardrails
  - Block requests that ask for personalized investment advice or interpretations outside documented policy.
  - Return “not enough information” instead of guessing when retrieval confidence is low.
  - Require citations from source chunks for any operational answer used by advisors or operations staff.
- Change control
  - Version policies by effective date and retire old content explicitly.
  - Rebuild embeddings when policies change materially; do not rely on ad hoc file drops into the index.
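One of the monitoring signals above, the “no answer” rate, can be tracked with a simple rolling counter. The window size and alert threshold below are arbitrary assumptions to tune against your own traffic:

```python
from collections import deque

class NoAnswerMonitor:
    """Track the share of 'not enough information' replies over a
    rolling window. A spike after a policy update often means the
    vector index is stale and needs rebuilding."""

    def __init__(self, window=100, alert_threshold=0.3):
        self.window = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, answered: bool):
        """Record True when the agent gave a grounded answer."""
        self.window.append(answered)

    @property
    def no_answer_rate(self) -> float:
        if not self.window:
            return 0.0
        return 1 - sum(self.window) / len(self.window)

    def should_alert(self) -> bool:
        # Require a minimum sample before alerting to avoid noise.
        return len(self.window) >= 10 and self.no_answer_rate > self.alert_threshold

monitor = NoAnswerMonitor()
for answered in [True] * 6 + [False] * 6:
    monitor.record(answered)
```

Wiring `should_alert()` into your existing alerting stack turns a silent retrieval failure into a paged incident instead of weeks of quietly unhelpful answers.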
Common Pitfalls
- Using generic web-search behavior instead of controlled policy retrieval
  - Mistake: letting the model answer from its own knowledge.
  - Fix: force retrieval-first behavior with `create_retrieval_chain` and a strict system prompt that says “answer only using context.”
- Ignoring metadata filters
  - Mistake: mixing UK suitability rules with US account rules in one flat index.
  - Fix: attach metadata at ingest time and filter by jurisdiction, product line, client type, and effective date during retrieval.
- Skipping auditability
  - Mistake: storing only the final answer.
  - Fix: log source chunk IDs or filenames alongside the response so compliance can trace exactly which policy text supported the output.
- Letting the agent drift into advice
  - Mistake: users ask “what should I recommend to this client?” and the model starts reasoning like an advisor.
  - Fix: keep this agent scoped to internal policy Q&A. For anything personalized or suitability-related, route to human review or a separate governed workflow.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.