How to Build a Policy Q&A Agent Using LangChain in Python for Fintech
A policy Q&A agent answers questions about internal banking or insurance policies using approved source documents, not guesswork. For fintech, that matters because teams need fast answers on KYC, AML, refunds, chargebacks, data retention, and incident handling without exposing the company to compliance drift or inconsistent advice.
Architecture
- Policy document store
  - Source PDFs, Markdown, SharePoint exports, or Confluence pages.
  - Keep only approved policy versions, with metadata like `policy_id`, `version`, `owner`, and `effective_date` (see the sketch after this list).
- Document loader and chunking pipeline
  - Use LangChain loaders such as `PyPDFLoader` or `TextLoader`.
  - Split policies into retrieval-friendly chunks with `RecursiveCharacterTextSplitter`.
- Vector index
  - Store embeddings in a vector DB such as FAISS for local setups or a managed store for production.
  - Add metadata filters so the agent can restrict answers to jurisdiction, business unit, or policy version.
- Retriever
  - Use a retriever built from the vector store.
  - Tune `k`, similarity thresholds, and metadata filters to avoid pulling stale or irrelevant policy text.
- LLM answer chain
  - Use a chat model through LangChain's `ChatOpenAI`.
  - Force grounded answers with citations and a refusal path when the policy corpus does not support an answer.
- Audit and guardrails layer
  - Log the question, retrieved chunks, final answer, source IDs, and model version.
  - Add redaction for PII and rules for regulated topics like sanctions, disputes, and suspicious activity handling.
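As a concrete example of the metadata mentioned above, here is how one approved policy chunk might be represented as a LangChain `Document`. This is a sketch; the field values are illustrative, not a required schema:

```python
from langchain_core.documents import Document

# Hypothetical example of an approved policy chunk carrying governance metadata.
chunk = Document(
    page_content="Customers must provide proof of address before account activation...",
    metadata={
        "policy_id": "KYC-003",
        "version": "2024.1",
        "owner": "compliance",
        "effective_date": "2024-03-01",
        "status": "approved",  # used later to exclude deprecated content
    },
)
```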
Implementation
- Load policies and build the vector index

Use one loader per document type. For a simple setup, load PDF policies into documents, split them into chunks, then embed them into FAISS.
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load one policy PDF into Document objects (one per page).
loader = PyPDFLoader("policies/aml_policy.pdf")
docs = loader.load()

# Split into overlapping chunks sized for retrieval.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
)
chunks = splitter.split_documents(docs)

# Embed the chunks and persist the FAISS index to disk.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_policy_index")
```
This is the core retrieval layer. In fintech, attach metadata before indexing so you can filter by region or policy family later.
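For example, continuing from the snippet above, you could stamp each chunk with governance fields before building the index. A minimal sketch; the field names and values shown are illustrative and would normally come from your policy register:

```python
# Tag every chunk before indexing so retrieval can filter on these fields later.
for chunk in chunks:
    chunk.metadata.update({
        "policy_id": "AML-001",       # illustrative values
        "version": "2024.2",
        "jurisdiction": "UK",
        "policy_family": "aml",
        "status": "approved",
    })

vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_policy_index")
```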
- Create a retriever with policy constraints

A broad retriever is risky in regulated environments. Restrict it to approved content and keep retrieval deterministic enough for audit review.
```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Reload the persisted index. Only enable deserialization for index
# files you created and control yourself.
vectorstore = FAISS.load_local(
    "faiss_policy_index",
    embeddings,
    allow_dangerous_deserialization=True,
)

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},
)

question = "Can we onboard a customer without proof of address?"
docs = retriever.invoke(question)  # get_relevant_documents is deprecated
for doc in docs:
    print(doc.metadata)
    print(doc.page_content[:300])
```
For production fintech systems, use metadata-aware retrieval if your policies differ by country or product line. A UK onboarding rule should not be answered from a US retail banking policy.
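With FAISS, one way to do this is a metadata filter in `search_kwargs`; a sketch, assuming chunks were tagged with a `jurisdiction` field as in the indexing step:

```python
# Only retrieve chunks tagged for the UK; the field name assumes the
# metadata added at indexing time.
uk_retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 4,
        "filter": {"jurisdiction": "UK"},
    },
)

docs = uk_retriever.invoke("Can we onboard a customer without proof of address?")
```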
- Build a grounded Q&A chain

Use `ChatOpenAI` plus `create_stuff_documents_chain` and `create_retrieval_chain`. The prompt should tell the model to answer only from retrieved policy text and to say "I don't know" when the evidence is missing.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

# temperature=0 keeps answers as deterministic as the model allows.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a policy Q&A assistant for a fintech company. "
     "Answer only using the provided policy context. "
     "If the context does not contain enough information, say: "
     "\"I don't know based on the current policy documents.\" "
     "Always cite the relevant policy excerpts."),
    ("human", "{input}\n\nContext:\n{context}"),
])

# Stuff retrieved chunks into {context}, then wire retrieval to generation.
document_chain = create_stuff_documents_chain(llm, prompt)
qa_chain = create_retrieval_chain(retriever, document_chain)

result = qa_chain.invoke({"input": "Can we onboard a customer without proof of address?"})
print(result["answer"])
```
That pattern gives you retrieval plus generation without hand-wiring prompts around raw vector search results. It also makes it easier to test because you can inspect both retrieved docs and final output.
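Because `create_retrieval_chain` returns the retrieved documents under a `context` key, you can inspect the evidence behind the answer from the same `result`:

```python
# The documents that grounded the answer, useful for citations and tests.
# PyPDFLoader populates "source" and "page" metadata automatically.
for doc in result["context"]:
    print(doc.metadata.get("source"), "page", doc.metadata.get("page"))

print(result["answer"])
```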
- Add citations and audit logging

Fintech teams need traceability. Store question/answer pairs with source identifiers so compliance can review how an answer was produced.
```python
import json
from datetime import datetime, timezone

response = qa_chain.invoke({"input": "How long do we retain KYC records?"})

audit_record = {
    # datetime.utcnow() is deprecated; use an explicit UTC timestamp.
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "question": "How long do we retain KYC records?",
    "answer": response["answer"],
}

# Append-only JSONL keeps a simple, reviewable trail.
with open("audit_log.jsonl", "a") as f:
    f.write(json.dumps(audit_record) + "\n")
```
In a real deployment, log retrieved chunk IDs and model version too. If an examiner asks why the assistant answered a certain way, you need reproducible evidence.
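A fuller record might look like the sketch below, which extends the snippet above. The exact fields, and labels like `prompt_version`, are illustrative and depend on your compliance requirements:

```python
audit_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "question": "How long do we retain KYC records?",
    "answer": response["answer"],
    # The chunks that grounded the answer, for reproducibility under review.
    "sources": [
        {"source": d.metadata.get("source"), "page": d.metadata.get("page")}
        for d in response["context"]
    ],
    "model": "gpt-4o-mini",             # pin the exact deployed model
    "prompt_version": "policy-qa-v1",   # hypothetical label for the prompt in use
}
```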
Production Considerations
- Deployment
  - Keep embeddings and source documents in-region if your data residency rules require it.
  - Separate environments by jurisdiction; don't mix EU policy indexes with US indexes unless legal has signed off.
- Monitoring
  - Track retrieval hit rate, refusal rate, top unanswered questions, and hallucination reports.
  - Alert when answers reference stale policies or when retrieval returns low-similarity chunks.
- Guardrails
  - Redact PII before sending user prompts to the model.
  - Block requests that ask for legal advice outside approved scope or that trigger sanctions/AML escalation paths.
  - Force refusal when no supporting context is found (a minimal sketch follows this list).
- Auditability
  - Persist prompt version, retrieved sources, answer text, user ID, timestamp, and model name.
  - Make logs immutable where possible; compliance teams care about tamper resistance as much as accuracy.
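As a sketch of the forced-refusal guardrail, you can check retrieval quality before the chain runs and refuse outright when nothing close enough comes back. The distance cutoff here is illustrative and should be tuned on your own corpus:

```python
REFUSAL = "I don't know based on the current policy documents."

def answer_with_guardrail(question: str, max_distance: float = 0.6) -> str:
    # FAISS returns (document, score) pairs; with the default L2 index,
    # lower scores mean closer matches. 0.6 is an illustrative cutoff.
    scored = vectorstore.similarity_search_with_score(question, k=4)
    supported = [doc for doc, score in scored if score <= max_distance]
    if not supported:
        return REFUSAL
    return qa_chain.invoke({"input": question})["answer"]
```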
Common Pitfalls
- Using generic RAG without policy versioning
  - If old policies stay in the index, the agent will answer from superseded rules.
  - Fix it by tagging every chunk with `version`, `effective_date`, and `status=approved`, then filtering out deprecated content (see the sketch after this list).
- Letting the model improvise when context is thin
  - In fintech this becomes a compliance incident fast.
  - Fix it with strict system instructions plus an explicit refusal phrase when retrieval doesn't support an answer.
- Ignoring jurisdictional differences
  - A single global index often mixes rules from different regulators.
  - Fix it by partitioning indexes or applying metadata filters for region, entity type, product line, and regulatory regime.
- Skipping audit logs until after launch
  - Then you have no evidence trail when legal asks how an answer was produced.
  - Fix it by logging every query path from day one: input, retrieved docs, output, and model configuration.
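For the versioning fix in the first pitfall, the retrieval-side half can be as simple as a status filter, assuming chunks carry a `status` field as in the indexing sketch earlier:

```python
# Only approved, current policy text reaches the model.
approved_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"status": "approved"}},
)
```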
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit: architecture templates, compliance checklists, and a 7-email deep-dive course.