How to Build a Policy Q&A Agent Using LangChain in Python for Banking

By Cyprian Aarons · Updated 2026-04-21

A policy Q&A agent for banking answers questions like “Can this customer waive the overdraft fee?” or “What is the retention period for KYC documents?” by retrieving the right internal policy, grounding the answer in that policy, and refusing to guess when the policy is missing. That matters because banking teams need fast answers without drifting into non-compliant advice, and every response needs an audit trail.

Architecture

  • Policy document store

    • Source of truth for PDFs, markdown, Confluence exports, or SharePoint dumps.
    • In banking, keep this in a controlled repository with document versioning and retention metadata.
  • Text ingestion and chunking

    • Split policies into small retrievable chunks using RecursiveCharacterTextSplitter.
    • Preserve section headers and policy IDs so you can trace every answer back to a clause.
  • Embedding + vector index

    • Convert chunks into embeddings and store them in a vector DB such as FAISS.
    • For regulated environments, choose storage that meets your data residency requirements.
  • Retriever

    • Use a retriever from the vector store to fetch only relevant policy passages.
    • Tune k carefully so you do not flood the model with irrelevant text.
  • LLM answer chain

    • Use LangChain’s create_retrieval_chain pattern (or the legacy RetrievalQA wrapper) with a strict prompt.
    • Force citations and refusal behavior when the retrieved context does not support an answer.
  • Audit logging layer

    • Log user question, retrieved chunk IDs, model output, timestamp, and policy version.
    • This is non-negotiable in banking if you need defensible decision traces.
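As a concrete shape for that audit record, here is a minimal sketch of how the fields listed above might serialize. The field names, chunk IDs, and version string are illustrative assumptions, not a fixed schema:

```python
import json
from datetime import datetime, timezone

# Illustrative audit record; every field below maps to an item in the
# architecture list: question, retrieved chunk IDs, output, timestamp, version.
audit_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "question": "Can this customer waive the overdraft fee?",
    "retrieved_chunk_ids": ["fees-policy:sec-4.2:chunk-0", "fees-policy:sec-4.2:chunk-1"],
    "answer": "Per section 4.2, a one-time courtesy waiver is permitted.",
    "model": "gpt-4o-mini",
    "policy_version": "2024-03",
}

# One JSON object per line appends cleanly to a JSONL audit log.
line = json.dumps(audit_record)
print(line)
```

Keeping one object per line makes the log trivially appendable and greppable during an audit.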

Implementation

  1. Install dependencies and load your policy corpus

Use LangChain’s current split between community integrations and core abstractions. For a local prototype, FAISS plus OpenAI embeddings works well; in production you can swap the vector store without changing the retrieval pattern.
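For reference, the split packages used below can be installed like this (faiss-cpu is the local-prototype build; pypdf backs PyPDFLoader):

```shell
pip install langchain langchain-community langchain-openai langchain-text-splitters faiss-cpu pypdf
```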

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load one or more policy PDFs
loader = PyPDFLoader("bank_policy_manual.pdf")
documents = loader.load()

# Split into retrievable chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
)
chunks = splitter.split_documents(documents)

# Embed and index
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
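Policy IDs and versions are easiest to trace if you stamp them onto each chunk's metadata before indexing. A minimal sketch, using plain dicts in place of LangChain Document metadata (the version string and effective date are hypothetical; with real Documents you would call `doc.metadata.update(...)` instead):

```python
# Plain dicts stand in for chunk.metadata on LangChain Documents.
chunks = [
    {"text": "Overdraft fees may be waived once per year.", "metadata": {"page": 12}},
    {"text": "KYC documents are retained for five years.", "metadata": {"page": 47}},
]

# Stamp every chunk with the policy version and effective date before indexing,
# so each retrieved passage can be traced back to an exact policy revision.
for chunk in chunks:
    chunk["metadata"].update({
        "policy_version": "2024-03",      # hypothetical version tag
        "effective_date": "2024-03-01",   # hypothetical effective date
    })

print(chunks[0]["metadata"])
```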
  2. Build a strict prompt that forces grounded answers

For banking use cases, the prompt should tell the model to answer only from retrieved context, cite section names if available, and say “I don’t know” when the evidence is missing. That reduces hallucinations and gives compliance teams something they can review.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a banking policy assistant. "
     "Answer only using the provided context. "
     "If the context does not contain enough information, say: "
     "\"I don't know based on the provided policy documents.\" "
     "Include short citations to policy sections when possible."),
    ("human", "Question: {input}\n\nContext:\n{context}")
])
  3. Create the retrieval QA chain

The simplest production-shaped pattern in LangChain is a retriever feeding a document-combining chain. The code below uses create_stuff_documents_chain plus create_retrieval_chain, which is clearer than older monolithic wrappers.

from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

document_chain = create_stuff_documents_chain(llm, prompt)
qa_chain = create_retrieval_chain(retriever, document_chain)

question = "Can we waive overdraft fees for a customer impacted by fraud?"
result = qa_chain.invoke({"input": question})

print("Answer:")
print(result["answer"])
print("\nRetrieved docs:")
for doc in result["context"]:
    print(doc.metadata.get("source"), doc.metadata.get("page"))
  4. Add audit logging around every query

Do not treat this as optional glue code. In banking you want to persist what was asked, what was retrieved, which model answered it, and which document version was used.

import json
from datetime import datetime, timezone

def ask_policy(question: str):
    result = qa_chain.invoke({"input": question})
    audit_record = {
        # timezone-aware UTC; datetime.utcnow() is deprecated in Python 3.12+
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": result["answer"],
        "sources": [
            {
                "source": doc.metadata.get("source"),
                "page": doc.metadata.get("page"),
            }
            for doc in result["context"]
        ],
    }

    with open("policy_audit_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(audit_record) + "\n")

    return result["answer"]

print(ask_policy("What is our AML escalation threshold for suspicious activity?"))
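Because each record is one JSON object per line, reviewers can load the log back with a few lines of stdlib code. A sketch (the default path matches the log file written above):

```python
import json

def load_audit_log(path="policy_audit_log.jsonl"):
    """Read the JSONL audit log back for review; each line is one record."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines defensively
                records.append(json.loads(line))
    return records
```

From here a compliance reviewer can filter for refusals, count questions per source document, or diff answers across policy versions.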

Production Considerations

  • Data residency

    • Keep embeddings, vector indexes, and logs in-region if your bank has jurisdictional constraints.
    • If policies are sensitive, avoid sending raw documents to external services unless your legal team has approved it.
  • Monitoring

    • Track retrieval hit rate, unanswered questions, citation coverage, and fallback frequency.
    • If users ask about topics outside corpus coverage, that usually means your ingestion pipeline is stale.
  • Guardrails

    • Add refusal logic for legal advice, customer-specific decisions, or anything requiring human approval.
    • In practice, route those queries to compliance or operations instead of forcing an LLM response.
  • Version control

    • Tag every indexed chunk with policy version and effective date.
    • When policies change, rebuild the index or at least invalidate stale chunks.
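Invalidating stale chunks can be as simple as filtering on version metadata at query or maintenance time. A minimal sketch with plain dicts standing in for indexed chunks (the IDs and version strings are hypothetical):

```python
# Each indexed chunk carries the version metadata it was tagged with at ingestion.
indexed_chunks = [
    {"id": "fees:sec-4.2:0", "policy_version": "2023-11"},
    {"id": "fees:sec-4.2:1", "policy_version": "2024-03"},
    {"id": "kyc:sec-2.1:0", "policy_version": "2024-03"},
]

CURRENT_VERSION = "2024-03"

# Keep only chunks from the current policy version; everything else is a
# candidate for deletion or re-embedding in the vector index.
live = [c for c in indexed_chunks if c["policy_version"] == CURRENT_VERSION]
stale = [c for c in indexed_chunks if c["policy_version"] != CURRENT_VERSION]

print([c["id"] for c in stale])  # chunks to purge or re-embed
```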

Common Pitfalls

  1. Using generic RAG without strict grounding

    • Mistake: letting the model answer from memory when retrieval returns weak context.
    • Avoid it by forcing “I don’t know” behavior in the system prompt and rejecting low-confidence outputs.
  2. Ignoring document metadata

    • Mistake: indexing text without source file name, page number, section ID, or effective date.
    • Avoid it by preserving metadata through PyPDFLoader, splitting, indexing, and response logging.
  3. Treating compliance as an afterthought

    • Mistake: shipping a helpful Q&A bot that cannot explain where its answer came from.
    • Avoid it by logging every query-response pair and requiring citations for any policy interpretation.
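A cheap first cut at both guardrails is a pre-answer check: route restricted topics to a human and refuse outright when retrieval returns nothing usable. A minimal stdlib sketch; the topic list and routing labels are assumptions you would tune with your compliance team:

```python
# Hypothetical restricted topics that should always route to a human.
RESTRICTED = ("legal advice", "close the account", "deny the customer")

def route_question(question: str, retrieved_docs: list) -> str:
    """Decide whether the LLM may answer or the query goes to a human."""
    q = question.lower()
    if any(topic in q for topic in RESTRICTED):
        return "route_to_compliance"
    if not retrieved_docs:  # nothing retrieved: refuse rather than guess
        return "refuse"
    return "answer"

print(route_question("Should we deny the customer a refund?", ["doc"]))
print(route_question("What is the KYC retention period?", []))
print(route_question("What is the KYC retention period?", ["doc"]))
```

Running this check before invoking the chain keeps restricted queries out of the model entirely, which is easier to defend than filtering its output afterwards.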

A banking policy Q&A agent is not just a chatbot with retrieval bolted on. It is an internal control surface: fast enough for operations teams, strict enough for compliance reviewers, and traceable enough for auditors.



By Cyprian Aarons, AI Consultant at Topiax.
