How to Build a policy Q&A Agent Using LangChain in Python for retail banking

By Cyprian AaronsUpdated 2026-04-21
policy-q-alangchainpythonretail-bankingpolicy-qanda

A policy Q&A agent for retail banking answers staff or customer questions against approved policy documents, product terms, and compliance playbooks. It matters because most banking support failures are not about missing information; they’re about giving the wrong answer, using outdated policy, or failing to explain the source behind the answer.

Architecture

  • Document ingestion layer

    • Pulls PDFs, HTML policy pages, and internal SOPs from approved sources.
    • Normalizes text and stores metadata like document version, effective date, jurisdiction, and product line.
  • Embedding + vector store

    • Converts policy chunks into embeddings.
    • Stores them in a retriever-backed index such as FAISS, pgvector, or Pinecone depending on your residency and ops constraints.
  • Retriever

    • Fetches only the most relevant policy chunks for each question.
    • Must support metadata filtering for region, customer segment, and product type.
  • LLM answer chain

    • Uses LangChain to combine retrieved context with a strict prompt.
    • Produces concise answers with citations and “I don’t know” behavior when evidence is weak.
  • Guardrails layer

    • Blocks unsupported financial advice, hallucinated policy claims, and requests outside scope.
    • Adds refusal logic for regulated topics that require human review.
  • Audit logging

    • Records question, retrieved documents, model output, timestamps, and versioned prompts.
    • Needed for compliance reviews and incident investigations.

Implementation

1) Load policy documents with metadata

For retail banking, metadata is not optional. You need to know which jurisdiction and product each policy applies to before retrieval happens.

from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document

loader = PyPDFLoader("retail_banking_policy.pdf")
pages = loader.load()

documents = []
for page in pages:
    documents.append(
        Document(
            page_content=page.page_content,
            metadata={
                "source": "retail_banking_policy.pdf",
                "jurisdiction": "ZA",
                "product": "personal_loans",
                "version": "2025-01",
            },
        )
    )

print(f"Loaded {len(documents)} pages")

2) Chunk and index the policy corpus

Use chunking that preserves meaning. Banking policies often have definitions followed by exceptions; splitting too aggressively breaks retrieval quality.

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

splitter = RecursiveCharacterTextSplitter(
    chunk_size=900,
    chunk_overlap=150,
)

chunks = splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

3) Build a grounded Q&A chain with citations

The pattern below uses ChatPromptTemplate, create_stuff_documents_chain, and create_retrieval_chain. The prompt forces the model to answer only from context and to say when the policy does not support an answer.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a retail banking policy assistant. "
     "Answer only from the provided context. "
     "If the context does not contain enough information, say 'I don't know based on the current policy.' "
     "Include short citations using source and version metadata."),
    ("human",
     "Question: {input}\n\nContext:\n{context}")
])

document_chain = create_stuff_documents_chain(llm, prompt)
qa_chain = create_retrieval_chain(retriever, document_chain)

response = qa_chain.invoke({
    "input": "Can we waive monthly account fees for customers under hardship?"
})

print(response["answer"])

If you want better control over retrieval in production, filter by jurisdiction or product before calling the retriever. In banking systems that serve multiple regions, this prevents cross-jurisdiction leakage.

filtered_retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 4,
        "filter": {"jurisdiction": "ZA", "product": "personal_loans"}
    }
)

4) Add a simple guardrail before answering

A banking agent should reject requests that ask for prohibited actions or advice outside policy scope. Keep this logic deterministic; do not rely on the LLM alone.

BLOCKED_PHRASES = [
    "ignore policy",
    "override compliance",
    "guarantee approval",
]

def is_allowed(question: str) -> bool:
    q = question.lower()
    return not any(phrase in q for phrase in BLOCKED_PHRASES)

question = "Can you override compliance and approve this loan?"
if not is_allowed(question):
    print("Request blocked: requires human review.")
else:
    print(qa_chain.invoke({"input": question})["answer"])

Production Considerations

  • Data residency

    • Keep embeddings, vector stores, logs, and model traffic in approved regions.
    • For South African retail banking or other regulated markets, confirm where your hosted LLM sends prompts and whether it stores them.
  • Auditability

    • Log the exact prompt template version, retrieved chunk IDs, document versions, and final answer.
    • When a complaint lands in compliance review, you need to reconstruct why the agent answered what it did.
  • Monitoring

    • Track retrieval hit rate, refusal rate, hallucination reports, and fallback-to-human rate.
    • Alert when answers are generated with low similarity scores or when stale policies are still being retrieved.
  • Human handoff

    • Route ambiguous cases like fee waivers, hardship decisions, fraud disputes, or complaints to a case worker.
    • A good agent knows when not to answer.

Common Pitfalls

  1. Using raw PDFs without metadata

    • Problem: The model retrieves a correct clause from the wrong jurisdiction or expired version.
    • Fix: Attach jurisdiction, product, version, and effective_date during ingestion and filter at query time.
  2. Letting the model answer without grounding

    • Problem: The LLM fills gaps with plausible but non-compliant language.
    • Fix: Use create_retrieval_chain with a strict system prompt that forces “I don’t know” when evidence is missing.
  3. Ignoring document freshness

    • Problem: Old fee schedules or lending rules stay indexed after policy changes.
    • Fix: Rebuild indexes on policy updates and include document versioning in every citation so reviewers can verify what was used.
  4. Skipping escalation paths

    • Problem: The agent tries to resolve complaints or exceptions that require human judgment.
    • Fix: Add deterministic routing rules for regulated scenarios before the LLM runs. In retail banking, escalation is part of correctness.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit

Related Guides