How to Integrate LangChain for retail banking with PostgreSQL for RAG

By Cyprian Aarons · Updated 2026-04-21

Why this integration matters

Retail banking teams need answers that are grounded in policy, product docs, and customer history. Combining LangChain with PostgreSQL gives you a retrieval layer that can pull the right context fast, then feed it into an agent that can answer questions, summarize cases, or draft next actions with traceable sources.

This setup is useful when you want RAG over internal banking knowledge: fee schedules, KYC rules, dispute procedures, mortgage product docs, and branch operations. PostgreSQL becomes the durable store for embeddings and metadata, while LangChain handles orchestration, retrieval, and response generation.

Prerequisites

  • Python 3.10+
  • PostgreSQL 14+ running locally or in your cloud environment
  • A PostgreSQL database created for your RAG workload
  • psycopg2-binary or psycopg installed
  • langchain, langchain-community, and langchain-openai
  • An OpenAI API key or another LangChain-compatible chat/embedding provider
  • pgvector extension enabled in PostgreSQL
  • Basic retail banking documents ready to ingest:
    • product FAQs
    • policy PDFs converted to text
    • call center scripts
    • internal SOPs

Install the Python packages:

pip install langchain langchain-community langchain-openai langchain-postgres psycopg2-binary pgvector

Integration Steps

1) Prepare PostgreSQL for vector storage

Enable pgvector and create a table for your banking documents. LangChain’s PostgreSQL vector store expects a table that can hold embeddings and metadata.

import psycopg2

# Local development defaults; use your own credentials in production.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="bank_rag",
    user="postgres",
    password="postgres"
)

with conn.cursor() as cur:
    # CREATE EXTENSION typically requires superuser privileges.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    # One row per chunk: raw text, JSON metadata, and its embedding.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS bank_docs (
            id bigserial PRIMARY KEY,
            content text NOT NULL,
            metadata jsonb,
            embedding vector(1536)
        );
    """)
    conn.commit()

conn.close()

If you are using OpenAI embeddings, match the vector size to the embedding model output. For example, text-embedding-3-small uses 1536 dimensions.
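To make the dimension matching concrete, here is a small sketch that maps common OpenAI embedding models to their output dimensions and builds the matching DDL. The `EMBEDDING_DIMS` table and `vector_column_ddl` helper are illustrative names, not part of any library:

```python
# Common OpenAI embedding models and their output dimensions.
# The vector(N) column in Postgres must match the model you use.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def vector_column_ddl(model: str, table: str = "bank_docs") -> str:
    """Build an ALTER statement sizing the embedding column for a model."""
    dims = EMBEDDING_DIMS[model]
    return f"ALTER TABLE {table} ALTER COLUMN embedding TYPE vector({dims});"

print(vector_column_ddl("text-embedding-3-large"))
# → ALTER TABLE bank_docs ALTER COLUMN embedding TYPE vector(3072);
```

If you switch models after ingesting, re-embed everything: vectors from different models are not comparable.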

2) Load banking documents into LangChain and embed them

Use LangChain document loaders and splitters to normalize policy text before storing it in PostgreSQL.

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings

docs = [
    Document(
        page_content="Wire transfer fees: domestic outgoing transfers cost $25.",
        metadata={"source": "fees_policy", "product": "retail_banking"}
    ),
    Document(
        page_content="KYC review is required when customer address changes.",
        metadata={"source": "kyc_policy", "product": "retail_banking"}
    ),
]

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectors = embeddings.embed_documents([d.page_content for d in chunks])

At this point you have chunked content plus embeddings ready for persistence. (The explicit embed_documents() call is only to show what gets stored; the vector store in the next step computes embeddings itself when you call add_documents().)

3) Store vectors in PostgreSQL using LangChain’s PGVector integration

LangChain provides a PostgreSQL-backed vector store via PGVector. This is the core piece that turns Postgres into your retrieval index.

from langchain_postgres import PGVector

connection_string = "postgresql+psycopg://postgres:postgres@localhost:5432/bank_rag"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name="retail_banking_knowledge",
    connection=connection_string,
)

vector_store.add_documents(chunks)

If your LangChain version uses the older community package, the class may be imported from langchain_community.vectorstores.pgvector. The pattern is the same: create the store, then call add_documents().

4) Build a retriever and connect it to a RAG chain

Now wire retrieval into a LangChain prompt + model pipeline. This is where the banking use case becomes production-relevant: the model answers only after pulling supporting context from Postgres.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

retriever = vector_store.as_retriever(search_kwargs={"k": 3})

prompt = ChatPromptTemplate.from_template("""
You are a retail banking assistant.
Answer using only the context below.
If the answer is not in context, say you don't know.

Context:
{context}

Question:
{question}
""")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

question = "What is the fee for domestic outgoing wire transfers?"
docs = retriever.invoke(question)

messages = prompt.format_messages(
    context=format_docs(docs),
    question=question
)

response = llm.invoke(messages)
print(response.content)

For an agent system, this same retriever can be wrapped as a tool. That lets your assistant decide when to fetch policy context from PostgreSQL before answering a customer-facing question.
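The tool pattern can be sketched without a live database. In a real pipeline you would close over the PGVector retriever from step 4 (and could register the function with LangChain's `langchain_core.tools.tool` decorator); here `SimpleRetriever` is a keyword-matching stand-in so the example runs on its own:

```python
from dataclasses import dataclass, field

@dataclass
class SimpleRetriever:
    """Stand-in for the PGVector retriever: naive keyword matching."""
    docs: list = field(default_factory=list)

    def invoke(self, query: str) -> list:
        terms = query.lower().split()
        return [d for d in self.docs if any(t in d.lower() for t in terms)]

def make_policy_tool(retriever):
    """Wrap a retriever as a callable an agent can invoke as a tool."""
    def search_bank_policies(query: str) -> str:
        """Look up retail banking policy and product documentation."""
        docs = retriever.invoke(query)
        return "\n\n".join(docs) if docs else "No matching policy found."
    return search_bank_policies

retriever = SimpleRetriever(docs=[
    "Wire transfer fees: domestic outgoing transfers cost $25.",
    "KYC review is required when customer address changes.",
])
policy_tool = make_policy_tool(retriever)
print(policy_tool("wire transfer fees"))
# → Wire transfer fees: domestic outgoing transfers cost $25.
```

The tool's docstring matters in the real setup: agent frameworks use it to decide when to call the tool.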

5) Add metadata filters for banking-specific retrieval

Retail banking data usually needs filtering by product line, region, or document type. Use metadata to keep retrieval scoped and auditable.

filtered_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"product": "retail_banking"}
    }
)

docs = filtered_retriever.invoke("When do we require KYC review?")
for doc in docs:
    print(doc.page_content)
    print(doc.metadata)

This matters when one database holds multiple knowledge domains. You do not want mortgage policy bleeding into deposit-account support answers.

Testing the Integration

Run a simple end-to-end query against your stored retail banking docs.

query = "How much does a domestic outgoing wire transfer cost?"
results = retriever.invoke(query)

print(f"Retrieved {len(results)} documents")
for i, doc in enumerate(results, start=1):
    print(f"\nResult {i}:")
    print(doc.page_content)

Expected output (with the two sample documents stored; k=3 returns both):

Retrieved 2 documents

Result 1:
Wire transfer fees: domestic outgoing transfers cost $25.

Result 2:
KYC review is required when customer address changes.

If you get zero results, check these first:

  • embedding dimensions match your Postgres vector column
  • documents were actually inserted with add_documents()
  • your connection string points to the right database
  • pgvector extension is enabled
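Two of those checks are quick to confirm in psql. The table names below are the defaults that langchain_postgres creates for its collections; adjust them if you customized the store:

```sql
-- Is pgvector installed in this database?
SELECT extname FROM pg_extension WHERE extname = 'vector';

-- How many chunks did add_documents() actually persist?
SELECT count(*) FROM langchain_pg_embedding;

-- Which collections exist?
SELECT name FROM langchain_pg_collection;
```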

Real-World Use Cases

  • Customer service copilot
    • Answer fee questions, account maintenance rules, card replacement steps, and dispute timelines from approved bank docs.
  • Operations assistant
    • Retrieve SOPs for branch staff or back-office teams when handling KYC updates, wire exceptions, or fraud escalations.
  • Compliance-aware RAG
    • Ground responses in policy text and metadata so auditors can trace which source document informed an answer.
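For the compliance case, the key habit is returning source attributions with every answer. A minimal sketch, assuming documents shaped like LangChain's Document (page_content plus metadata); the `answer_with_sources` helper is ours:

```python
def answer_with_sources(answer: str, docs: list) -> dict:
    """Bundle a model answer with the source ids of its grounding context."""
    return {
        "answer": answer,
        "sources": sorted({d["metadata"]["source"] for d in docs}),
    }

context_docs = [
    {"page_content": "Wire transfer fees: domestic outgoing transfers cost $25.",
     "metadata": {"source": "fees_policy"}},
]
result = answer_with_sources("Domestic outgoing wires cost $25.", context_docs)
print(result["sources"])
# → ['fees_policy']
```

Logging this structure per response gives auditors a direct trail from answer back to policy document.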

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
