How to Integrate Anthropic for banking with pgvector for RAG
Combining Anthropic for banking with pgvector gives you a practical RAG stack for regulated workflows: retrieve the right internal policy, account note, or product rule from PostgreSQL, then have Claude answer with grounded context. The result is an agent that can handle customer support, policy lookup, and advisor assistance without dumping your entire knowledge base into the prompt.
Prerequisites
- Python 3.10+
- A PostgreSQL 14+ database
- The pgvector extension installed in Postgres
- An Anthropic API key
- A banking-safe document corpus to index: policy PDFs, product docs, call-center playbooks, compliance procedures
- Python packages: anthropic, psycopg[binary], pgvector, python-dotenv

Install dependencies:

```bash
pip install anthropic psycopg[binary] pgvector python-dotenv
```
Enable pgvector in your database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
Integration Steps
1. Create the vector table in PostgreSQL

You need one table for documents and embeddings. Note that Anthropic does not offer an embeddings endpoint, so pair Claude with a dedicated embedding provider (Anthropic's docs point to Voyage AI, but any provider already in your stack works); the retrieval layer is what matters here.
```python
import os

import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.getenv("DATABASE_URL")

with psycopg.connect(DB_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS banking_docs (
                id BIGSERIAL PRIMARY KEY,
                source TEXT NOT NULL,
                chunk_text TEXT NOT NULL,
                embedding VECTOR(1536) NOT NULL,
                metadata JSONB DEFAULT '{}'::jsonb
            );
        """)
        cur.execute("""
            CREATE INDEX IF NOT EXISTS banking_docs_embedding_idx
            ON banking_docs USING ivfflat (embedding vector_cosine_ops)
            WITH (lists = 100);
        """)
    conn.commit()
```
2. Generate embeddings and store chunks

Chunk your documents before insertion. In production, use a deterministic chunker so retrieval stays stable across re-indexes.
```python
import json
import os

import psycopg
from pgvector.psycopg import register_vector

def fake_embed(text: str) -> list[float]:
    # Replace with your embedding provider.
    return [0.0] * 1536

docs = [
    {
        "source": "mortgage_policy_v3",
        "chunk_text": "Customers must verify identity before discussing mortgage rates.",
        "metadata": {"department": "retail-banking", "doc_type": "policy"},
    },
    {
        "source": "fraud_playbook",
        "chunk_text": "If a cardholder reports unauthorized activity, freeze the card immediately.",
        "metadata": {"department": "fraud", "doc_type": "playbook"},
    },
]

with psycopg.connect(os.getenv("DATABASE_URL")) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        for doc in docs:
            emb = fake_embed(doc["chunk_text"])
            cur.execute(
                """
                INSERT INTO banking_docs (source, chunk_text, embedding, metadata)
                VALUES (%s, %s, %s, %s)
                """,
                (doc["source"], doc["chunk_text"], emb, json.dumps(doc["metadata"])),
            )
    conn.commit()
```
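A deterministic chunker can be as simple as a fixed-size splitter with overlap. This is a sketch under my own naming, not a library API; swap in sentence-aware splitting if your policy documents need it:

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size, overlapping chunks.

    Deterministic: the same input always yields the same chunks, so
    re-indexing a document produces stable chunk boundaries and embeddings.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

# Example: a long policy paragraph becomes overlapping 200-char chunks.
policy = "Customers must verify identity. " * 50
pieces = chunk_text(policy, max_chars=200, overlap=40)
print(len(pieces), len(pieces[0]))  # → 10 200
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.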
3. Retrieve relevant context from pgvector

At query time, embed the user question and fetch the nearest chunks.
```python
import os

import psycopg
from pgvector.psycopg import register_vector

def fake_embed(text: str) -> list[float]:
    # Replace with your embedding provider.
    return [0.0] * 1536

question = "What should I do if a customer reports unauthorized card activity?"

with psycopg.connect(os.getenv("DATABASE_URL")) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        qvec = fake_embed(question)
        cur.execute(
            """
            SELECT source, chunk_text, metadata,
                   1 - (embedding <=> %s) AS similarity
            FROM banking_docs
            ORDER BY embedding <=> %s
            LIMIT 3;
            """,
            (qvec, qvec),
        )
        rows = cur.fetchall()

context = "\n".join(f"[{r[0]}] {r[1]}" for r in rows)
print(context)
```
4. Call Anthropic with retrieved context

Use the Anthropic Messages API to ground the response in retrieved banking content.
```python
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

question = "What should I do if a customer reports unauthorized card activity?"
context = """
[fraud_playbook] If a cardholder reports unauthorized activity, freeze the card immediately.
[mortgage_policy_v3] Customers must verify identity before discussing mortgage rates.
"""

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=300,
    temperature=0,
    system=(
        "You are a banking assistant. Answer only from the provided context. "
        "If the context does not contain the answer, say you don't know."
    ),
    messages=[
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }
    ],
)

print(response.content[0].text)
```
5. Wrap retrieval + generation into one agent function

This is the pattern you actually ship: retrieve top-k chunks, pass them into Claude, return a grounded answer.
```python
import os

import psycopg
from anthropic import Anthropic
from pgvector.psycopg import register_vector

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def embed_query(text: str) -> list[float]:
    # Replace with your embedding provider.
    return [0.0] * 1536

def answer_banking_question(question: str) -> str:
    with psycopg.connect(os.getenv("DATABASE_URL")) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            qvec = embed_query(question)
            cur.execute(
                """
                SELECT chunk_text
                FROM banking_docs
                ORDER BY embedding <=> %s
                LIMIT 4;
                """,
                (qvec,),
            )
            chunks = [row[0] for row in cur.fetchall()]

    context = "\n".join(chunks)
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=250,
        temperature=0,
        system="Answer using only retrieved bank policy context.",
        messages=[{
            "role": "user",
            "content": f"Retrieved context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return resp.content[0].text

print(answer_banking_question("How do we handle unauthorized card activity?"))
```
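In a regulated workflow you usually want the agent to refuse rather than improvise when retrieval comes back weak. One option (the 0.75 threshold is illustrative; tune it against your own corpus) is to gate the Claude call on the best similarity score from step 3:

```python
REFUSAL = "I don't have enough policy context to answer that. Please escalate to a human."

def should_answer(rows: list[tuple[str, float]], min_similarity: float = 0.75) -> bool:
    """Gate generation: answer only if the best-scoring chunk clears the bar.

    rows: (chunk_text, similarity) pairs as returned by the retrieval query,
    where similarity is 1 - cosine distance, so higher is better.
    """
    return bool(rows) and max(sim for _, sim in rows) >= min_similarity

# Weak retrieval: refuse instead of calling the model at all.
weak = [("Unrelated mortgage text", 0.31), ("More unrelated text", 0.28)]
print(should_answer(weak))  # → False

strong = [("If a cardholder reports unauthorized activity, freeze the card.", 0.89)]
print(should_answer(strong))  # → True
```

Returning `REFUSAL` when the gate fails is cheaper and safer than letting the model extrapolate from off-topic chunks.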
Testing the Integration
Run a simple end-to-end test against known policy text.
```python
result = answer_banking_question("What happens when a customer reports unauthorized card activity?")
print(result)
```
Expected output:
Freeze the card immediately and follow fraud escalation procedures. If additional verification is needed, confirm identity before disclosing account details.
If you get an off-policy answer, check these first:
- Your retrieval quality is weak because embeddings are bad or chunks are too large.
- Your prompt allows Claude to answer without context.
- Your top-k is too small.
- Your database index is missing or not being used.
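On that last point: an ivfflat index only performs well when `lists` fits your data size. The pgvector README suggests rows / 1000 for up to about 1M rows and sqrt(rows) beyond that; here is a small helper (my own, not part of pgvector) for picking a value when you re-index:

```python
import math

def ivfflat_lists(row_count: int) -> int:
    """Rule-of-thumb ivfflat `lists` setting from the pgvector README:
    rows / 1000 for up to ~1M rows, sqrt(rows) above that, minimum 1."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, int(math.sqrt(row_count)))

print(ivfflat_lists(100_000))    # → 100
print(ivfflat_lists(4_000_000))  # → 2000
```

If you bulk-load after creating the index, rebuild it (`REINDEX` or drop and recreate) so the list centroids reflect the real data distribution.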
Real-World Use Cases
- Call center copilot: pull relevant policy snippets from Postgres and generate compliant answers for agents handling disputes, card freezes, or mortgage questions.
- Advisor knowledge assistant: retrieve product rules, suitability notes, and fee schedules so advisors can answer client questions without searching multiple systems.
- Internal compliance Q&A: let compliance teams ask natural-language questions over policies and procedures while keeping source-of-truth documents in PostgreSQL.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit