How to Integrate Anthropic with pgvector for Multi-Agent Banking Systems

By Cyprian Aarons · Updated 2026-04-21

Tags: anthropic-for-banking, pgvector, multi-agent-systems

Combining Anthropic for banking with pgvector gives you a practical pattern for agentic systems that need both reasoning and retrieval. In banking, that usually means one agent handles policy-aware responses while another pulls the right context from embeddings: product docs, KYC procedures, transaction notes, complaint histories, or internal controls.

pgvector gives you durable semantic memory inside Postgres. Anthropic handles the language side: classification, summarization, tool use, and response generation. Put them together and you get multi-agent workflows that can answer with context instead of guessing.

Prerequisites

  • Python 3.10+
  • PostgreSQL 14+ with the pgvector extension installed
  • An Anthropic API key
  • A database user with permission to create tables and extensions
  • psycopg or psycopg2-binary
  • pgvector Python package
  • anthropic Python SDK
  • A vector dimension chosen for your embedding model

Install the Python dependencies:

pip install anthropic psycopg[binary] pgvector

Enable the extension in your database:

CREATE EXTENSION IF NOT EXISTS vector;

Integration Steps

  1. Set up your database schema for agent memory and retrieval.

Use one table for chunked documents and embeddings. For banking workloads, keep metadata explicit so you can filter by document type, jurisdiction, or product line.

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://app_user:password@localhost:5432/banking")
register_vector(conn)

with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS knowledge_chunks (
            id BIGSERIAL PRIMARY KEY,
            source TEXT NOT NULL,
            doc_type TEXT NOT NULL,
            content TEXT NOT NULL,
            embedding VECTOR(1536)
        );
    """)
    cur.execute("""
        CREATE INDEX IF NOT EXISTS knowledge_chunks_embedding_idx
        ON knowledge_chunks
        USING ivfflat (embedding vector_cosine_ops)
        WITH (lists = 100);
    """)
    conn.commit()
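The schema above assumes documents arrive pre-chunked. If you need a starting point, a minimal word-window chunker is enough; this is a sketch in plain Python with no library assumptions, and the window sizes should be tuned to your embedding model's limits:

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from either neighboring chunk.
    """
    words = text.split()
    if not words:
        return []
    step = max(1, max_words - overlap)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk then becomes one row in knowledge_chunks, with source and doc_type carried over from the parent document.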

  2. Use Claude to plan retrieval; generate embeddings with a dedicated model.

Anthropic’s Claude models are used here for reasoning and orchestration. For embeddings, use a dedicated embedding service in your stack if you already have one; the important part is storing vectors in pgvector and using Claude to decide what to retrieve and how to answer. In banking systems, this separation keeps retrieval deterministic and generation auditable.

import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def build_query_plan(user_question: str) -> str:
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        messages=[
            {
                "role": "user",
                "content": f"""
You are a banking assistant planner.
Return a short search query and filters for retrieval.

Question: {user_question}
"""
            }
        ],
    )
    return message.content[0].text.strip()
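build_query_plan returns free text. If you instead prompt Claude to emit a JSON object, a defensive parser keeps the planner's output usable downstream even when the model replies in prose. This is a sketch; the query/doc_type field names are assumptions of this article's schema, not an Anthropic format:

```python
import json

def parse_query_plan(raw: str) -> dict:
    """Parse a planner reply expected to be JSON like
    {"query": "...", "doc_type": "..."}. Falls back to using
    the raw text as the search query if parsing fails."""
    try:
        plan = json.loads(raw)
        if isinstance(plan, dict) and plan.get("query"):
            return {"query": plan["query"], "doc_type": plan.get("doc_type")}
    except json.JSONDecodeError:
        pass
    return {"query": raw.strip(), "doc_type": None}
```

The fallback matters in production: a planner that occasionally returns prose should degrade to plain similarity search, not crash the pipeline.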

  3. Store vectors in pgvector and retrieve top matches.

This is the retrieval layer your agents will call before generating an answer. Use cosine distance for semantic similarity, then pass the best chunks into Claude as grounded context.

import psycopg
from pgvector.psycopg import register_vector

def save_chunk(conn, source: str, doc_type: str, content: str, embedding: list[float]):
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO knowledge_chunks (source, doc_type, content, embedding)
            VALUES (%s, %s, %s, %s::vector)
            """,
            (source, doc_type, content, embedding),
        )
    conn.commit()

def search_similar(conn, query_embedding: list[float], limit: int = 5):
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT source, doc_type, content
            FROM knowledge_chunks
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (query_embedding, limit),
        )
        return cur.fetchall()
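The `<=>` operator is pgvector's cosine distance: 1 minus cosine similarity, so 0 means identical direction and 2 means opposite. A pure-Python reference implementation is useful for sanity-checking an embedding pipeline without a database round trip:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Reference implementation of pgvector's <=> operator:
    1 - (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-magnitude vector")
    return 1.0 - dot / (norm_a * norm_b)
```

If local distances between two test embeddings disagree with what Postgres ranks first, the vectors being stored are not the vectors you think they are.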

  4. Wire Anthropic into a multi-agent flow.

A clean pattern is planner → retriever → responder. One agent decides what to search for; another agent uses retrieved chunks to generate the final response with guardrails.

import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def answer_banking_question(conn, question: str):
    # In production replace this with your embedding model output.
    # Keep it deterministic per model/version.
    query_embedding = [0.01] * 1536

    rows = search_similar(conn, query_embedding=query_embedding, limit=3)
    context = "\n\n".join(
        f"[{source} | {doc_type}] {content}" for source, doc_type, content in rows
    )

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"""
You are a banking support agent.
Answer only using the provided context.
If context is insufficient, say what is missing.

Question: {question}

Context:
{context}
"""
            }
        ],
    )
    return response.content[0].text.strip()
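The planner → retriever → responder flow is easier to test if each role is an injected callable rather than hard-wired to a client. A minimal orchestration sketch (the names are illustrative):

```python
from typing import Callable

def run_pipeline(
    question: str,
    plan: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    respond: Callable[[str, str], str],
) -> str:
    """Planner decides what to search for, retriever fetches chunks,
    responder answers grounded in the joined context."""
    search_query = plan(question)
    chunks = retrieve(search_query)
    context = "\n\n".join(chunks)
    return respond(question, context)
```

In production, plan and respond wrap client.messages.create calls and retrieve wraps search_similar; in unit tests they can be plain functions, so the orchestration logic is covered without hitting the API or the database.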

  5. Add metadata filters for banking controls.

In regulated environments you rarely want raw similarity only. Filter by document type or region before ranking so one agent cannot pull restricted material from another line of business.

def search_filtered(conn, query_embedding: list[float], doc_type: str):
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT source, content
            FROM knowledge_chunks
            WHERE doc_type = %s
            ORDER BY embedding <=> %s::vector
            LIMIT 5
            """,
            (doc_type, query_embedding),
        )
        return cur.fetchall()
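Filters only enforce anything if agents cannot request arbitrary doc_types. A small per-role allowlist guard, checked before the query runs, rejects out-of-scope requests; the role-to-type mapping here is hypothetical and would come from your access-control system:

```python
# Hypothetical mapping of agent role to permitted document types.
ALLOWED_DOC_TYPES = {
    "support_agent": {"credit_card_policy", "account_policy"},
    "aml_analyst": {"kyc_procedure", "case_note"},
}

def check_doc_type(role: str, doc_type: str) -> str:
    """Return doc_type if the role may query it; raise otherwise."""
    allowed = ALLOWED_DOC_TYPES.get(role, set())
    if doc_type not in allowed:
        raise PermissionError(f"{role} may not query {doc_type}")
    return doc_type
```

Call check_doc_type(role, doc_type) before search_filtered so a planner agent that hallucinates a restricted document type fails closed.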

Testing the Integration

Run a small smoke test with one stored chunk and one question. This verifies Postgres connectivity, vector storage/retrieval, and Claude response generation end to end.

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://app_user:password@localhost:5432/banking")
register_vector(conn)

save_chunk(
    conn,
    source="policy_001",
    doc_type="credit_card_policy",
    content="Chargebacks must be filed within 60 days of the statement date.",
    embedding=[0.01] * 1536,
)

print(answer_banking_question(conn, "What is the chargeback filing window?"))

Expected output:

The chargeback filing window is 60 days from the statement date.

If retrieval works but the answer is vague:

  • Check that your embeddings match the same model dimension used by VECTOR(...)
  • Confirm register_vector(conn) is called before queries involving vectors
  • Verify your prompt includes retrieved context explicitly
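The first checklist item can be automated: validate dimensions in code before inserting, so a mismatched embedding fails loudly at the call site instead of ambiguously at the database. A sketch assuming the VECTOR(1536) column defined above:

```python
EXPECTED_DIM = 1536  # must match VECTOR(1536) in the knowledge_chunks schema

def validate_embedding(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Raise early if an embedding's dimension does not match the column."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dims, expected {expected_dim}"
        )
    return embedding
```

Wrap save_chunk and the query-embedding path with this check whenever you change embedding models, since dimension mismatches are the most common silent failure.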

Real-World Use Cases

  • Policy-aware customer support

    • One agent retrieves product policy from pgvector-backed docs.
    • Another agent uses Anthropic to answer only from approved material.
  • AML / KYC analyst assistant

    • Store case notes and procedure snippets in pgvector.
    • Use Anthropic to summarize cases and propose next actions grounded in retrieved evidence.
  • Internal ops copilot

    • Multi-agent setup where one agent classifies tickets and another fetches relevant runbooks from Postgres before drafting responses.

This pattern scales because each part has a clear job. pgvector handles memory and similarity search inside your existing Postgres stack. Anthropic handles reasoning over retrieved evidence without turning your database into an application server.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

