How to Integrate Anthropic with pgvector for Retail Banking Multi-Agent Systems
Why this integration matters
If you’re building retail banking agents, the hard part is not calling a model. It’s giving the model access to the right customer context without turning your app into a compliance risk. Pairing Anthropic with pgvector gives you a clean pattern: use Anthropic for reasoning and response generation, and pgvector for retrieving policy docs, product terms, account notes, and prior case history from a vector store.
That combination unlocks multi-agent systems that can answer customer questions, triage service requests, and route sensitive cases with grounded context instead of free-form guesses.
Prerequisites
- Python 3.10+
- An Anthropic API key
- PostgreSQL 14+ with the pgvector extension installed
- A database user with permission to create tables and extensions
- pip installed
- Basic familiarity with embeddings and retrieval-augmented generation
- A retail banking knowledge base to index:
  - FAQs
  - product terms
  - KYC/AML policy snippets
  - support runbooks
  - escalation playbooks
Install dependencies:
```shell
pip install anthropic "psycopg[binary]" pgvector python-dotenv
```
Integration Steps
1) Set up PostgreSQL with pgvector
Create the extension and a table for document chunks. Store embeddings as vector(1536) or whatever size matches your embedding model.
```python
import os

import psycopg

DB_URL = os.getenv("DATABASE_URL")

with psycopg.connect(DB_URL) as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        cur.execute("""
            CREATE TABLE IF NOT EXISTS bank_knowledge (
                id SERIAL PRIMARY KEY,
                doc_type TEXT NOT NULL,
                content TEXT NOT NULL,
                embedding vector(1536)
            );
        """)
    conn.commit()

print("pgvector schema ready")
```
If you already have tables, just make sure the vector extension exists and the embedding dimension matches your model output.
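One way to verify both from SQL, using the standard PostgreSQL catalogs (`format_type` reports the declared column type, e.g. `vector(1536)`):

```sql
-- Confirm the extension is installed and which version
SELECT extversion FROM pg_extension WHERE extname = 'vector';

-- Confirm the embedding column's declared dimension
SELECT format_type(atttypid, atttypmod)
FROM pg_attribute
WHERE attrelid = 'bank_knowledge'::regclass
  AND attname = 'embedding';
```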
2) Generate embeddings and store banking content in pgvector
Use an embedding model to convert each chunk into a vector before inserting it. In production, chunk by semantic boundaries: product sections, policy clauses, or FAQ answers.
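As a starting point before you invest in semantic chunking, a minimal sketch that splits on blank lines and merges small paragraphs up to a character budget (the function name and limits are illustrative, not part of any library):

```python
def chunk_document(text: str, max_chars: int = 800) -> list[str]:
    """Split text on blank lines, merging small paragraphs up to max_chars."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Merge the paragraph into the current chunk if it still fits.
        if len(current) + len(para) + 2 <= max_chars:
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

print(chunk_document("Para one.\n\nPara two.\n\nPara three.", max_chars=20))
# → ['Para one.\n\nPara two.', 'Para three.']
```

For policy clauses or product sections you would typically split on headings instead of blank lines, but the merge-up-to-a-budget pattern stays the same.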
```python
import os

import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.getenv("DATABASE_URL")

# Replace this with your actual embedding pipeline/model.
# Anthropic does not provide an embeddings endpoint, so plug in
# a dedicated embedding provider or a local model here.
def embed_text(text: str) -> list[float]:
    # Example placeholder: call your embedding service here.
    # Return a 1536-dim vector.
    raise NotImplementedError("Connect your embedding model here")

docs = [
    ("faq", "Debit card replacement takes 3 to 5 business days."),
    ("policy", "Suspicious activity must be escalated to AML review within 1 hour."),
]

with psycopg.connect(DB_URL) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        for doc_type, content in docs:
            embedding = embed_text(content)
            cur.execute(
                "INSERT INTO bank_knowledge (doc_type, content, embedding) VALUES (%s, %s, %s)",
                (doc_type, content, embedding),
            )
    conn.commit()

print("Inserted banking knowledge into pgvector")
```
The key point here is that pgvector is just storage plus similarity search. Your retrieval quality depends on how well you chunk and embed the source material.
3) Retrieve relevant context for the agent
At runtime, retrieve the top-k most similar chunks for the user’s question. That gives Anthropic grounded context for answering banking queries.
```python
import os

import psycopg
from pgvector.psycopg import register_vector

DB_URL = os.getenv("DATABASE_URL")

def embed_text(text: str) -> list[float]:
    raise NotImplementedError("Use your embedding model")

def retrieve_context(query: str, k: int = 3) -> list[str]:
    query_embedding = embed_text(query)
    sql = """
        SELECT content
        FROM bank_knowledge
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
    """
    with psycopg.connect(DB_URL) as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(sql, (query_embedding, k))
            rows = cur.fetchall()
    return [row[0] for row in rows]

context_chunks = retrieve_context("How long does debit card replacement take?")
print(context_chunks)
```
The <=> operator is cosine distance in pgvector. For retail banking use cases, this is usually the simplest starting point because it works well for semantic retrieval across policy text and customer support content.
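By default this query is an exact scan. If retrieval latency becomes an issue as the table grows, pgvector supports approximate indexes; a sketch using an HNSW index with the cosine operator class (available in pgvector 0.5.0+):

```sql
-- Approximate nearest-neighbor index for cosine distance queries
CREATE INDEX IF NOT EXISTS bank_knowledge_embedding_idx
ON bank_knowledge
USING hnsw (embedding vector_cosine_ops);
```

The operator class must match the distance operator you query with (`vector_cosine_ops` for `<=>`), or the planner will not use the index.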
4) Call Anthropic with retrieved banking context
Now pass the retrieved chunks into Anthropic’s Messages API. Keep the system prompt strict: answer only from provided context when dealing with regulated banking information.
```python
import os

import anthropic

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def answer_banking_question(question: str) -> str:
    # retrieve_context is defined in step 3.
    context_chunks = retrieve_context(question)
    context_block = "\n\n".join(
        f"- {chunk}" for chunk in context_chunks if chunk.strip()
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        temperature=0,
        system=(
            "You are a retail banking assistant. "
            "Use only the provided context for factual answers. "
            "If the context is insufficient, say you need escalation."
        ),
        messages=[
            {
                "role": "user",
                "content": f"Question: {question}\n\nContext:\n{context_block}",
            }
        ],
    )
    return response.content[0].text

print(answer_banking_question("How long does debit card replacement take?"))
```
This is where multi-agent systems start making sense. One agent can do retrieval over product and policy data; another can draft the response; another can decide whether to escalate based on confidence or policy triggers.
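That split can be sketched as a simple pipeline, with stub lambdas standing in for the retrieval and Anthropic calls from the earlier steps (the function names here are illustrative):

```python
def pipeline(question: str, retrieve, draft, should_escalate) -> dict:
    """Run retrieval, drafting, and escalation checks as separate stages."""
    context = retrieve(question)
    answer = draft(question, context)
    escalate = should_escalate(question, context, answer)
    return {"answer": answer, "escalate": escalate}

# Stub agents for illustration; swap in retrieve_context and
# answer_banking_question from the earlier steps.
result = pipeline(
    "How long does debit card replacement take?",
    retrieve=lambda q: ["Debit card replacement takes 3 to 5 business days."],
    draft=lambda q, ctx: ctx[0] if ctx else "I need to escalate this request.",
    should_escalate=lambda q, ctx, ans: len(ctx) == 0,
)
print(result)
```

Keeping the stages as separate functions makes it easy to swap the escalation check for a policy-trigger classifier later without touching retrieval or drafting.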
5) Add a simple multi-agent router
For production banking workflows, split responsibilities instead of putting everything behind one prompt. A router agent can classify intent before retrieval happens.
```python
import os

import anthropic

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def route_intent(question: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=50,
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": (
                    "Classify this retail banking request as one of:\n"
                    "- faq\n"
                    "- fraud\n"
                    "- kyc\n"
                    "- payments\n"
                    "- escalation\n\n"
                    f"Request: {question}\n\n"
                    "Return only one label."
                ),
            }
        ],
    )
    return resp.content[0].text.strip().lower()

intent = route_intent("My card was declined overseas")
print(intent)
```
You can then route fraud or escalation intents to stricter prompts, different knowledge bases, or human review queues.
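A minimal dispatch table for that routing, assuming one handler function per path (the handler names are illustrative placeholders):

```python
def handle_faq(question: str) -> str:
    return f"FAQ agent handling: {question}"

def handle_human_review(question: str) -> str:
    return f"Queued for human review: {question}"

HANDLERS = {
    "faq": handle_faq,
    "payments": handle_faq,        # shares the grounded-answer path
    "fraud": handle_human_review,  # stricter path: human in the loop
    "kyc": handle_human_review,
    "escalation": handle_human_review,
}

def dispatch(intent: str, question: str) -> str:
    # Unknown labels fall back to human review rather than guessing.
    handler = HANDLERS.get(intent, handle_human_review)
    return handler(question)

print(dispatch("fraud", "My card was declined overseas"))
# → Queued for human review: My card was declined overseas
```

Defaulting unknown intents to human review is the conservative choice for a regulated domain: a misclassified label degrades to a slower answer, not a wrong one.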
Testing the Integration
Use a known query that should match one of your stored documents. The test should confirm two things: retrieval returns relevant context and Anthropic produces an answer grounded in that context.
```python
question = "How long does debit card replacement take?"
answer = answer_banking_question(question)
print("QUESTION:", question)
print("ANSWER:", answer)
```
Expected output:

```
QUESTION: How long does debit card replacement take?
ANSWER: Debit card replacement takes 3 to 5 business days.
```
If you get vague answers like “please contact support,” check these first:
- Your embeddings are being generated consistently.
- The vector dimension matches the table definition.
- Your similarity query returns relevant rows.
- Your system prompt forbids unsupported claims.
Real-World Use Cases
- Retail banking customer service agent
  - Answer product questions from indexed FAQs and policy docs.
  - Escalate fraud or complaint cases when retrieval confidence is low.
- Branch or call-center copilot
  - Pull relevant internal procedures before generating responses.
  - Help staff handle account servicing, disputes, and card issues faster.
- Multi-agent compliance workflow
  - One agent retrieves AML/KYC policy snippets from pgvector.
  - Another agent uses Anthropic to draft a compliant next action or escalation note.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit