How to Integrate Anthropic for investment banking with pgvector for startups

By Cyprian Aarons · Updated 2026-04-21

Combining Anthropic with pgvector gives you a practical pattern for startup banking workflows: use Anthropic to reason over financial documents, deal notes, and analyst questions, then use pgvector to retrieve the most relevant filings, memos, and prior conversations from your own knowledge base.

For investment banking teams, that means faster Q&A over pitch books, CIMs, diligence packs, and market research without stuffing everything into the prompt. For startups building AI agents, it means you can keep context grounded in your private data instead of relying on model memory alone.

Prerequisites

  • Python 3.10+
  • An Anthropic API key
  • PostgreSQL 14+ with the pgvector extension installed
  • A running PostgreSQL instance you can connect to
  • pip access to install:
    • anthropic
    • psycopg[binary]
    • pgvector
  • A schema or database where you can create tables
  • Basic familiarity with embeddings and vector similarity search

Install the dependencies:

pip install anthropic psycopg[binary] pgvector

Integration Steps

  1. Set up PostgreSQL with pgvector

Create the extension and a table for storing documents plus embeddings. Use a fixed embedding dimension that matches the model you choose.

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://postgres:postgres@localhost:5432/finance")

# Create the extension first: register_vector needs the vector type to exist.
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    conn.commit()

register_vector(conn)

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS banking_docs (
            id SERIAL PRIMARY KEY,
            title TEXT NOT NULL,
            content TEXT NOT NULL,
            embedding VECTOR(1536)
        );
    """)
    conn.commit()

print("pgvector table ready")

  2. Generate embeddings for your text

Anthropic’s Claude models handle the reasoning and extraction side; Anthropic does not offer a first-party embeddings endpoint, so keep the pipeline explicit: chunk your text, then send the chunks through a dedicated embedding provider (Anthropic’s documentation points to Voyage AI as one option). In production, many teams pair Claude with such an embedding model while keeping Anthropic for orchestration and analysis.

Below is a deterministic placeholder embedding function so the rest of the integration stays runnable. It hashes text rather than capturing meaning, so swap it for a real embedding provider before relying on retrieval quality.

from typing import List
import hashlib
import numpy as np

def embed_text(text: str) -> List[float]:
    # Replace this with your production embedding provider.
    # The shape must match VECTOR(1536).
    digest = hashlib.sha256(text.encode()).digest()
    values = np.frombuffer(digest * 48, dtype=np.uint8)[:1536]
    return (values / 255.0).astype(float).tolist()

sample_text = "Company valuation increased after ARR growth and improved retention."
embedding = embed_text(sample_text)
print(len(embedding))
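
The chunking step mentioned above can be sketched as a simple overlapping word-window splitter. The window and overlap sizes here are illustrative, not tuned for any particular embedding model:

```python
from typing import List

def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping word windows so each chunk fits the embedder."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0].split()))
```

Each chunk would then go through embed_text (or your real provider) and get its own row in banking_docs, so retrieval can surface a specific passage rather than a whole filing.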

  3. Store banking documents in pgvector

Insert your source material into PostgreSQL with embeddings attached. This is where pitch decks, earnings call summaries, or diligence notes become searchable by semantic similarity.

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

docs = [
    ("Q3 Board Memo", "Revenue grew 28% YoY with strong enterprise expansion."),
    ("Credit Committee Note", "Debt service coverage remains above threshold."),
]

conn = psycopg.connect("postgresql://postgres:postgres@localhost:5432/finance")
register_vector(conn)

with conn.cursor() as cur:
    for title, content in docs:
        # Wrap in a numpy array so register_vector adapts it to the vector type.
        emb = np.array(embed_text(content))
        cur.execute(
            """
            INSERT INTO banking_docs (title, content, embedding)
            VALUES (%s, %s, %s)
            """,
            (title, content, emb),
        )
    conn.commit()

print("documents inserted")

  4. Query pgvector for relevant context

At runtime, embed the user question, retrieve top matches from PostgreSQL, then pass those results into Anthropic as grounded context.

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

query = "What does our debt position look like relative to cash flow?"

conn = psycopg.connect("postgresql://postgres:postgres@localhost:5432/finance")
register_vector(conn)

# numpy array again so the parameter binds as a pgvector value
query_emb = np.array(embed_text(query))

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT title, content
        FROM banking_docs
        ORDER BY embedding <-> %s
        LIMIT 3;
        """,
        (query_emb,),
    )
    rows = cur.fetchall()

for row in rows:
    print(row)
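
The `<->` operator in the query above is Euclidean (L2) distance; pgvector also supports `<=>` for cosine distance and `<#>` for negative inner product. The ordering the query produces matches this plain numpy computation (toy vectors for illustration):

```python
import numpy as np

docs = {
    "memo_a": np.array([0.9, 0.1, 0.0]),
    "memo_b": np.array([0.1, 0.8, 0.1]),
}
query_vec = np.array([1.0, 0.0, 0.0])

# ORDER BY embedding <-> %s ranks rows by ascending L2 distance.
ranked = sorted(docs, key=lambda k: float(np.linalg.norm(docs[k] - query_vec)))
print(ranked)  # memo_a is closest to the query
```

Cosine distance (`<=>`) is usually the better default when your embedding provider does not normalize vectors to unit length.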

  5. Use Anthropic to answer with retrieved evidence

Now call Claude through the Anthropic SDK using messages.create. Feed in the retrieved chunks so the model answers from your bank-specific corpus instead of guessing.

from anthropic import Anthropic

client = Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")  # or set ANTHROPIC_API_KEY in the environment

context = "\n\n".join([f"Title: {title}\nContent: {content}" for title, content in rows])

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=400,
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": f"""
You are an investment banking analyst assistant.
Use only the context below to answer the question.

Context:
{context}

Question:
{query}
"""
        }
    ],
)

print(response.content[0].text)

Testing the Integration

Run a simple end-to-end check: embed a liquidity question, retrieve the closest document already stored in banking_docs, then ask Claude to summarize it.

test_question = "Are we at risk on liquidity?"

query_emb = np.array(embed_text(test_question))

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT title, content
        FROM banking_docs
        ORDER BY embedding <-> %s
        LIMIT 1;
        """,
        (query_emb,),
    )
    top_doc = cur.fetchone()

result = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": f"""
Context:
Title: {top_doc[0]}
Content: {top_doc[1]}

Question:
{test_question}
"""
        }
    ],
)

print(result.content[0].text)

Example output (phrasing will vary by model, and retrieval order depends on your embedding provider):

Based on the retrieved note, liquidity does not appear immediately constrained. The document indicates debt service coverage remains above threshold and cash flow support is still healthy.

Real-World Use Cases

  • Diligence copilot

    • Search CIMs, contracts, board memos, and Q&A logs with pgvector.
    • Use Claude to summarize risks, extract covenants, and draft banker-ready responses.
  • Investment memo assistant

    • Retrieve comparable deals and internal research by semantic similarity.
    • Have Claude generate first-pass investment committee summaries grounded in your archive.
  • Founder finance agent

    • Let startup operators ask natural-language questions like “What changed in burn last quarter?”
    • Retrieve source docs from Postgres and have Claude explain the answer in plain English without hallucinating.
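
All three use cases reduce to the same retrieve-then-answer loop. One way to package it is to inject the model call as a parameter so the prompt logic can be exercised without hitting the API; the `complete` callable is our own convention, not part of the Anthropic SDK:

```python
from typing import Callable, List, Tuple

def answer_grounded(
    question: str,
    rows: List[Tuple[str, str]],
    complete: Callable[[str], str],
) -> str:
    """Build a grounded prompt from retrieved (title, content) rows, then delegate."""
    context = "\n\n".join(f"Title: {t}\nContent: {c}" for t, c in rows)
    prompt = (
        "You are an investment banking analyst assistant.\n"
        "Use only the context below to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion:\n{question}\n"
    )
    return complete(prompt)

# Stub for demonstration; in production, wrap client.messages.create here.
def echo(prompt: str) -> str:
    return prompt.splitlines()[-1]

print(answer_grounded("Are we at risk on liquidity?", [("Note", "Coverage is fine.")], echo))
```

In production the stub becomes a thin wrapper that sends the prompt through client.messages.create and returns response.content[0].text, exactly as in step 5.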

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
