How to Integrate LlamaIndex for Investment Banking with Supabase for AI Agents
Combining LlamaIndex for investment banking with Supabase gives you a practical pattern for building agent systems that can search deal docs, answer diligence questions, and persist conversational state in a real database. LlamaIndex handles retrieval and reasoning over private banking content; Supabase gives you Postgres storage, auth, and a clean way to keep agent memory, audit logs, and user context.
Prerequisites
- Python 3.10+
- A Supabase project with:
  - SUPABASE_URL
  - SUPABASE_SERVICE_ROLE_KEY (or the anon key for client-side testing)
- A Postgres database enabled in Supabase
- LlamaIndex installed with the packages you need for your backend:
  - llama-index
  - llama-index-vector-stores-supabase
  - llama-index-embeddings-openai (or another embedding provider)
  - llama-index-llms-openai (or another LLM provider)
- An OpenAI API key if you use OpenAI embeddings/LLM
- A table strategy for agent state, such as:
  - chat sessions
  - retrieved sources
  - tool calls
  - compliance audit events
Install the core dependencies:
pip install llama-index supabase python-dotenv
pip install llama-index-vector-stores-supabase llama-index-embeddings-openai llama-index-llms-openai
Integration Steps
1) Connect to Supabase from Python
Start by creating a Supabase client. In production, keep the service role key on the server only.
import os
from supabase import create_client, Client
SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_SERVICE_ROLE_KEY = os.environ["SUPABASE_SERVICE_ROLE_KEY"]
supabase: Client = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)
# Quick sanity check: fetch rows from a table you created
response = supabase.table("agent_sessions").select("*").limit(1).execute()
print(response.data)
If this fails, fix your environment variables first. Don’t move on until the connection is stable.
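One way to fail fast on configuration problems is to validate the environment variables before creating the client. This is a minimal sketch; the `require_env` helper is my own naming, not part of the Supabase SDK:

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising early if any are missing."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Validate before calling create_client:
# config = require_env("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")
# supabase = create_client(config["SUPABASE_URL"], config["SUPABASE_SERVICE_ROLE_KEY"])
```

A RuntimeError here is much easier to debug than a connection failure several calls later.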
2) Configure LlamaIndex for investment banking documents
For banking workflows, you usually want RAG over PDFs, pitch decks, CIMs, earnings transcripts, and internal notes. Use LlamaIndex to load and index those documents.
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
Settings.embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])
Settings.llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
documents = SimpleDirectoryReader("./banking_docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Summarize revenue drivers and key risks in the target company.")
print(response)
This gives your agent a retrieval layer over investment banking content. The important part is that the model answers from your internal corpus instead of hallucinating from generic finance knowledge.
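In practice, banking corpora mix document types (CIMs, transcripts, notes), and it helps to tag each file with metadata at load time so retrieval can be filtered later. `SimpleDirectoryReader` accepts a `file_metadata` callable for exactly this; the suffix-to-type mapping below is a hypothetical example you should adapt to your corpus:

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a document type label; adjust for your corpus.
DOC_TYPES = {".pdf": "cim_or_deck", ".txt": "transcript", ".md": "internal_note"}


def banking_file_metadata(file_path: str) -> dict:
    """Build per-file metadata that LlamaIndex attaches to every chunk from this file.

    Pass this as SimpleDirectoryReader("./banking_docs", file_metadata=banking_file_metadata)
    so retrieved nodes carry a doc_type you can inspect or filter on later.
    """
    path = Path(file_path)
    return {
        "file_name": path.name,
        "doc_type": DOC_TYPES.get(path.suffix.lower(), "other"),
    }
```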
3) Store embeddings in Supabase using LlamaIndex’s vector store integration
If you want persistence outside local memory, use Supabase as your vector store. This is the cleanest path when multiple agents need shared retrieval across sessions.
import os
from llama_index.core import StorageContext, VectorStoreIndex, Document
from llama_index.vector_stores.supabase import SupabaseVectorStore
vector_store = SupabaseVectorStore(
    postgres_connection_string=os.environ["SUPABASE_POSTGRES_CONNECTION_STRING"],
    collection_name="investment_banking_docs",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
docs = [
    Document(text="Target company had $120M revenue in FY2024 with 18% EBITDA margin."),
    Document(text="Primary risks include customer concentration and refinancing pressure."),
]
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
query_engine = index.as_query_engine()
result = query_engine.query("What are the main risks?")
print(result)
Use a dedicated collection per domain or client if you need isolation. For regulated environments, this matters more than fancy prompt engineering.
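One way to enforce that isolation consistently is to derive collection names from client identifiers with a single helper, so no ad-hoc string building leaks data across clients. A sketch, with my own naming convention as the assumption:

```python
import re


def collection_for(client_id: str, domain: str = "investment_banking_docs") -> str:
    """Derive an isolated vector-store collection name for one client.

    Collection names end up as Postgres identifiers, so normalize to
    lowercase letters, digits, and underscores.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", client_id.lower()).strip("_")
    if not slug:
        raise ValueError(f"Cannot derive a collection name from {client_id!r}")
    return f"{domain}__{slug}"


# e.g. SupabaseVectorStore(..., collection_name=collection_for("Acme Capital"))
```

Routing every `SupabaseVectorStore` construction through a helper like this makes cross-client isolation auditable in one place.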
4) Persist agent memory and audit data in Supabase tables
RAG alone is not enough for an agent system. You also need session history and traceability so analysts can inspect what happened during a run.
from datetime import datetime, timezone

def log_agent_event(session_id: str, user_query: str, answer: str):
    payload = {
        "session_id": session_id,
        "user_query": user_query,
        "answer": answer,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return supabase.table("agent_audit_log").insert(payload).execute()
session_id = "deal-team-001"
question = "What are the key diligence issues?"
answer = str(query_engine.query(question))
log_response = log_agent_event(session_id, question, answer)
print(log_response.data)
This pattern gives you an audit trail for compliance reviews and postmortems. In banking workflows, that’s not optional.
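If you want the audit trail to be tamper-evident as well, one option is to store a digest of the full answer alongside a possibly truncated copy. This is a sketch of that hardening step; the `answer_sha256` column and the 8000-character cap are my own assumptions, not something the tables above require:

```python
import hashlib
from datetime import datetime, timezone


def audit_payload(session_id: str, user_query: str, answer: str, max_len: int = 8000) -> dict:
    """Build an audit-log row with a UTC timestamp and a SHA-256 digest of the full answer.

    The digest lets reviewers verify that a stored (possibly truncated) answer
    has not been altered after the fact.
    """
    return {
        "session_id": session_id,
        "user_query": user_query,
        "answer": answer[:max_len],
        "answer_sha256": hashlib.sha256(answer.encode("utf-8")).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```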
5) Wire both systems into one agent flow
Now connect retrieval + persistence into one callable function your app can use.
def answer_investment_banking_question(session_id: str, question: str):
    response = query_engine.query(question)
    answer_text = str(response)
    supabase.table("agent_sessions").upsert({
        "session_id": session_id,
        "last_question": question,
        "last_answer": answer_text,
    }).execute()
    supabase.table("agent_audit_log").insert({
        "session_id": session_id,
        "user_query": question,
        "answer": answer_text,
    }).execute()
    return answer_text

print(answer_investment_banking_question(
    "session-123",
    "Draft a summary of valuation concerns for this acquisition target.",
))
At this point you have a working agent loop:
- retrieve from indexed banking content via LlamaIndex
- persist state and logs in Supabase
- return grounded answers to the caller
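The loop above is stateless per call. A lightweight way to add conversational memory is to fetch the session's recent audit-log rows and prepend them to the next query as context. This sketch only covers the pure formatting step, and assumes rows shaped like the audit-log inserts above:

```python
def format_history(rows: list[dict], max_turns: int = 5) -> str:
    """Format prior Q&A rows (oldest first) into a context block that can be
    prepended to the next query for lightweight session memory."""
    lines = []
    for row in rows[-max_turns:]:  # keep only the most recent turns
        lines.append(f"Q: {row['user_query']}")
        lines.append(f"A: {row['answer']}")
    return "\n".join(lines)


# Usage sketch: fetch rows for the session, then build the augmented question.
# rows = supabase.table("agent_audit_log").select("*") \
#     .eq("session_id", session_id).order("created_at").execute().data
# augmented = f"{format_history(rows)}\n\nNew question: {question}"
```

Capping `max_turns` keeps the prompt bounded; a production system would likely summarize older turns instead of dropping them.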
Testing the Integration
Run a basic end-to-end test that checks retrieval plus database writes.
test_session_id = "test-session"
test_question = "What are the top risks mentioned in the deal materials?"
test_answer = answer_investment_banking_question(test_session_id, test_question)
print("ANSWER:", test_answer)
check = supabase.table("agent_audit_log") \
    .select("*") \
    .eq("session_id", test_session_id) \
    .order("created_at", desc=True) \
    .limit(1) \
    .execute()
print("DB ROW:", check.data[0])
Expected output:
ANSWER: The main risks are customer concentration, refinancing pressure, and margin compression.
DB ROW: {
'session_id': 'test-session',
'user_query': 'What are the top risks mentioned in the deal materials?',
'answer': 'The main risks are customer concentration...',
'created_at': '2026-04-21T12:34:56.000000'
}
If retrieval works but inserts fail:
- check your table names
- verify your RLS policies in Supabase
- confirm whether you're using the anon key or the service role key
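On the last point: Supabase's classic anon and service role API keys are JWTs whose payload carries a role claim, so you can check which key a process is actually holding by decoding the payload locally. A debugging sketch (no signature verification, so never use this for authorization decisions):

```python
import base64
import json


def supabase_key_role(api_key: str) -> str:
    """Report which role a Supabase JWT-style API key carries ('anon' or 'service_role').

    Only base64-decodes the payload segment; it does not verify the signature,
    so treat it strictly as a local debugging aid.
    """
    try:
        payload_b64 = api_key.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
        return payload.get("role", "unknown")
    except (IndexError, ValueError):
        return "unknown"


# print(supabase_key_role(os.environ["SUPABASE_SERVICE_ROLE_KEY"]))  # expect 'service_role'
```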
Real-World Use Cases
- Deal diligence copilot
  - Let analysts ask questions across CIMs, financial models, board decks, and transcripts.
  - Store every Q&A pair in Supabase for review and compliance.
- Investment memo generator
  - Use LlamaIndex to pull evidence from source docs.
  - Persist draft sections and approval states in Supabase so multiple bankers can collaborate.
- Banking knowledge assistant
  - Build an internal chat assistant over prior deals and market notes.
  - Keep user sessions, citations, and audit logs in Supabase for traceability.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.