How to Integrate LlamaIndex for pension funds with Supabase for RAG
Pension fund teams deal with long, messy document sets: investment policy statements, actuarial reports, trustee minutes, benefit rules, and regulatory filings. Combining LlamaIndex for pension funds with Supabase gives you a clean RAG stack: LlamaIndex handles ingestion, chunking, retrieval, and query orchestration; Supabase gives you Postgres storage plus pgvector for persistent embeddings and metadata filtering.
Prerequisites

- Python 3.10+
- A Supabase project with:
  - `SUPABASE_URL`
  - `SUPABASE_SERVICE_ROLE_KEY` (or the anon key for local testing)
  - `pgvector` enabled
- A working LLM provider key, for example `OPENAI_API_KEY`
- Installed packages:
  - `llama-index`
  - `llama-index-vector-stores-supabase`
  - `supabase`
  - `python-dotenv`
- Pension fund documents ready to ingest (PDF, DOCX, TXT)
- A local `.env` file configured with your secrets
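The `.env` file referenced above might look like this; all values are placeholders you replace with your own project credentials:

```env
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
OPENAI_API_KEY=your-openai-api-key
```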
Integration Steps

1. Install the dependencies and wire up environment variables.

```bash
pip install llama-index llama-index-vector-stores-supabase supabase python-dotenv
```

```python
from dotenv import load_dotenv
import os

load_dotenv()

SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_SERVICE_ROLE_KEY = os.environ["SUPABASE_SERVICE_ROLE_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```
2. Enable `pgvector` and plan your vector table.

Store embeddings plus metadata such as document type, fund name, jurisdiction, and effective date. That metadata is what makes RAG useful in regulated environments. One caveat: LlamaIndex's `SupabaseVectorStore` manages its own collection table (via the `vecs` client library, in a dedicated `vecs` schema), so you only need to enable the extension yourself. The `create table` statement below shows the equivalent shape for reference, in case you want to manage the table manually instead.

```sql
-- Run this in the Supabase SQL editor or via your migration tooling.
create extension if not exists vector;

-- Reference shape only; SupabaseVectorStore creates its own collection table.
create table if not exists pension_fund_chunks (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)  -- matches text-embedding-3-small's 1536 dimensions
);
```
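Since metadata is what powers filtered retrieval later, it pays to derive it consistently at ingestion time rather than by hand. Here is a minimal sketch of a helper that infers metadata from a file-naming convention; the convention itself (`<doc_type>_<jurisdiction>_<year>.<ext>`) is an assumption for illustration, not something LlamaIndex or Supabase requires:

```python
from pathlib import Path

def metadata_from_filename(path: str, fund_name: str) -> dict:
    """Infer chunk metadata from a hypothetical naming convention:
    <doc_type>_<jurisdiction>_<year>.<ext>, e.g. trustee_minutes_uk_2024.pdf
    """
    parts = Path(path).stem.split("_")
    year = int(parts[-1]) if parts[-1].isdigit() else None
    jurisdiction = parts[-2].upper() if len(parts) >= 2 else None
    doc_type = "_".join(parts[:-2]) if len(parts) > 2 else Path(path).stem
    return {
        "fund_name": fund_name,
        "doc_type": doc_type,
        "jurisdiction": jurisdiction,
        "year": year,
    }

meta = metadata_from_filename("trustee_minutes_uk_2024.pdf", "Northwind Pension Fund")
# {"fund_name": "Northwind Pension Fund", "doc_type": "trustee_minutes",
#  "jurisdiction": "UK", "year": 2024}
```

Whatever convention you pick, enforce it at the ingestion boundary so a misnamed file fails loudly instead of producing unfilterable chunks.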
3. Build the LlamaIndex pipeline and connect it to Supabase.

This example uses LlamaIndex's `SupabaseVectorStore` and standard index construction. For pension fund use cases, keep metadata attached at ingestion so you can filter by fund, year, or document class later. Note that the vector store needs a direct Postgres connection string, which you can copy from the Supabase dashboard (Settings → Database); the service-role key is an API key, not your database password, so don't splice it into the URL. The `SUPABASE_DB_CONNECTION` variable name below is just a convention for this article.

```python
import os

from llama_index.core import VectorStoreIndex, StorageContext, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.supabase import SupabaseVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding

# Direct Postgres connection string from the Supabase dashboard, e.g.
# postgresql://postgres:<db-password>@db.<project-ref>.supabase.co:5432/postgres
vector_store = SupabaseVectorStore(
    postgres_connection_string=os.environ["SUPABASE_DB_CONNECTION"],
    collection_name="pension_fund_chunks",
)

embed_model = OpenAIEmbedding(model="text-embedding-3-small")

documents = SimpleDirectoryReader("./pension_docs").load_data()
splitter = SentenceSplitter(chunk_size=800, chunk_overlap=120)
nodes = splitter.get_nodes_from_documents(documents)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model,
)
```
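To sanity-check the `chunk_size` and `chunk_overlap` choices, a quick back-of-envelope estimate of how many chunks a document will produce can help. This is a sketch for a fixed-size sliding window; real splitters respect sentence boundaries and tokenizer quirks, so treat it as an approximation:

```python
import math

def estimate_chunks(n_tokens: int, chunk_size: int = 800, chunk_overlap: int = 120) -> int:
    """Rough chunk-count estimate for a fixed-size sliding window over n_tokens.
    Each new chunk advances by (chunk_size - chunk_overlap) tokens."""
    if n_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((n_tokens - chunk_size) / stride)

# A 40-page actuarial report at roughly 500 tokens/page is ~20,000 tokens:
print(estimate_chunks(20_000))  # 30
```

If a single report produces thousands of chunks, your chunk size is probably too small for meaningful retrieval; if it produces a handful, answers will drag in a lot of irrelevant context.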
4. Add pension-fund-specific metadata before indexing.

This is the part most teams skip. If you want answers like "show only UK trustee minutes from Q4 2024," your chunks need structured metadata from day one. Attach it to the nodes before you build the index, so each chunk is embedded and stored exactly once with its metadata in place.

```python
# node.metadata.update() mutates the nodes in place; no copy needed.
for node in nodes:
    node.metadata.update({
        "fund_name": "Northwind Pension Fund",
        "doc_type": "trustee_minutes",
        "jurisdiction": "UK",
        "year": 2024,
    })

index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model,
)
```
5. Query the index through a retriever or query engine.

For production agents, I prefer a retriever-first pattern because it gives you more control over citations, filters, and downstream tool routing.

```python
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query(
    "What did the trustees decide about rebalancing in Q4 2024?"
)
print(response)
```
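The retriever-first pattern mentioned above can be sketched as follows. The retrieval call itself needs a live index, so it appears in comments; the citation assembly is plain Python. The `format_citations` helper is illustrative, not part of LlamaIndex:

```python
def format_citations(retrieved: list[dict], max_chars: int = 300) -> str:
    """Assemble a context block with explicit source attribution per chunk,
    so the LLM prompt (and the final answer) can cite files."""
    lines = []
    for item in retrieved:
        source = item["metadata"].get("file_name", "unknown")
        snippet = item["text"][:max_chars]
        lines.append(f"[Source: {source}]\n{snippet}")
    return "\n\n".join(lines)

# With a live index, the retriever-first flow looks roughly like:
#   retriever = index.as_retriever(similarity_top_k=3)
#   results = retriever.retrieve("What did the trustees decide about rebalancing?")
#   context = format_citations(
#       [{"text": r.node.get_content(), "metadata": r.node.metadata} for r in results]
#   )
#   ...then pass `context` to your LLM call with your own prompt template.
```

The point of owning this step is that the citation format, the filters, and the fallback behavior are yours to control, rather than whatever the query engine's default prompt does.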
Testing the Integration

Run a smoke test against one known document set and verify that retrieval returns relevant chunks with source context.

```python
test_query = "Summarize the funding ratio discussion from the latest actuarial report."
response = index.as_query_engine(similarity_top_k=2).query(test_query)
print("ANSWER:")
print(response)
```

Expected output:

```
ANSWER:
The actuarial report states that the funding ratio improved after asset performance recovered...
Source: actuarial_report_2024_q3.pdf
Source: trustee_pack_august_2024.pdf
```
If you get generic answers or no sources:

- Check that embeddings were actually inserted into Supabase
- Confirm your chunk size is not too large
- Verify the table name matches `collection_name`
- Make sure your metadata survived ingestion
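A small pre-flight check before ingesting can catch the last two issues early. This is a generic sketch over `(text, metadata)` pairs; the required keys reflect the schema used in this article, not anything LlamaIndex enforces:

```python
REQUIRED_KEYS = {"fund_name", "doc_type", "jurisdiction", "year"}

def preflight(chunks: list[tuple[str, dict]], max_chars: int = 4000) -> list[str]:
    """Return a list of human-readable problems; an empty list means good to ingest."""
    problems = []
    for i, (text, metadata) in enumerate(chunks):
        if not text.strip():
            problems.append(f"chunk {i}: empty text")
        if len(text) > max_chars:
            problems.append(f"chunk {i}: {len(text)} chars exceeds {max_chars}")
        missing = REQUIRED_KEYS - metadata.keys()
        if missing:
            problems.append(f"chunk {i}: missing metadata {sorted(missing)}")
    return problems

good = [("Trustees approved rebalancing.",
         {"fund_name": "Northwind Pension Fund", "doc_type": "trustee_minutes",
          "jurisdiction": "UK", "year": 2024})]
print(preflight(good))  # []
```

Running this on every batch, and refusing to ingest on a non-empty result, is cheaper than debugging silent retrieval failures after the fact.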
Real-World Use Cases

- A trustee meeting assistant that answers questions from minutes, papers, and actuarial reports, with citations.
- A regulatory compliance agent that searches policy documents by jurisdiction and reporting period.
- A member-services copilot that retrieves benefit-rule explanations from approved pension documentation only.
Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit