How to Integrate LlamaIndex for pension funds with Supabase for AI agents

By Cyprian Aarons · Updated 2026-04-21

Why this integration matters

If you’re building AI agents for pension funds, the hard part is not the model. It’s getting reliable retrieval over policy docs, member communications, investment memos, and compliance records without turning your app into a pile of scripts.

LlamaIndex gives you the retrieval and orchestration layer. Supabase gives you Postgres, auth, and a clean place to store vectors, metadata, and agent state. Put them together and you get an AI system that can answer pension-specific questions with grounded context, persistence, and access control.

Prerequisites

  • Python 3.10+
  • A Supabase project with:
    • SUPABASE_URL
    • SUPABASE_SERVICE_ROLE_KEY
  • A Postgres database with pgvector enabled
  • A LlamaIndex-compatible embedding model key, for example:
    • OpenAI API key
    • or another embedding provider supported by LlamaIndex
  • Install these packages:
pip install llama-index llama-index-vector-stores-supabase supabase psycopg2-binary python-dotenv
  • A folder of pension fund documents:
    • policy PDFs
    • trustee meeting notes
    • investment guidelines
    • member FAQ docs
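The environment variables above can live in a local .env file that python-dotenv loads in the steps below. A sketch with placeholder values (SUPABASE_POSTGRES_CONNECTION_STRING is used later when wiring up the LlamaIndex vector store; replace every value with your own project's credentials):

```shell
# .env — placeholder values only, never commit real credentials
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_POSTGRES_CONNECTION_STRING=postgresql://postgres:your-password@db.your-project-ref.supabase.co:5432/postgres
OPENAI_API_KEY=sk-...
```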

Integration Steps

1) Connect to Supabase and prepare storage

Start by wiring up Supabase as your persistence layer. For a pension fund agent, I usually store document chunks in Postgres with vector support so retrieval stays inside the same system as auth and app state.

import os
from supabase import create_client, Client
from dotenv import load_dotenv

load_dotenv()

SUPABASE_URL = os.getenv("SUPABASE_URL")
SUPABASE_SERVICE_ROLE_KEY = os.getenv("SUPABASE_SERVICE_ROLE_KEY")

supabase: Client = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)

# Example: check connectivity
result = supabase.table("pension_docs").select("id").limit(1).execute()
print(result.data)

If the table does not exist yet, create it in the Supabase SQL editor:

create extension if not exists vector;

create table if not exists pension_docs (
  id bigserial primary key,
  content text not null,
  metadata jsonb default '{}'::jsonb,
  embedding vector(1536)
);

2) Load pension documents into LlamaIndex

Use LlamaIndex to read source files and split them into chunks. For pension funds, keep chunk sizes moderate so retrieval returns precise policy sections instead of giant blobs.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=80)

documents = SimpleDirectoryReader("./pension_data").load_data()
print(f"Loaded {len(documents)} documents")

At this point you have raw documents ready for indexing. The important part is that LlamaIndex handles parsing consistently before you push data into Supabase-backed storage.
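To build intuition for what chunk_size and chunk_overlap control, here is a minimal pure-Python sketch of overlap chunking. It is a simplification, not what SentenceSplitter actually does (that splits on tokens and sentence boundaries); this version splits on words purely to show how neighbouring chunks share an overlap window:

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 80):
    """Split text into word chunks, repeating `overlap` words between neighbours."""
    words = text.split()
    step = chunk_size - overlap  # each new chunk starts `step` words after the last
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(sample, chunk_size=400, overlap=50)
print(len(chunks))           # 3
print(chunks[1].split()[0])  # w350 — chunk 2 backs up 50 words into chunk 1
```

Smaller chunks mean more precise hits on individual policy clauses; the overlap keeps a clause from being cut in half at a chunk boundary.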

3) Store embeddings in Supabase using a custom vector store pattern

LlamaIndex has built-in vector store integrations, but if you want full control over schema and access patterns, insert vectors directly into Supabase. This works well when your agent needs auditability for pension operations.

import json
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def embed_text(text: str):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

for doc in documents:
    embedding = embed_text(doc.text)

    supabase.table("pension_docs").insert({
        "content": doc.text,
        "metadata": doc.metadata or {},
        "embedding": embedding,
    }).execute()

print("Documents inserted into Supabase")

If you want tighter LlamaIndex integration, use SupabaseVectorStore from LlamaIndex’s Postgres/Supabase support where available in your version. The pattern stays the same: LlamaIndex creates nodes; Supabase stores vectors; retrieval happens against the database.
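One practical note on the insert loop above: writing one row per request works for a small corpus, but supabase-py's insert also accepts a list of rows, so you can cut round-trips by batching. A sketch of a generic batching helper (the row shape mirrors the pension_docs columns; the supabase call is shown in a comment because it needs a live client):

```python
def batched(items, size=100):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# In the pipeline above, each item would be a row dict like
# {"content": doc.text, "metadata": doc.metadata or {}, "embedding": embed_text(doc.text)}
# and each batch would go to: supabase.table("pension_docs").insert(batch).execute()
rows = [{"content": f"doc {i}"} for i in range(250)]
batches = list(batched(rows, size=100))
print([len(b) for b in batches])  # [100, 100, 50]
```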

4) Build a retriever-backed query engine

Now connect LlamaIndex to the stored data. The agent will retrieve relevant pension content from Supabase before generating an answer.

One caveat: SupabaseVectorStore manages its own collection tables through the vecs client (in a separate vecs schema), so it will not read rows you inserted manually in step 3. Either index your documents through LlamaIndex when using this route, or query your own pension_docs table directly; the retrieval loop is the same either way.

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.supabase import SupabaseVectorStore

vector_store = SupabaseVectorStore(
    postgres_connection_string=os.getenv("SUPABASE_POSTGRES_CONNECTION_STRING"),
    collection_name="pension_docs",
)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

query_engine = index.as_query_engine(similarity_top_k=3)

response = query_engine.query(
    "What is the policy on early retirement benefits for deferred members?"
)

print(response)

For pension fund agents, this is the core loop:

  • user asks a question
  • LlamaIndex retrieves policy chunks from Supabase
  • model generates an answer grounded in those chunks

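Under the hood, the retrieval step is a nearest-neighbour search: pgvector scores every stored embedding against the query embedding and returns the closest matches. A pure-Python sketch of cosine-similarity ranking, which is what similarity_top_k selects (the document IDs and 3-dimensional vectors here are toy values for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, docs, k=3):
    """Rank (doc_id, embedding) pairs by similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, emb)) for doc_id, emb in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

docs = [
    ("early_retirement", [0.9, 0.1, 0.0]),
    ("survivor_benefits", [0.1, 0.9, 0.1]),
    ("contribution_rules", [0.0, 0.2, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # "early_retirement" ranks first
```

In production the database does this with an index rather than a linear scan, but the ranking logic is the same.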
5) Add agent memory or workflow state in Supabase

For production agents, keep conversation state outside the model. Store thread history in Supabase so trustees or operations staff can resume sessions without losing context.

def save_chat_turn(session_id: str, role: str, message: str):
    supabase.table("agent_chat_history").insert({
        "session_id": session_id,
        "role": role,
        "message": message,
    }).execute()

def load_chat_history(session_id: str):
    result = supabase.table("agent_chat_history") \
        .select("*") \
        .eq("session_id", session_id) \
        .order("created_at") \
        .execute()
    return result.data

save_chat_turn("session_001", "user", "Summarize the drawdown rules.")
history = load_chat_history("session_001")
print(history)

That gives you traceability. In regulated environments like pensions, that matters more than fancy prompts.
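To replay a stored session into the model, the rows from load_chat_history need converting into chat-message format. A sketch, assuming the role/message columns from the inserts above (max_turns is a made-up parameter here, used to trim long threads so they don't blow the context window):

```python
def history_to_messages(rows, max_turns=20):
    """Convert stored chat rows into {role, content} messages,
    keeping only the most recent `max_turns` turns."""
    recent = rows[-max_turns:]
    return [{"role": r["role"], "content": r["message"]} for r in recent]

rows = [
    {"role": "user", "message": "Summarize the drawdown rules."},
    {"role": "assistant", "message": "Drawdown is permitted from age 55 ..."},
]
print(history_to_messages(rows)[0])
# {'role': 'user', 'content': 'Summarize the drawdown rules.'}
```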

Testing the Integration

Run a simple end-to-end check:

  1. insert one pension document into Supabase
  2. query it through LlamaIndex
  3. confirm the response cites relevant content

test_query = "Explain eligibility for spouse survivor benefits."

response = query_engine.query(test_query)
print("QUERY:", test_query)
print("ANSWER:", response)

Expected output:

QUERY: Explain eligibility for spouse survivor benefits.
ANSWER: Survivor benefits are payable to an eligible spouse if the member ...

If you get an empty or irrelevant answer:

  • verify embeddings were inserted correctly
  • confirm your similarity_top_k is high enough
  • check that your chunking isn’t too aggressive
  • inspect whether your Supabase vector column dimension matches the embedding model
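The last point is a common failure mode: text-embedding-3-small returns 1536-dimensional vectors, which must match the vector(1536) column in the schema above. A quick sanity check you can run on each embedding before inserting (check_embedding is a hypothetical helper, not part of either library):

```python
EXPECTED_DIM = 1536  # must match `embedding vector(1536)` in the pension_docs schema

def check_embedding(embedding, expected_dim=EXPECTED_DIM):
    """Raise early if an embedding can't be stored in the vector column."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, expected {expected_dim}; "
            "check that the model and the table schema agree."
        )
    return embedding

check_embedding([0.0] * 1536)  # passes silently
try:
    check_embedding([0.0] * 768)  # e.g. a different model's output
except ValueError as e:
    print(e)
```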

Real-World Use Cases

  • Member support assistant

    • Answer questions about retirement age, contribution rules, vesting schedules, and benefit options using approved fund documentation.
  • Trustee knowledge assistant

    • Retrieve meeting minutes, investment policy statements, and actuarial summaries so trustees can ask natural-language questions across years of records.
  • Compliance review agent

    • Search policy changes against stored historical docs and chat logs to flag inconsistent guidance or missing approvals.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
