LlamaIndex Tutorial (TypeScript): connecting to PostgreSQL for advanced developers
This tutorial shows you how to wire LlamaIndex TypeScript to PostgreSQL so you can store, query, and retrieve data from a real database instead of keeping everything in memory. You’d use this when your agent needs durable storage for documents, metadata, or application state that must survive restarts and support production querying.
What You'll Need
- Node.js 18+ and npm
- A PostgreSQL instance running locally or in Docker
- A database name, user, password, host, and port
- An OpenAI API key
- These packages: llamaindex, pg, dotenv, typescript, and tsx (or ts-node)
Install them:
npm install llamaindex pg dotenv
npm install -D typescript tsx @types/node
Create a .env file:
OPENAI_API_KEY=your_openai_api_key
PGHOST=localhost
PGPORT=5432
PGDATABASE=llamaindex_demo
PGUSER=postgres
PGPASSWORD=postgres
Step-by-Step
- Set up your PostgreSQL table first. For advanced use cases, keep the schema explicit so you control indexing, retention, and how LlamaIndex maps rows into nodes.
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
category TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO documents (title, content, category) VALUES
('Payment Dispute Handling', 'Chargebacks must be reviewed within five business days.', 'ops'),
('KYC Review', 'High-risk customers require enhanced due diligence.', 'compliance'),
('Claims Triage', 'Automate first-pass claims routing using document metadata.', 'insurance');
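If you prefer to seed data from Node rather than psql, a single parameterized multi-row INSERT keeps values out of the SQL string so pg can bind them safely. A minimal sketch; the `buildSeedInsert` helper name is my own, not a pg API:

```typescript
// Build one parameterized multi-row INSERT for pg's pool.query.
// Each row contributes three numbered placeholders ($1, $2, $3), ($4, $5, $6), ...
interface SeedRow {
  title: string;
  content: string;
  category: string;
}

function buildSeedInsert(rows: SeedRow[]): { text: string; values: string[] } {
  const placeholders = rows
    .map((_, i) => `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`)
    .join(", ");
  return {
    text: `INSERT INTO documents (title, content, category) VALUES ${placeholders}`,
    values: rows.flatMap((r) => [r.title, r.content, r.category]),
  };
}

const seed = buildSeedInsert([
  { title: "Payment Dispute Handling", content: "Chargebacks must be reviewed within five business days.", category: "ops" },
  { title: "KYC Review", content: "High-risk customers require enhanced due diligence.", category: "compliance" },
]);
console.log(seed.text);
// You would then run: await pool.query(seed.text, seed.values)
```

Keeping values separate from the SQL text is what lets Postgres bind them as parameters instead of interpolated strings.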
- Load data from PostgreSQL into LlamaIndex using the official TypeScript APIs. Here we connect with pg, fetch rows, and turn them into Document objects with metadata that downstream retrieval can filter on.
import "dotenv/config";
import pg from "pg";
import { Document } from "llamaindex";
const { Pool } = pg;
async function main() {
const pool = new Pool({
host: process.env.PGHOST,
port: Number(process.env.PGPORT),
database: process.env.PGDATABASE,
user: process.env.PGUSER,
password: process.env.PGPASSWORD,
});
const result = await pool.query(
"SELECT id, title, content, category FROM documents ORDER BY id ASC"
);
const docs = result.rows.map(
(row) =>
new Document({
text: `${row.title}\n\n${row.content}`,
metadata: {
id: row.id,
title: row.title,
category: row.category,
},
})
);
console.log(`Loaded ${docs.length} documents from PostgreSQL`);
await pool.end();
}
main().catch(console.error);
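Rows coming back from pg are untyped at compile time, so it can help to normalize them before constructing Documents and drop anything with empty content so it never reaches the embedder. A sketch with a hypothetical `toDocumentInput` helper, not part of any library:

```typescript
// Normalize a raw row into the { text, metadata } shape passed to new Document(...).
// Rows with blank content are rejected rather than embedded as empty text.
interface DocRow {
  id: number;
  title: string | null;
  content: string | null;
  category: string | null;
}

interface DocumentInput {
  text: string;
  metadata: Record<string, unknown>;
}

function toDocumentInput(row: DocRow): DocumentInput | null {
  const title = row.title?.trim() ?? "";
  const content = row.content?.trim() ?? "";
  if (content.length === 0) return null; // nothing meaningful to embed
  return {
    text: title ? `${title}\n\n${content}` : content,
    metadata: { id: row.id, title, category: row.category ?? "uncategorized" },
  };
}

const inputs = [
  { id: 1, title: "KYC Review", content: "High-risk customers require EDD.", category: "compliance" },
  { id: 2, title: "Empty", content: "   ", category: null },
]
  .map(toDocumentInput)
  .filter((d): d is DocumentInput => d !== null);

console.log(inputs.length); // blank-content rows are dropped
```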
- Build a vector index from those PostgreSQL-backed documents. This keeps the source of truth in Postgres while letting LlamaIndex handle semantic retrieval on top of it.
import "dotenv/config";
import pg from "pg";
import {
Document,
VectorStoreIndex,
Settings,
OpenAI,
} from "llamaindex";
const { Pool } = pg;
Settings.llm = new OpenAI({ model: "gpt-4o-mini" });
async function main() {
const pool = new Pool({
host: process.env.PGHOST,
port: Number(process.env.PGPORT),
database: process.env.PGDATABASE,
user: process.env.PGUSER,
password: process.env.PGPASSWORD,
});
const result = await pool.query("SELECT id, title, content, category FROM documents");
const docs = result.rows.map(
(row) =>
new Document({
text: `${row.title}\n\n${row.content}`,
metadata: { id: row.id, category: row.category },
})
);
const index = await VectorStoreIndex.fromDocuments(docs);
console.log("Vector index built:", !!index);
await pool.end();
}
main().catch(console.error);
- Query the index with a retriever and inspect which PostgreSQL rows were used. This is the part that matters in production because it gives you traceability back to the original records.
import "dotenv/config";
import pg from "pg";
import {
Document,
VectorStoreIndex,
Settings,
OpenAI,
} from "llamaindex";
const { Pool } = pg;
Settings.llm = new OpenAI({ model: "gpt-4o-mini" });
async function main() {
const pool = new Pool({
host: process.env.PGHOST,
port: Number(process.env.PGPORT),
database: process.env.PGDATABASE,
user: process.env.PGUSER,
password: process.env.PGPASSWORD,
});
const result = await pool.query("SELECT id, title, content, category FROM documents");
const docs = result.rows.map(
(row) =>
new Document({
text: `${row.title}\n\n${row.content}`,
metadata: { id: row.id, category: row.category },
})
);
const index = await VectorStoreIndex.fromDocuments(docs);
const retriever = index.asRetriever({ similarityTopK: 2 });
const nodes = await retriever.retrieve({ query: "How do I route claims?" });
console.log(
nodes.map((node) => ({
score: node.score?.toFixed(4),
textSnippet: node.node.getContent().slice(0, 80),
metadata: node.node.metadata,
}))
);
await pool.end();
}
main().catch(console.error);
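Because each retrieved node carries the category metadata from Postgres, you can also post-filter results before they reach an LLM. A sketch over plain objects shaped like the logging output above; the fields mirror what that code logs, not the full LlamaIndex node types:

```typescript
// Keep only retrieved results whose category matches and whose similarity
// score clears a floor, using the same score/metadata fields logged above.
interface RetrievedLike {
  score?: number;
  metadata: { id: number; category: string };
  text: string;
}

function filterRetrieved(
  nodes: RetrievedLike[],
  opts: { category?: string; minScore?: number }
): RetrievedLike[] {
  return nodes.filter((n) => {
    if (opts.category && n.metadata.category !== opts.category) return false;
    if (opts.minScore !== undefined && (n.score ?? 0) < opts.minScore) return false;
    return true;
  });
}

const hits: RetrievedLike[] = [
  { score: 0.82, metadata: { id: 3, category: "insurance" }, text: "Automate first-pass claims routing..." },
  { score: 0.41, metadata: { id: 1, category: "ops" }, text: "Chargebacks must be reviewed..." },
];

console.log(filterRetrieved(hits, { category: "insurance", minScore: 0.5 }).map((n) => n.metadata.id));
```

A score floor like this is a heuristic; tune it against your own data rather than treating 0.5 as meaningful.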
- Wrap retrieval in a simple question-answer flow. This is where LlamaIndex becomes useful for agents because you can combine semantic search with an LLM response grounded in your database rows.
import "dotenv/config";
import pg from "pg";
import {
Document,
VectorStoreIndex,
Settings,
OpenAI,
} from "llamaindex";
const { Pool } = pg;
Settings.llm = new OpenAI({ model: "gpt-4o-mini" });
async function main() {
const pool = new Pool({
host: process.env.PGHOST,
port: Number(process.env.PGPORT),
database: process.env.PGDATABASE,
user: process.env.PGUSER,
password: process.env.PGPASSWORD,
});
const result = await pool.query("SELECT id, title, content, category FROM documents");
const docs = result.rows.map(
(row) =>
new Document({
text: `${row.title}\n\n${row.content}`,
metadata: { id: row.id, category: row.category },
})
);
const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({
query: "What should I know about compliance reviews?",
});
console.log(response.toString());
await pool.end();
}
main().catch(console.error);
Testing It
Run your script with npx tsx your-file.ts. If PostgreSQL is reachable and your .env values are correct, you should see rows loaded and either retrieved snippets or an answer generated by the query engine.
If retrieval looks wrong, check two things first:
- Your document text is actually meaningful and not just IDs.
- Your metadata is preserved so you can inspect what came back.
For production debugging, log the returned node IDs and source metadata before sending anything to the LLM. That gives you a clean audit trail when an answer looks off.
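That audit trail can be a small pure function that snapshots the retrieved sources before the LLM call. A sketch; the record shape is my own, not a LlamaIndex type:

```typescript
// Build an audit record from retrieved nodes so every answer can be traced
// back to the PostgreSQL rows that grounded it.
interface SourceNode {
  score?: number;
  metadata: { id: number; category: string };
  text: string;
}

function buildAuditRecord(question: string, nodes: SourceNode[]) {
  return {
    question,
    retrievedAt: new Date().toISOString(),
    sources: nodes.map((n) => ({
      rowId: n.metadata.id,       // PostgreSQL primary key
      category: n.metadata.category,
      score: n.score ?? null,
      snippet: n.text.slice(0, 80),
    })),
  };
}

const record = buildAuditRecord("How do I route claims?", [
  { score: 0.82, metadata: { id: 3, category: "insurance" }, text: "Automate first-pass claims routing using document metadata." },
]);
console.log(JSON.stringify(record.sources));
```

Persist these records (a plain Postgres table works) and you can reconstruct exactly which rows grounded any given answer.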
Next Steps
- Add a real Postgres-backed vector store (for example, pgvector) instead of rebuilding the index at runtime.
- Introduce filters on metadata like category, tenant ID, or document status.
- Add migrations and seed scripts so your agent environment is reproducible across dev and staging.
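The metadata-filter idea can also be pushed down into SQL so Postgres narrows the rows before anything is embedded. A sketch of a parameterized WHERE-clause builder; the helper name is hypothetical, and it assumes a tenant_id column you would add via migration:

```typescript
// Build a parameterized SELECT with optional filters. Values stay separate
// from the SQL text so pg binds them rather than interpolating strings.
function buildFilteredQuery(filters: { category?: string; tenantId?: string }): {
  text: string;
  values: string[];
} {
  const clauses: string[] = [];
  const values: string[] = [];
  if (filters.category) {
    values.push(filters.category);
    clauses.push(`category = $${values.length}`);
  }
  if (filters.tenantId) {
    values.push(filters.tenantId);
    clauses.push(`tenant_id = $${values.length}`); // assumes a tenant_id column exists
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return {
    text: `SELECT id, title, content, category FROM documents${where}`,
    values,
  };
}

const q = buildFilteredQuery({ category: "compliance" });
console.log(q.text);
// You would then run: await pool.query(q.text, q.values)
```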
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.