LlamaIndex Tutorial (Python): connecting to PostgreSQL for beginners
This tutorial shows you how to connect LlamaIndex to PostgreSQL from Python, then use that database as a queryable data source for your agent or RAG app. You’d use this when your structured business data already lives in Postgres and you want LlamaIndex to answer questions over it without exporting CSVs or building a separate search layer.
What You'll Need
- Python 3.10+
- A running PostgreSQL instance
- A PostgreSQL database, user, and password
- pip for installing packages
- An OpenAI API key if you want natural-language querying with an LLM
- These Python packages: llama-index, llama-index-llms-openai, sqlalchemy, psycopg2-binary, python-dotenv
Step-by-Step
- Install the dependencies and set your environment variables.
Keep secrets out of code and load them from .env so the same script works locally and in CI.
pip install llama-index llama-index-llms-openai sqlalchemy psycopg2-binary python-dotenv
OPENAI_API_KEY=your_openai_api_key
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=company_db
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
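As a side note, the same variables can be assembled into a SQLAlchemy connection URL in code. This is a minimal stdlib sketch (the defaults mirror the sample .env above and are assumptions, not required values); quote_plus escapes characters such as @ or : that would otherwise break the URL if they appear in the password.

```python
import os
from urllib.parse import quote_plus

# Read the same POSTGRES_* variables the .env file defines,
# falling back to the tutorial's sample values if one is unset.
user = os.getenv("POSTGRES_USER", "postgres")
password = os.getenv("POSTGRES_PASSWORD", "postgres")
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "company_db")

# quote_plus() escapes characters like '@' or ':' that would
# otherwise corrupt the URL if they appear in the password.
db_url = f"postgresql+psycopg2://{user}:{quote_plus(password)}@{host}:{port}/{db}"
print(db_url)
```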
- Create a small PostgreSQL table with sample data.
LlamaIndex can query any SQL table, but it helps to start with something concrete like customers and balances.
from sqlalchemy import create_engine, text
engine = create_engine(
"postgresql+psycopg2://postgres:postgres@localhost:5432/company_db"
)
with engine.begin() as conn:
    conn.execute(text("DROP TABLE IF EXISTS customers"))
    conn.execute(text("""
        CREATE TABLE customers (
            id SERIAL PRIMARY KEY,
            full_name TEXT NOT NULL,
            email TEXT NOT NULL,
            balance NUMERIC(12,2) NOT NULL
        )
    """))
    conn.execute(text("""
        INSERT INTO customers (full_name, email, balance) VALUES
        ('Ada Lovelace', 'ada@example.com', 1200.50),
        ('Grace Hopper', 'grace@example.com', 980.25),
        ('Alan Turing', 'alan@example.com', 150.00)
    """))
- Build the LlamaIndex SQL database wrapper around PostgreSQL.
This is the bridge between LlamaIndex and your Postgres schema.
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
load_dotenv()
db_url = (
f"postgresql+psycopg2://{os.environ['POSTGRES_USER']}:"
f"{os.environ['POSTGRES_PASSWORD']}@"
f"{os.environ['POSTGRES_HOST']}:"
f"{os.environ['POSTGRES_PORT']}/"
f"{os.environ['POSTGRES_DB']}"
)
engine = create_engine(db_url)
sql_database = SQLDatabase(engine, include_tables=["customers"])
- Create a query engine that can translate English questions into SQL.
For beginners, this is the most useful entry point because you can ask questions without writing SQL manually every time.
import os
from dotenv import load_dotenv
from llama_index.core import Settings
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.llms.openai import OpenAI
load_dotenv()
Settings.llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
query_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["customers"],
)
- Ask a question and inspect the response.
The engine will generate SQL under the hood, run it against PostgreSQL, and return a natural-language answer.
response = query_engine.query("Who has the highest balance?")
print(str(response))
- If you want to verify the generated SQL during debugging, inspect the response metadata and source nodes.
This is useful when tuning prompts or checking whether the model queried the right table and columns. The SQL the model generated is available under response.metadata["sql_query"].
response = query_engine.query("List all customers with balances over 500.")
print("Answer:")
print(response)
print("\nGenerated SQL:")
print(response.metadata.get("sql_query"))
print("\nSource text:")
for node in response.source_nodes:
    print(node.node.get_content())
Testing It
Run the script end to end and confirm three things: PostgreSQL starts cleanly, the table gets created with rows inserted, and LlamaIndex returns an answer instead of an exception. If you get an authentication error, check your POSTGRES_* values first; if you get an API error, confirm OPENAI_API_KEY is loaded correctly.
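A quick preflight check makes the first failure mode easier to diagnose. This stdlib sketch (not part of the tutorial script) reports every missing variable at once instead of letting the first connection or API call fail:

```python
import os

# Every variable the tutorial's script reads, in one place.
REQUIRED = [
    "POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_DB",
    "POSTGRES_USER", "POSTGRES_PASSWORD", "OPENAI_API_KEY",
]

# Collect all missing names so you fix them in one pass.
missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Run it after load_dotenv() so values from .env are counted as present.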
A good test query is something simple like “Who has the highest balance?” because it forces the model to sort by balance and return a single top row. You should see a result that points to Ada Lovelace with 1200.50.
If you want extra confidence, connect directly with psql or any SQL client and run:
SELECT full_name, balance FROM customers ORDER BY balance DESC LIMIT 1;
The result should match what LlamaIndex returns.
Next Steps
- Add more tables and use include_tables to control which ones LlamaIndex can access.
- Learn how to add row-level filters so agents only see approved customer data.
- Move from raw SQL querying to tool-based agents when you need multi-step workflows across Postgres and other systems.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.