LangChain Tutorial (Python): connecting to PostgreSQL for advanced developers
This tutorial shows you how to connect a LangChain Python app to PostgreSQL so you can store, query, and reuse structured data from a real database. You’d use this when your agent needs persistent state, SQL-backed retrieval, or direct access to business data instead of keeping everything in memory.
What You'll Need
- Python 3.10+
- A running PostgreSQL instance
- A database URL in this format: `postgresql+psycopg://user:password@host:5432/dbname`
- `OPENAI_API_KEY` set in your environment
- These packages: `langchain`, `langchain-openai`, `langchain-community`, `sqlalchemy`, `psycopg[binary]`

Install them with (the quotes around `psycopg[binary]` keep shells like zsh from expanding the brackets):

```shell
pip install langchain langchain-openai langchain-community sqlalchemy "psycopg[binary]"
```
Step-by-Step
- Start by creating a SQLAlchemy engine for PostgreSQL and verifying the connection. LangChain’s SQL tools sit on top of SQLAlchemy, so this is the foundation for everything else.

```python
import os

from sqlalchemy import create_engine, text

DATABASE_URL = os.environ["DATABASE_URL"]
engine = create_engine(DATABASE_URL)

with engine.connect() as conn:
    result = conn.execute(text("SELECT version();"))
    print(result.scalar_one())
```
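For a long-lived agent process, idle connections can go stale between requests. A minimal sketch of a more defensive engine setup, using SQLAlchemy's `pool_pre_ping` option (the SQLite fallback here is only so the snippet runs without a server; in the tutorial you would pass your real `DATABASE_URL`):

```python
import os

from sqlalchemy import create_engine, text


def make_engine(url: str):
    # pool_pre_ping sends a lightweight liveness check before reusing a
    # pooled connection, so stale Postgres connections are recycled
    # instead of raising mid-query after an idle period or server restart.
    return create_engine(url, pool_pre_ping=True)


# Falls back to in-memory SQLite so the snippet runs standalone.
engine = make_engine(os.environ.get("DATABASE_URL", "sqlite://"))

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar_one())
```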
- Create a small table and seed it with data you can query later. For agent workflows, you usually want a narrow schema with predictable columns rather than dumping unstructured blobs into Postgres.

```python
from sqlalchemy import text

setup_sql = """
CREATE TABLE IF NOT EXISTS customer_notes (
    id SERIAL PRIMARY KEY,
    customer_name TEXT NOT NULL,
    note TEXT NOT NULL
);
TRUNCATE customer_notes;
INSERT INTO customer_notes (customer_name, note) VALUES
    ('Amina', 'Asked about policy renewal timing'),
    ('David', 'Reported a billing mismatch'),
    ('Priya', 'Needs documents for claim verification');
"""

with engine.begin() as conn:
    conn.execute(text(setup_sql))
```
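Before involving the model at all, it is worth reading the seed data back to confirm it landed. A small helper, sketched as a function so it works against whatever engine you pass in (`fetch_notes` is an illustrative name, not part of LangChain):

```python
from sqlalchemy import text
from sqlalchemy.engine import Engine


def fetch_notes(engine: Engine) -> list[tuple[str, str]]:
    """Read the seeded rows back so you can eyeball them before any LLM step."""
    with engine.connect() as conn:
        result = conn.execute(
            text("SELECT customer_name, note FROM customer_notes ORDER BY id")
        )
        return [tuple(row) for row in result]
```

With the tutorial's engine, `for name, note in fetch_notes(engine): print(f"{name}: {note}")` should list the three seeded customers.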
- Build a LangChain SQL database wrapper and inspect the available tables. This gives the model enough schema context to generate valid SQL instead of guessing column names.

```python
from langchain_community.utilities import SQLDatabase

db = SQLDatabase(engine)
print(db.get_usable_table_names())
print(db.get_table_info(["customer_notes"]))
```
- Connect an LLM and use LangChain’s SQL chain to answer questions from PostgreSQL. This is the standard pattern when you want natural language over relational data without hand-writing every query.

```python
import os

from langchain_openai import ChatOpenAI
from langchain.chains import create_sql_query_chain

llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

query_chain = create_sql_query_chain(llm, db)
question = "Which customers have notes about billing?"
sql_query = query_chain.invoke({"question": question})
print(sql_query)
```
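Model output is not always bare SQL: depending on the model and prompt, the chain can return the query wrapped in a markdown fence or prefixed with a label like `SQLQuery:`. A small cleanup helper, as a sketch (the exact wrappers vary by model and LangChain version, so extend the patterns to match what you actually see in logs):

```python
import re


def extract_sql(raw: str) -> str:
    """Strip common wrappers (markdown fences, 'SQLQuery:' labels) from
    LLM output so only the executable statement remains."""
    cleaned = raw.strip()
    # Remove a ```sql ... ``` fence if present.
    fence = re.match(r"```(?:sql)?\s*(.*?)\s*```", cleaned, re.DOTALL)
    if fence:
        cleaned = fence.group(1)
    # Remove a leading "SQLQuery:" label some prompts produce.
    cleaned = re.sub(r"^SQLQuery:\s*", "", cleaned, flags=re.IGNORECASE)
    return cleaned.strip()


print(extract_sql("```sql\nSELECT * FROM customer_notes;\n```"))
# → SELECT * FROM customer_notes;
```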
- Execute the generated SQL safely and feed the result back into the model if you want a final natural-language answer. In production, keep an eye on permissions and restrict the database user to read-only access unless writes are required.

```python
from sqlalchemy import text

with engine.connect() as conn:
    rows = conn.execute(text(sql_query)).fetchall()
print(rows)

answer_prompt = f"""
Question: {question}
SQL: {sql_query}
Rows: {rows}
Give a concise answer.
"""
response = llm.invoke(answer_prompt)
print(response.content)
```
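Before executing model-generated SQL, it is worth rejecting anything that is not a plain read. A very conservative sketch of that guard (this is string matching, not a SQL parser; a read-only database role remains the real safety net):

```python
import re

ALLOWED_TABLES = {"customer_notes"}


def is_safe_select(sql: str) -> bool:
    """Accept only a single SELECT statement that touches allowlisted tables."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?i)^\s*select\b", stripped):
        return False
    # Every table mentioned after FROM/JOIN must be on the allowlist.
    tables = re.findall(r"(?i)\b(?:from|join)\s+([\w.]+)", stripped)
    return all(t.lower() in ALLOWED_TABLES for t in tables)


print(is_safe_select("SELECT * FROM customer_notes WHERE note ILIKE '%billing%';"))  # → True
print(is_safe_select("DROP TABLE customer_notes;"))  # → False
```

Gate execution on it: run the generated query only when `is_safe_select(sql_query)` is true, and log the rejection otherwise.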
- If you need tighter control, skip free-form SQL generation and use Postgres as a backend store for agent memory or workflow state. That pattern is better when your app needs deterministic writes, auditability, or multi-step transactions.

```python
from sqlalchemy import MetaData, Table, Column, Integer, String

metadata = MetaData()
workflow_state = Table(
    "workflow_state",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("session_id", String(64), nullable=False),
    Column("state", String(255), nullable=False),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        workflow_state.insert(),
        [{"session_id": "abc123", "state": "awaiting_claim_docs"}],
    )
```
Testing It
Run the script end to end and confirm that `SELECT version();` returns your PostgreSQL version first. Then verify that `db.get_table_info(["customer_notes"])` shows the schema you created and that the generated SQL references `customer_notes` correctly.
If the LLM produces invalid SQL, check two things first: whether the table name is visible to `SQLDatabase`, and whether your prompt or question is specific enough. For real systems, test with a read-only DB user and log every generated query before execution.
Next Steps
- Add query validation before execution using an allowlist of tables and verbs
- Wrap this in a LangGraph workflow for multi-step agent control
- Move from raw SQL generation to tool-based retrieval with explicit business rules
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit