LangChain Tutorial (Python): connecting to PostgreSQL for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to connect a LangChain Python app to PostgreSQL so you can store, query, and reuse structured data from a real database. You’d use this when your agent needs persistent state, SQL-backed retrieval, or direct access to business data instead of keeping everything in memory.

What You'll Need

  • Python 3.10+
  • A running PostgreSQL instance
  • A database URL in this format: postgresql+psycopg://user:password@host:5432/dbname
  • OPENAI_API_KEY set in your environment
  • These packages:
    • langchain
    • langchain-openai
    • langchain-community
    • sqlalchemy
    • psycopg[binary]

Install them with:

pip install langchain langchain-openai langchain-community sqlalchemy "psycopg[binary]"
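The code in this tutorial reads its configuration from the environment. Export both variables before running anything (the values below are placeholders, not real credentials):

```shell
# placeholder credentials -- substitute your own
export DATABASE_URL="postgresql+psycopg://app_user:s3cret@localhost:5432/appdb"
export OPENAI_API_KEY="sk-..."
```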

Step-by-Step

  1. Start by creating a SQLAlchemy engine for PostgreSQL and verifying the connection. LangChain’s SQL tools sit on top of SQLAlchemy, so this is the foundation for everything else.
import os
from sqlalchemy import create_engine, text

DATABASE_URL = os.environ["DATABASE_URL"]
engine = create_engine(DATABASE_URL)

with engine.connect() as conn:
    result = conn.execute(text("SELECT version();"))
    print(result.scalar_one())
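For long-lived services, it can help to harden the engine slightly. This is a sketch, not a LangChain requirement: `pool_pre_ping` and the `pool_recycle` value are tuning assumptions you should adjust for your workload.

```python
import os

from sqlalchemy import create_engine, text


def make_engine(url: str):
    # pool_pre_ping checks a connection before handing it out, so stale
    # connections (e.g. after a database restart) are replaced transparently;
    # pool_recycle=1800 retires connections older than 30 minutes.
    return create_engine(url, pool_pre_ping=True, pool_recycle=1800)


# falls back to in-memory SQLite so the sketch runs without PostgreSQL
engine = make_engine(os.environ.get("DATABASE_URL", "sqlite://"))

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar_one())  # prints 1
```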
  2. Create a small table and seed it with data you can query later. For agent workflows, you usually want a narrow schema with predictable columns rather than dumping unstructured blobs into Postgres.
from sqlalchemy import text

setup_sql = """
CREATE TABLE IF NOT EXISTS customer_notes (
    id SERIAL PRIMARY KEY,
    customer_name TEXT NOT NULL,
    note TEXT NOT NULL
);

TRUNCATE customer_notes;

INSERT INTO customer_notes (customer_name, note) VALUES
('Amina', 'Asked about policy renewal timing'),
('David', 'Reported a billing mismatch'),
('Priya', 'Needs documents for claim verification');
"""

with engine.begin() as conn:
    conn.execute(text(setup_sql))
  3. Build a LangChain SQL database wrapper and inspect the available tables. This gives the model enough schema context to generate valid SQL instead of guessing column names.
from langchain_community.utilities import SQLDatabase

db = SQLDatabase(engine)
print(db.get_usable_table_names())
print(db.get_table_info(["customer_notes"]))
  4. Connect an LLM and use LangChain’s SQL chain to answer questions from PostgreSQL. This is the standard pattern when you want natural language over relational data without hand-writing every query.
import os
from langchain_openai import ChatOpenAI
from langchain.chains import create_sql_query_chain

llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

query_chain = create_sql_query_chain(llm, db)

question = "Which customers have notes about billing?"
sql_query = query_chain.invoke({"question": question})
print(sql_query)
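Depending on the model, the chain’s output sometimes arrives wrapped in markdown fences or prefixed with a label like `SQLQuery:`. A small normalizer (a sketch; the regexes here are assumptions, not part of LangChain) keeps downstream execution robust:

```python
import re


def normalize_sql(raw: str) -> str:
    """Strip markdown fences and a leading 'SQLQuery:' label some models emit."""
    sql = raw.strip()
    sql = re.sub(r"^```(?:sql)?\s*|\s*```$", "", sql)  # drop code fences
    sql = re.sub(r"^SQLQuery:\s*", "", sql, flags=re.IGNORECASE)
    return sql.strip()


print(normalize_sql("```sql\nSELECT * FROM customer_notes;\n```"))
# prints: SELECT * FROM customer_notes;
```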
  5. Execute the generated SQL safely and feed the result back into the model if you want a final natural-language answer. In production, keep an eye on permissions and restrict the database user to read-only access unless writes are required.
from sqlalchemy import text

with engine.connect() as conn:
    rows = conn.execute(text(sql_query)).fetchall()

print(rows)

answer_prompt = f"""
Question: {question}
SQL: {sql_query}
Rows: {rows}
Give a concise answer.
"""

response = llm.invoke(answer_prompt)
print(response.content)
  6. If you need tighter control, skip free-form SQL generation and use Postgres as a backend store for agent memory or workflow state. That pattern is better when your app needs deterministic writes, auditability, or multi-step transactions.
from sqlalchemy import MetaData, Table, Column, Integer, String

metadata = MetaData()
workflow_state = Table(
    "workflow_state",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("session_id", String(64), nullable=False),
    Column("state", String(255), nullable=False),
)

metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        workflow_state.insert(),
        [{"session_id": "abc123", "state": "awaiting_claim_docs"}],
    )

Testing It

Run the script end to end and confirm that SELECT version(); returns your PostgreSQL version first. Then verify that db.get_table_info(["customer_notes"]) shows the schema you created and that the generated SQL references customer_notes correctly.

If the LLM produces invalid SQL, check two things first: whether the table name is visible to SQLDatabase, and whether your prompt/question is specific enough. For real systems, test with a read-only DB user and log every generated query before execution.

Next Steps

  • Add query validation before execution using an allowlist of tables and verbs
  • Wrap this in a LangGraph workflow for multi-step agent control
  • Move from raw SQL generation to tool-based retrieval with explicit business rules
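The first next step above can be sketched as a coarse allowlist check. This is illustrative only: the regexes are assumptions, and a real system should parse the SQL with a proper parser rather than pattern-match it.

```python
import re

ALLOWED_TABLES = {"customer_notes"}


def is_safe_select(sql: str) -> bool:
    """Reject anything that isn't a single SELECT over allowlisted tables."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:  # more than one statement
        return False
    if not re.match(r"(?is)^\s*select\b", stmt):  # verb allowlist: SELECT only
        return False
    tables = set(re.findall(r"(?is)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stmt))
    return bool(tables) and tables <= ALLOWED_TABLES


print(is_safe_select("SELECT note FROM customer_notes WHERE note ILIKE '%billing%';"))
# prints: True
print(is_safe_select("DROP TABLE customer_notes;"))
# prints: False
```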

By Cyprian Aarons, AI Consultant at Topiax.