Haystack Tutorial (TypeScript): connecting to PostgreSQL for beginners

By Cyprian Aarons, updated 2026-04-21

This tutorial shows you how to connect a Haystack TypeScript app to PostgreSQL, store documents in a real database, and retrieve them with vector search. You need this when your agent or RAG pipeline has to keep its data outside process memory, so the data survives restarts and deployments and can be shared across users.

What You'll Need

  • Node.js 18+ and npm
  • A PostgreSQL instance with the pgvector extension enabled
  • A PostgreSQL database name, user, password, host, and port
  • A TypeScript project already set up with Haystack installed
  • These packages:
    • haystack
    • pg
    • dotenv
    • typescript
    • ts-node or a build step with tsc
  • Optional but useful:
    • @types/pg

Step-by-Step

  1. Start by installing the packages and creating a .env file for your database connection. Keep credentials out of source control and use environment variables from day one.
npm install haystack pg dotenv
npm install -D typescript ts-node @types/node @types/pg

Then put the connection settings in a .env file in the project root (and add .env to .gitignore):

PGHOST=localhost
PGPORT=5432
PGDATABASE=haystack_demo
PGUSER=postgres
PGPASSWORD=postgres
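Because every `process.env` value is typed `string | undefined`, a typo or missing line in .env is easy to miss until something fails later. A minimal sketch of failing fast instead, assuming a hypothetical `loadPgSettings` helper that is not part of pg or dotenv; it reads the same variable names as the .env file above:

```typescript
// Hypothetical helper: validates the PG* variables before a Pool is created.
type PgSettings = {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
};

function loadPgSettings(env: Record<string, string | undefined>): PgSettings {
  // Collect every missing variable so the error lists them all at once.
  const required = ["PGHOST", "PGDATABASE", "PGUSER", "PGPASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // PGPORT is optional and defaults to PostgreSQL's standard port.
  const port = Number(env.PGPORT ?? "5432");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`PGPORT must be a valid TCP port, got "${env.PGPORT}"`);
  }

  return {
    host: env.PGHOST as string,
    port,
    database: env.PGDATABASE as string,
    user: env.PGUSER as string,
    password: env.PGPASSWORD as string,
  };
}
```

In the db.ts helper you could then write `new Pool(loadPgSettings(process.env))` after `import "dotenv/config"`. This sketch treats an empty password as missing; relax that check if your local server uses trust authentication.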
  2. Create a small PostgreSQL helper, saved as db.ts, so the rest of the app shares one connection pool. Keeping the database setup in one module makes it easy to reuse in pipelines and scripts; the later steps import it from ./db.
import "dotenv/config";
import { Pool } from "pg";

export const pool = new Pool({
  host: process.env.PGHOST,
  port: Number(process.env.PGPORT ?? "5432"),
  database: process.env.PGDATABASE,
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
});

export async function testConnection() {
  const result = await pool.query("SELECT NOW() AS now");
  console.log("Connected at:", result.rows[0].now);
}
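Some hosted PostgreSQL providers hand you a single connection URL instead of separate fields, and pg's `Pool` also accepts a `connectionString` option. If you assemble that URL yourself, percent-encode the user, password, and database name, because characters such as `@` or `/` in a password would otherwise break the URL. A minimal sketch; `buildConnectionString` is a hypothetical helper:

```typescript
type PgSettings = {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
};

// Hypothetical helper: builds a postgresql:// URL from separate settings.
function buildConnectionString(s: PgSettings): string {
  // encodeURIComponent escapes characters like @, :, / and # that have meaning in a URL.
  const user = encodeURIComponent(s.user);
  const password = encodeURIComponent(s.password);
  const database = encodeURIComponent(s.database);
  return `postgresql://${user}:${password}@${s.host}:${s.port}/${database}`;
}
```

Usage would look like `new Pool({ connectionString: buildConnectionString(settings) })`.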
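  3. Prepare PostgreSQL for vector storage. Your application needs a table that holds document text, metadata, and embeddings; pgvector provides the vector column type and the similarity-search operators. The 1536 in VECTOR(1536) must match the number of dimensions your embedding model returns, because pgvector rejects vectors of any other length.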
import { pool } from "./db";

async function setup() {
  await pool.query(`CREATE EXTENSION IF NOT EXISTS vector`);

  await pool.query(`
    CREATE TABLE IF NOT EXISTS documents (
      id TEXT PRIMARY KEY,
      content TEXT NOT NULL,
      meta JSONB NOT NULL DEFAULT '{}'::jsonb,
      embedding VECTOR(1536) NOT NULL
    )
  `);

  console.log("Database is ready");
}

setup().finally(() => pool.end());
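Hard-coding the dimension in the DDL ties the schema to one embedding model. A minimal sketch, assuming a hypothetical `schemaSql` helper, that builds the same statements for any dimension. The HNSW index is an optional addition that the setup script above does not create: it needs pgvector 0.5.0 or later, and the `vector_cosine_ops` operator class matches the `<=>` operator used in the search step.

```typescript
// Hypothetical helper: returns the DDL statements for a documents table
// whose embedding column matches the embedding model's dimension.
function schemaSql(dimensions: number, withIndex = true): string[] {
  // pgvector's vector type stores at most 16,000 dimensions.
  if (!Number.isInteger(dimensions) || dimensions < 1 || dimensions > 16000) {
    throw new Error(`Unsupported embedding dimension: ${dimensions}`);
  }

  const statements = [
    `CREATE EXTENSION IF NOT EXISTS vector`,
    `CREATE TABLE IF NOT EXISTS documents (
      id TEXT PRIMARY KEY,
      content TEXT NOT NULL,
      meta JSONB NOT NULL DEFAULT '{}'::jsonb,
      embedding VECTOR(${dimensions}) NOT NULL
    )`,
  ];

  // HNSW indexes on the vector type support up to 2,000 dimensions,
  // so the index is skipped for larger embeddings.
  if (withIndex && dimensions <= 2000) {
    statements.push(
      `CREATE INDEX IF NOT EXISTS documents_embedding_idx
       ON documents USING hnsw (embedding vector_cosine_ops)`
    );
  }

  return statements;
}
```

In the setup script you would run `for (const statement of schemaSql(1536)) await pool.query(statement);`. Without an index, pgvector compares the query against every row, which is exact and fine for small tables; HNSW is an approximate index that trades a little recall for much faster queries on large ones.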
  4. Build a simple ingestion script that writes documents into PostgreSQL. In a real app the embeddings come from your model provider; here a fixed placeholder vector stands in, so you can plug in your own embedding service later without changing the storage layer.
import { pool } from "./db";

type Doc = {
  id: string;
  content: string;
  meta: Record<string, unknown>;
};

const docs: Doc[] = [
  {
    id: "doc-1",
    content: "PostgreSQL is a relational database with strong consistency.",
    meta: { source: "handbook", topic: "postgres" },
  },
];

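// Placeholder embedding: every document gets the same 1536 values. Replace it with your model's output.
const fakeEmbedding = new Array(1536).fill(0).map((_, i) => (i % 10) / 10);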

async function ingest() {
  for (const doc of docs) {
    await pool.query(
      `INSERT INTO documents (id, content, meta, embedding)
       VALUES ($1, $2, $3, $4)
       ON CONFLICT (id) DO UPDATE SET
         content = EXCLUDED.content,
         meta = EXCLUDED.meta,
         embedding = EXCLUDED.embedding`,
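      // pgvector parses vectors from text like "[0.1,0.2]"; pg would send a raw JS array as "{0.1,0.2}".
      [doc.id, doc.content, doc.meta, JSON.stringify(fakeEmbedding)]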
    );
  }

  console.log("Documents stored");
}

ingest().finally(() => pool.end());
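pgvector reads vectors as text in the form `[x,y,z]`, which is exactly what `JSON.stringify` produces for an array of numbers. A wrong-length or NaN-containing embedding, however, only fails once it reaches PostgreSQL. A minimal sketch of a serializer that checks both first; `toPgVector` is a hypothetical helper:

```typescript
// Hypothetical helper: validates an embedding and formats it as pgvector text.
function toPgVector(embedding: number[], expectedDimensions: number): string {
  if (embedding.length !== expectedDimensions) {
    throw new Error(
      `Expected ${expectedDimensions} dimensions, got ${embedding.length}`
    );
  }
  // NaN and Infinity are not valid vector elements, and JSON.stringify would turn them into null.
  if (!embedding.every(Number.isFinite)) {
    throw new Error("Embedding contains a non-finite value");
  }
  // JSON array syntax matches pgvector's text input format: [x,y,z].
  return JSON.stringify(embedding);
}
```

In the ingestion script, `toPgVector(fakeEmbedding, 1536)` would be passed as the `$4` parameter.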
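  5. Query PostgreSQL from Haystack-style application code by retrieving the nearest vectors. This is the part you wire into your retriever or tool layer so your agent can fetch relevant context before generating an answer. The <=> operator is pgvector's cosine distance, so smaller values mean closer matches and ORDER BY ... ASC puts the best matches first.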
import { pool } from "./db";

async function search(queryEmbedding: number[]) {
  const result = await pool.query(
    `
    SELECT id, content, meta,
           embedding <=> $1::vector AS distance
    FROM documents
    ORDER BY embedding <=> $1::vector ASC
    LIMIT 5
    `,
    // Serialize to pgvector's "[x,y,z]" text format before casting with ::vector.
    [JSON.stringify(queryEmbedding)]
  );

  return result.rows;
}

// Placeholder query vector with the same 1536 dimensions as the stored embeddings.
const queryEmbedding = new Array(1536).fill(0).map((_, i) => (i % 7) / 7);

search(queryEmbedding).then((rows) => {
  console.log(rows);
}).finally(() => pool.end());
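To sanity-check a few returned rows, you can recompute in TypeScript what `<=>` calculates: 1 minus the cosine similarity of the two vectors, which is 0 for vectors pointing the same way and up to 2 for opposite ones. A minimal sketch; `cosineDistance` is a hypothetical helper, and because pgvector stores single-precision floats, expect small differences from the database's value.

```typescript
// Computes 1 - cosine similarity, the quantity pgvector's <=> operator returns.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    // Cosine similarity is undefined for zero vectors, so report NaN.
    return NaN;
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For example, `cosineDistance([1, 0], [0, 1])` returns 1 because the vectors are orthogonal.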

Testing It

Run the setup script first and confirm that the vector extension exists and the documents table is created. Then run the ingestion script and verify that at least one row is present in PostgreSQL.

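Next, run the search script and check that the rows come back in ascending order of distance. If the script cannot connect, inspect your .env values first; most connection problems are wrong hostnames, ports, or passwords. If it connects but returns no rows, confirm that the ingestion script ran against the same database. A vector input-syntax error usually means the embedding was sent as a plain JavaScript array instead of pgvector's [x,y,z] text format.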
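To make the ordering check automatic instead of eyeballing the console output, a small assertion over the returned rows is enough. A minimal sketch, assuming rows shaped like the search query's result; node-postgres returns the `double precision` distance column as a JavaScript number. `isSortedByDistance` is a hypothetical helper.

```typescript
type SearchRow = { id: string; distance: number };

// Returns true when every row is at least as close as the row after it.
function isSortedByDistance(rows: SearchRow[]): boolean {
  for (let i = 1; i < rows.length; i++) {
    if (rows[i - 1].distance > rows[i].distance) {
      return false;
    }
  }
  return true;
}
```

With only the single sample document this check is trivially true, so ingest a few documents with different embeddings before relying on it.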

For a real Haystack integration test, replace the fake embedding array with embeddings from your model provider and confirm that retrieved documents match the query topic. That tells you both layers are working: storage in PostgreSQL and retrieval through vector similarity.
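Until a real provider is wired in, the constant placeholder gives every document the same vector, so ranking cannot be tested. One stand-in is a deterministic feature-hashing embedder: each word is hashed into one of N buckets, so texts that share words end up closer. This is a swapped-in testing technique, not a semantic model, and `Embedder`, `fnv1a`, and `hashingEmbedder` are hypothetical names; a real provider client would be adapted to the same function type.

```typescript
// Hypothetical interface: any embedding provider can be adapted to this shape.
type Embedder = (text: string) => Promise<number[]>;

// FNV-1a 32-bit hash, used only to map a word to a bucket index.
function fnv1a(word: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < word.length; i++) {
    hash ^= word.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Test-only embedder: counts hashed, lowercased words per bucket. Not semantic.
function hashingEmbedder(dimensions: number): Embedder {
  return async (text: string) => {
    const vector = new Array<number>(dimensions).fill(0);
    const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
    for (const word of words) {
      vector[fnv1a(word) % dimensions] += 1;
    }
    return vector;
  };
}
```

Calling `hashingEmbedder(1536)` produces vectors that fit the VECTOR(1536) column, which lets you exercise ingestion and ranking end to end before swapping in the provider.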

Next Steps

  • Wire this storage layer into a Haystack pipeline with a real embedder and retriever.
  • Add metadata filters for tenant ID, document type, or compliance tags.
  • Move from manual scripts to migrations so schema changes are versioned and repeatable.
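For metadata filters, JSONB containment (`@>`) lets you combine a filter with the vector ordering in one parameterized query. A minimal sketch, assuming a hypothetical `buildFilteredSearch` helper and a `tenant_id` key in `meta` that the tutorial's sample data does not include. It returns the `{ text, values }` object that node-postgres's `pool.query` accepts.

```typescript
type QueryConfig = { text: string; values: unknown[] };

// Hypothetical helper: nearest-neighbour search restricted to rows whose
// meta JSON contains every key/value pair in `filter`.
function buildFilteredSearch(
  queryEmbedding: number[],
  filter: Record<string, string | number | boolean>,
  limit = 5
): QueryConfig {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  return {
    text: `
      SELECT id, content, meta, embedding <=> $1::vector AS distance
      FROM documents
      WHERE meta @> $2::jsonb
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    // pgvector expects "[x,y,z]" text; the JSONB filter is passed as a JSON string.
    values: [JSON.stringify(queryEmbedding), JSON.stringify(filter), limit],
  };
}
```

Usage would look like `const { rows } = await pool.query(buildFilteredSearch(embedding, { tenant_id: "acme" }));`. Keeping the filter as a bound parameter, rather than string concatenation, avoids SQL injection through tenant IDs or tags.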

By Cyprian Aarons, AI Consultant at Topiax.
