Haystack Tutorial (TypeScript): connecting to PostgreSQL for advanced developers

By Cyprian Aarons, updated 2026-04-21

This tutorial shows you how to wire Haystack TypeScript to PostgreSQL so you can store, retrieve, and query document data with a real database backend. You need this when your agent or search pipeline outgrows in-memory storage and you want durable persistence, SQL visibility, and production-friendly operations.

What You'll Need

  • Node.js 18+ and npm 9+
  • A PostgreSQL 14+ instance
  • A database user with permission to create tables
  • @haystack-ai/core installed in your TypeScript project
  • pg installed for PostgreSQL connectivity
  • TypeScript configured with strict: true
  • A PostgreSQL connection string, for example:
    • postgresql://postgres:password@localhost:5432/haystack_demo

Step-by-Step

  1. Install the packages and set up your project.
    Haystack core gives you the document and pipeline primitives; pg gives you the database client. Keep the database URL in an environment variable so you can move between local, staging, and prod without code changes.
npm init -y
npm install @haystack-ai/core pg
npm install -D typescript @types/node
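With the packages in place, the connection string can live in the shell environment so the code never hardcodes credentials. A minimal sketch, assuming a local database named haystack_demo and an entrypoint at src/index.ts (both placeholder values for your own setup):

```shell
# Keep the connection string out of source control; this is a local example value.
export POSTGRES_URL="postgresql://postgres:password@localhost:5432/haystack_demo"

# Run the entrypoint directly with tsx (or compile with tsc and run node).
npx tsx src/index.ts
```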
  2. Create a small PostgreSQL-backed document store wrapper.
    Haystack’s TypeScript package gives you the building blocks, but for PostgreSQL you typically implement the persistence layer yourself using standard SQL. This keeps the integration explicit and makes it easier to tune indexes later.
import * as crypto from "node:crypto"; // needed so crypto.randomUUID() works on Node 18
import { Client } from "pg";
import { Document } from "@haystack-ai/core";

export class PostgresDocumentStore {
  constructor(private client: Client) {}

  async init() {
    await this.client.connect();
    await this.client.query(`
      CREATE TABLE IF NOT EXISTS haystack_documents (
        id TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        meta JSONB NOT NULL DEFAULT '{}'::jsonb
      )
    `);
  }

  async writeDocuments(documents: Document[]) {
    for (const doc of documents) {
      await this.client.query(
        `INSERT INTO haystack_documents (id, content, meta)
         VALUES ($1, $2, $3)
         ON CONFLICT (id) DO UPDATE SET content = EXCLUDED.content, meta = EXCLUDED.meta`,
        [doc.id ?? crypto.randomUUID(), doc.content, doc.meta ?? {}]
      );
    }
  }
}
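The per-row loop above is fine for small batches. For larger writes, a single multi-row upsert statement cuts round trips; here is a sketch of a placeholder-building helper (buildBulkUpsert is illustrative, not part of Haystack or pg), which is pure string and array work and therefore easy to unit-test without a database:

```typescript
// Shape of one row destined for the haystack_documents table.
interface DocRow {
  id: string;
  content: string;
  meta: Record<string, unknown>;
}

// Build one parameterized multi-row upsert: each row gets three
// placeholders ($1,$2,$3), ($4,$5,$6), ... matching (id, content, meta).
function buildBulkUpsert(rows: DocRow[]): { text: string; values: unknown[] } {
  const placeholders = rows
    .map((_, i) => `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`)
    .join(", ");
  const values = rows.flatMap((r) => [r.id, r.content, JSON.stringify(r.meta)]);
  return {
    text:
      `INSERT INTO haystack_documents (id, content, meta) VALUES ${placeholders} ` +
      `ON CONFLICT (id) DO UPDATE SET content = EXCLUDED.content, meta = EXCLUDED.meta`,
    values,
  };
}
```

The resulting { text, values } pair can be passed straight to client.query(text, values), replacing the loop with one statement per batch.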
  3. Add retrieval logic that reads back rows as Haystack documents.
    The key detail is preserving content and meta in a shape Haystack can consume later in your pipeline. If you plan to use embeddings or filters, keep metadata structured as JSONB instead of flattening it into text.
import { Client } from "pg";
import { Document } from "@haystack-ai/core";

// Same PostgresDocumentStore as the previous step; only the new read method is shown.
export class PostgresDocumentStore {
  constructor(private client: Client) {}

  async getAllDocuments(): Promise<Document[]> {
    const result = await this.client.query(
      `SELECT id, content, meta FROM haystack_documents ORDER BY id ASC`
    );

    return result.rows.map(
      (row) =>
        new Document({
          id: row.id,
          content: row.content,
          meta: row.meta,
        })
    );
  }
}
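Since the metadata is stored as JSONB, equality filters can be pushed down to SQL instead of filtering in application code. A sketch of translating a flat filter map into a parameterized WHERE clause (buildMetaWhere is a hypothetical helper, not a Haystack API):

```typescript
// Translate a flat { key: value } filter into a parameterized WHERE clause
// over the JSONB meta column, e.g. { team: "claims" } becomes
// "WHERE meta->>'team' = $1" with values ["claims"].
function buildMetaWhere(filters: Record<string, string>): {
  where: string;
  values: string[];
} {
  const keys = Object.keys(filters);
  if (keys.length === 0) return { where: "", values: [] };
  // Keys are interpolated into the SQL text, so escape single quotes;
  // values always travel as bound parameters.
  const clauses = keys.map(
    (k, i) => `meta->>'${k.replace(/'/g, "''")}' = $${i + 1}`
  );
  return {
    where: `WHERE ${clauses.join(" AND ")}`,
    values: keys.map((k) => filters[k]),
  };
}
```

A method like getDocumentsByMeta could then append the returned clause to the SELECT from getAllDocuments and pass the values array as query parameters.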
  4. Write a runnable entrypoint that stores and fetches documents.
    This is the fastest way to prove the connection works before wiring it into a retriever or agent flow. Use a single file first; once it works, split the store into its own module.
import * as crypto from "node:crypto"; // needed so crypto.randomUUID() works on Node 18
import { Client } from "pg";
import { Document } from "@haystack-ai/core";

class PostgresDocumentStore {
  constructor(private client: Client) {}

  async init() {
    await this.client.connect();
    await this.client.query(`
      CREATE TABLE IF NOT EXISTS haystack_documents (
        id TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        meta JSONB NOT NULL DEFAULT '{}'::jsonb
      )
    `);
  }

  async writeDocuments(documents: Document[]) {
    for (const doc of documents) {
      await this.client.query(
        `INSERT INTO haystack_documents (id, content, meta)
         VALUES ($1, $2, $3)
         ON CONFLICT (id) DO UPDATE SET content = EXCLUDED.content, meta = EXCLUDED.meta`,
        [doc.id ?? crypto.randomUUID(), doc.content, doc.meta ?? {}]
      );
    }
  }

  async getAllDocuments(): Promise<Document[]> {
    const result = await this.client.query(
      `SELECT id, content, meta FROM haystack_documents ORDER BY id ASC`
    );

    return result.rows.map(
      (row) =>
        new Document({ id: row.id, content: row.content, meta: row.meta })
    );
  }
}

async function main() {
  const client = new Client({ connectionString: process.env.POSTGRES_URL });
  const store = new PostgresDocumentStore(client);

  await store.init();

  await store.writeDocuments([
    new Document({
      content: "Claims policy requires manual review above $10k.",
      meta: { source: "policy", team: "claims" },
    }),
    new Document({
      content: "KYC exceptions must be approved by compliance.",
      meta: { source: "policy", team: "compliance" },
    }),
  ]);

  const docs = await store.getAllDocuments();
  console.log(docs.map((d) => ({ id: d.id, content: d.content, meta: d.meta })));

  await client.end();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
  5. Add an index for the queries you actually run.
    If you only ever fetch by ID or scan a few records in development, the default table is enough. In production, add indexes on JSONB metadata fields you filter on often.
import { Client } from "pg";

async function addIndexes(client: Client) {
  await client.query(`
    CREATE INDEX IF NOT EXISTS haystack_documents_meta_gin
    ON haystack_documents USING GIN (meta)
  `);

  await client.query(`
    CREATE INDEX IF NOT EXISTS haystack_documents_team_idx
    ON haystack_documents ((meta->>'team'))
  `);
}
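If you end up indexing several metadata keys, a small generator keeps the DDL consistent instead of hand-writing one statement per key. A sketch, where metaIndexStatements is a hypothetical helper and identifier sanitization is deliberately conservative:

```typescript
// Generate CREATE INDEX statements for expression indexes on JSONB keys.
// Index names are derived from the key, restricted to [a-z0-9_] so they
// are always valid PostgreSQL identifiers.
function metaIndexStatements(table: string, keys: string[]): string[] {
  return keys.map((key) => {
    const safe = key.toLowerCase().replace(/[^a-z0-9_]/g, "_");
    return (
      `CREATE INDEX IF NOT EXISTS ${table}_${safe}_idx ` +
      `ON ${table} ((meta->>'${key.replace(/'/g, "''")}'))`
    );
  });
}
```

Each returned statement can be executed with client.query inside the addIndexes function above.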

Testing It

Run your PostgreSQL container or point POSTGRES_URL at a real instance first. Then execute the TypeScript file with tsx, ts-node, or after compiling with tsc, and confirm the table gets created without errors.

Check that inserted documents come back with the same IDs, content strings, and metadata objects. Also verify directly in psql that rows exist in haystack_documents, because SQL-level validation catches serialization bugs faster than logging alone.

If you plan to use this in a pipeline next, test duplicate inserts too. The upsert should replace existing content instead of creating duplicate rows.

Next Steps

  • Add vector embeddings and a similarity search table alongside the document store.
  • Wrap this store behind a Haystack retriever interface so it plugs into your pipeline cleanly.
  • Add transaction handling and connection pooling for multi-request workloads in production.
