Haystack Tutorial (TypeScript): handling long documents for advanced developers
This tutorial shows how to ingest, split, index, and query long documents in Haystack using TypeScript without blowing up your prompt budget or losing retrieval quality. You need this when your source material is too large for a single LLM context window and you want a production-friendly pipeline that still returns precise answers.
What You'll Need
- Node.js 18+
- A TypeScript project with `ts-node` or a build step
- Haystack JS packages:
  - `@haystack/core`
  - `@haystack/integrations`
- An OpenAI API key set as `OPENAI_API_KEY`
- A local or remote document source (Markdown or plain text)
- Basic familiarity with Haystack components like `Document`, `Pipeline`, and retrievers
Step-by-Step
- Start by installing the packages and setting up a clean TypeScript entrypoint. For long-document workflows, keep the pipeline modular so you can swap chunking, embedding, and retrieval strategies later.
```bash
npm init -y
npm install @haystack/core @haystack/integrations
npm install -D typescript ts-node @types/node
```

Then add a minimal `tsconfig.json`:

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
```
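The later snippets use top-level await, so they need to run as an ES module. One way to do that, assuming your code lives in a single entry file named `src/main.ts` (the file name is just a convention for this tutorial) and your `package.json` includes `"type": "module"`, is ts-node's ESM mode:

```bash
npx ts-node --esm src/main.ts
```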
- Load the document and split it into manageable chunks before indexing. The main mistake people make with long documents is trying to embed whole files; chunking with overlap preserves context across boundaries.
```ts
import { Document } from "@haystack/core";

// A deliberately long synthetic document (200 repeated sections).
const longText = `
# Claims Policy
Section 1: Eligibility...
Section 2: Exclusions...
Section 3: Claims handling...
`.repeat(200);

const doc = new Document({
  content: longText,
  meta: {
    source: "policy.md",
    type: "policy"
  }
});

// Character-based splitter with overlap, so text that straddles a chunk
// boundary still appears intact in at least one chunk.
function splitText(text: string, chunkSize = 1200, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // done; stepping back by overlap here would loop forever
    start = end - overlap;
  }
  return chunks;
}

const chunks = splitText(doc.content);
console.log(`Created ${chunks.length} chunks`);
```
- Convert each chunk into a Haystack `Document` and attach metadata that helps retrieval later. In production, store the original document ID, chunk index, and source path so you can trace answers back to the exact section.
```ts
import { Document } from "@haystack/core";

const chunkDocs = chunks.map((chunk, idx) => {
  return new Document({
    content: chunk,
    meta: {
      source: doc.meta?.source,
      type: doc.meta?.type,
      parent_id: "policy-001",
      chunk_index: idx,
      total_chunks: chunks.length
    }
  });
});

console.log(chunkDocs[0].meta);
```
- Index the chunks with embeddings and a vector store. This example uses OpenAI embeddings plus an in-memory store pattern; for real systems, replace the store with something durable like pgvector or Pinecone.
```ts
import { OpenAITextEmbedder } from "@haystack/integrations";
import { Document } from "@haystack/core";

const embedder = new OpenAITextEmbedder({
  model: "text-embedding-3-small",
});

// Embed chunks one at a time; batch these calls in production to cut latency.
async function embedChunks(docs: Document[]) {
  const embedded: Array<Document & { embedding: number[] }> = [];
  for (const d of docs) {
    const result = await embedder.run({ text: d.content });
    embedded.push({
      ...d,
      embedding: result.embedding
    });
  }
  return embedded;
}

const embeddedDocs = await embedChunks(chunkDocs);
console.log(`Embedded ${embeddedDocs.length} chunks`);
```
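If you want to try the durable-store swap right away, here is a minimal sketch using the official Pinecone JS client (`@pinecone-database/pinecone`). It assumes `PINECONE_API_KEY` is set and that you have already created an index named `policies` with dimension 1536 to match `text-embedding-3-small`; the index name and ID scheme are illustrative, not part of Haystack.

```ts
import { Pinecone } from "@pinecone-database/pinecone";

// Assumes an existing index named "policies" (dimension 1536, cosine metric).
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index("policies");

// Upsert each embedded chunk along with its trace metadata, so retrieval
// results can still be mapped back to parent_id and chunk_index.
await index.upsert(
  embeddedDocs.map((d, i) => ({
    id: `policy-001-${i}`, // parent_id + chunk index keeps records traceable
    values: d.embedding as number[],
    metadata: {
      parent_id: "policy-001",
      chunk_index: i,
      source: "policy.md"
    }
  }))
);
```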
- Retrieve only the most relevant chunks at query time, then answer against those snippets instead of the full document. This is the core pattern for long documents: narrow first, generate second.
```ts
import { OpenAIChatGenerator } from "@haystack/integrations";

const generator = new OpenAIChatGenerator({
  model: "gpt-4o-mini"
});

// Plain cosine similarity: fine for a demo, replaced by the vector store's
// own similarity search in production.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
```
- Run an end-to-end query flow and inspect which chunks were used in the final answer. For advanced use cases like insurance claims or policy Q&A, this traceability matters as much as accuracy.
```ts
async function answerQuestion(question: string) {
  const qEmbedding = await embedder.run({ text: question });

  // Rank every chunk against the question and keep only the top three.
  const ranked = embeddedDocs
    .map((d) => ({
      doc: d,
      score: cosineSimilarity(qEmbedding.embedding, d.embedding as number[])
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);

  const context = ranked.map((r) => r.doc.content).join("\n\n---\n\n");

  const response = await generator.run({
    messages: [
      {
        role: "system",
        content:
          "Answer only from the provided context. If missing, say you do not know."
      },
      {
        role: "user",
        content: `Context:\n${context}\n\nQuestion:\n${question}`
      }
    ]
  });

  console.log("Top chunks:", ranked.map((r) => r.doc.meta?.chunk_index));
  console.log("Answer:", response.message.content);
}

await answerQuestion("What does the policy say about claims exclusions?");
```
Testing It
Run the script with a question that should clearly map to one section of the document, then verify that only a few chunks are retrieved and passed into generation. Check that the returned answer cites information present in those chunks and does not hallucinate details outside them.
For deeper validation, ask one question whose answer exists in the document and another that does not. The first should produce a grounded answer; the second should return a refusal or “I do not know” style response.
Also inspect chunk metadata in logs to confirm you can trace every answer back to `parent_id` and `chunk_index`. That trace is what makes long-document systems usable in regulated environments.
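As a sketch, a two-question smoke test might look like this. The first question is answerable from the document; the second is a made-up out-of-scope question and should trigger the refusal path:

```ts
// Grounded question: expect an answer citing the exclusions section,
// with the logged chunk_index values pointing at the chunks used.
await answerQuestion("What does the policy say about claims exclusions?");

// Out-of-scope question (hypothetical): expect an "I do not know" response.
await answerQuestion("Does the policy cover flood damage to rental cars?");
```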
Next Steps
- Replace the naive similarity search with a real vector database such as pgvector or Pinecone.
- Add semantic chunking so boundaries follow headings and paragraphs instead of raw character counts; see the sketch after this list.
- Build evaluation tests for retrieval recall on your own document set before shipping to users.
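Here is a minimal sketch of heading-aware chunking, assuming Markdown sources with `#`/`##` headings; it reuses the `splitText` helper from earlier for sections that are still too large:

```ts
// Split on top-level and second-level Markdown headings so each chunk
// starts at a semantic boundary, then fall back to the character splitter
// for oversized sections.
function splitByHeadings(text: string, maxChars = 1200): string[] {
  const sections = text
    .split(/(?=^#{1,2} )/m) // lookahead keeps each heading with its section
    .filter((s) => s.trim().length > 0);

  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      chunks.push(section);
    } else {
      chunks.push(...splitText(section, maxChars, 200));
    }
  }
  return chunks;
}
```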
Keep learning
- The complete AI Agents Roadmap (my full 8-step breakdown)
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me (I build AI for banks and insurance companies)
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit: architecture templates, compliance checklists, and a 7-email deep-dive course.