Haystack Tutorial (TypeScript): handling long documents for beginners
This tutorial shows you how to take a long document, split it into manageable chunks, index those chunks in Haystack, and retrieve the right parts with TypeScript. You need this when your source material is too large for a single prompt or embedding call, and you still want precise answers without dumping the entire document into your LLM.
What You'll Need
- Node.js 18+
- A TypeScript project with ts-node or a build step
- The Haystack JS package: npm install @haystack-ai/core
- An OpenAI API key, exported as OPENAI_API_KEY="your-key", if you add the LLM generation step from Next Steps (the embedding examples below run a local sentence-transformers model)
- A working internet connection for model downloads and calls
- Basic familiarity with:
  - Pipelines
  - Documents
  - Retrievers
Step-by-Step
1. Create a small TypeScript project and wire up the environment. For long-document handling, the important part is having embeddings available so each chunk can be searched independently.
mkdir haystack-long-docs
cd haystack-long-docs
npm init -y
npm install @haystack-ai/core
npm install -D typescript ts-node @types/node
npx tsc --init
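One setup note: the later steps use top-level await, which requires ES module output. A minimal way to enable that, assuming you run the script with ts-node in ESM mode, is to set "type": "module" in package.json and point tsconfig.json at a modern module target:

package.json (excerpt):
{
  "type": "module"
}

tsconfig.json (excerpt):
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true
  }
}

With that in place, npx ts-node --esm index.ts runs the snippets below as one script; if you prefer CommonJS, wrap the awaited calls in an async main() instead.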
2. Create a document loader and chunker. The trick is to keep chunks small enough for precise retrieval but large enough to preserve context; 200-400 words per chunk is a good starting point for real documents. The demo below uses 40 words only because the sample text is tiny.
import { Document } from "@haystack-ai/core";

const longText = `
Haystack is useful for retrieval augmented generation when documents are too large to fit in one prompt.
In production, long policies, manuals, and claims documents need chunking before indexing.
If you skip chunking, retrieval quality drops because embeddings blur unrelated sections together.
This example demonstrates splitting text into chunks that can be searched independently.
`;

// Split on whitespace and group words into fixed-size chunks.
function splitIntoChunks(text: string, maxWords = 40): string[] {
  const words = text.trim().split(/\s+/);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Wrap each chunk in a Document, keeping the source file and chunk
// number in meta so retrieval results stay traceable.
const chunks = splitIntoChunks(longText);
const documents = chunks.map(
  (content, idx) =>
    new Document({
      content,
      meta: { source: "sample-policy.txt", chunk: idx + 1 },
    })
);
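A fixed-size split can cut a sentence in half right at a chunk boundary, which hurts the embeddings of both halves. A common refinement is a sliding window with overlap. This sketch is plain TypeScript with no Haystack APIs, and splitIntoChunksWithOverlap is a name introduced here for illustration:

function splitIntoChunksWithOverlap(
  text: string,
  maxWords = 40,
  overlapWords = 10
): string[] {
  const words = text.trim().split(/\s+/);
  const chunks: string[] = [];
  // Advance by less than a full chunk so consecutive chunks share words.
  const step = maxWords - overlapWords;
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
    if (i + maxWords >= words.length) break; // last window covered the tail
  }
  return chunks;
}

The overlap costs a little extra storage but keeps boundary sentences fully inside at least one chunk.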
3. Build an indexing pipeline with an embedder and document store. This keeps each chunk searchable on its own instead of forcing the model to reason over the full document at query time.
import {
  InMemoryDocumentStore,
  SentenceTransformersDocumentEmbedder,
} from "@haystack-ai/core";

const documentStore = new InMemoryDocumentStore();
const embedder = new SentenceTransformersDocumentEmbedder({
  model: "sentence-transformers/all-MiniLM-L6-v2",
});

// warmUp downloads and loads the embedding model once up front.
await embedder.warmUp();

// The embedder returns the documents with their embeddings populated
// (mirroring Haystack's Python API); write those to the store rather
// than the raw, un-embedded documents.
const { documents: embeddedDocuments } = await embedder.run({ documents });
await documentStore.writeDocuments(embeddedDocuments);
console.log(`Indexed ${embeddedDocuments.length} chunks`);
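Before querying, it is worth confirming that embeddings actually landed on the stored documents. Assuming each document exposes an embedding array after the embedder runs (as in Haystack's Python API), all-MiniLM-L6-v2 should produce 384-dimensional vectors:

// Quick sanity check: a missing embedding means indexing went wrong.
const firstDoc = embeddedDocuments[0];
console.log(
  `Chunk 1 embedding length: ${firstDoc.embedding?.length ?? "missing"}`
);
// Expect 384 for all-MiniLM-L6-v2.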
4. Retrieve the most relevant chunks for a query. For beginners, this is the core pattern: ask a question, retrieve only the matching parts of the long document, then pass those parts forward.
import { SentenceTransformersTextEmbedder } from "@haystack-ai/core";

const query = "Why do we need chunking for long documents?";

// Embed the query with the same model that embedded the documents,
// otherwise the vectors are not comparable.
const queryEmbedder = new SentenceTransformersTextEmbedder({
  model: "sentence-transformers/all-MiniLM-L6-v2",
});
await queryEmbedder.warmUp();

const { embedding: queryEmbedding } = await queryEmbedder.run({ text: query });
if (!queryEmbedding) {
  throw new Error("Query embedding failed");
}

// Fetch the three chunks whose embeddings are closest to the query.
const results = await documentStore.queryByEmbedding(queryEmbedding, {
  topK: 3,
});
for (const doc of results) {
  console.log(`Chunk ${doc.meta?.chunk}: ${doc.content}`);
}
5. Put retrieval behind a simple answer step. In real systems, you would send the retrieved chunks to an LLM, but even without that you can verify that the right context is being selected.
function buildContext(docs: Document[]): string {
return docs.map((doc) => doc.content).join("\n\n---\n\n");
}
const context = buildContext(results);
console.log("Question:", query);
console.log("Retrieved context:");
console.log(context);
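To turn that context into an actual answer, send it to an LLM. Here is a minimal sketch using the official openai npm package (npm install openai; it reads OPENAI_API_KEY from the environment). The model name and prompt wording are illustrative choices, not part of Haystack:

import OpenAI from "openai";

const client = new OpenAI(); // picks up OPENAI_API_KEY automatically

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini", // any chat model works here
  messages: [
    {
      role: "system",
      content:
        "Answer using only the provided context. If the context does not contain the answer, say so.",
    },
    { role: "user", content: `Context:\n${context}\n\nQuestion: ${query}` },
  ],
});

console.log("Answer:", completion.choices[0].message.content);

Pinning the system prompt to the retrieved context is what keeps answers tied to the document instead of the model's general knowledge.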
Testing It
Run the script and confirm it prints multiple indexed chunks and then returns only the most relevant ones for your query. If your query mentions “chunking,” you should see the chunk that explains why long documents must be split before indexing or prompting.
Try a second query like "What happens if I skip chunking?" and check whether retrieval still surfaces the right section. If it does not, reduce chunk size or increase topK to give retrieval more room. In production, test with real policy PDFs or claims manuals because synthetic text hides edge cases like repeated terms and section headers.
Next Steps
- Add metadata filters so you can search by policy type, region, or effective date (a client-side stopgap is sketched after this list)
- Replace the in-memory store with a persistent vector database for production use
- Add an LLM generation step that answers strictly from retrieved chunks
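Store-level metadata filtering depends on APIs this tutorial has not shown, so here is a store-agnostic stopgap that only uses calls already introduced above: over-fetch from queryByEmbedding, then filter results by the meta fields attached in step 2:

// Over-fetch candidates, then keep only chunks from the desired source.
const candidates = await documentStore.queryByEmbedding(queryEmbedding, {
  topK: 10,
});
const filtered = candidates
  .filter((doc) => doc.meta?.source === "sample-policy.txt")
  .slice(0, 3);
console.log(`Kept ${filtered.length} of ${candidates.length} candidates`);

This works for small stores; a real vector database would push the filter into the query itself.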
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.