Haystack Tutorial (TypeScript): handling long documents for intermediate developers
This tutorial shows how to take a long document, split it into retrieval-friendly chunks, index those chunks, and answer questions against them in TypeScript with Haystack. You need this when a single file is too large for one prompt, or when retrieval quality drops because the model only sees a small slice of the source.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with ts-node or a build step
- The haystack JavaScript/TypeScript package installed
- An OpenAI API key set as OPENAI_API_KEY
- A long text file to test with, such as a policy document, contract, or handbook
Install the package:
npm install haystack
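Before wiring anything up, it is worth failing fast if the API key is missing. A minimal check in plain Node/TypeScript, independent of any Haystack APIs:

// Fail fast if the OpenAI key is not configured.
if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY is not set. Export it before running this script.");
}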
Step-by-Step
- Start by loading your long document from disk and turning it into plain text. In real systems the source is usually a PDF, a DOCX export, an HTML page, or an internal wiki dump; a rough sketch for HTML input follows the snippet below.
import { readFileSync } from "node:fs";
const rawText = readFileSync("./docs/employee-handbook.txt", "utf8");
console.log(`Loaded ${rawText.length} characters`);
console.log(rawText.slice(0, 300));
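If your source is HTML rather than plain text, you still want plain text before chunking. A crude, hypothetical tag-stripping helper for quick experiments (a production pipeline should use a real HTML parser instead of a regex):

// Hypothetical helper: crude HTML-to-text conversion for quick experiments only.
function htmlToPlainText(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop embedded scripts
    .replace(/<style[\s\S]*?<\/style>/gi, " ") // drop embedded styles
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}
const rawHtml = readFileSync("./docs/employee-handbook.html", "utf8");
const textFromHtml = htmlToPlainText(rawHtml);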
- Split the document into overlapping chunks. The overlap matters because important context often crosses paragraph boundaries, and without it you get brittle retrieval. A quick sanity check on the overlap appears after the chunking code below.
type Chunk = {
  id: string;
  content: string;
};

// Split the text into fixed-size windows that overlap by `overlap` characters,
// so context that straddles a boundary still appears in at least one chunk.
function chunkText(text: string, chunkSize = 1200, overlap = 200): Chunk[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: Chunk[] = [];
  let start = 0;
  let index = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      id: `chunk-${index}`,
      content: text.slice(start, end),
    });
    // Step forward by chunkSize minus overlap; clamp in case the text is
    // shorter than the overlap itself.
    start = end - overlap;
    index += 1;
    if (start < 0) start = 0;
    if (end === text.length) break;
  }
  return chunks;
}
const chunks = chunkText(rawText);
console.log(`Created ${chunks.length} chunks`);
console.log(chunks[0]);
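Because the overlap is what protects boundary-crossing context, it is worth confirming that consecutive chunks really share text. A small check in plain TypeScript, using the default 200-character overlap:

// Verify that the tail of chunk 0 reappears at the head of chunk 1.
if (chunks.length > 1) {
  const tail = chunks[0].content.slice(-200);
  if (!chunks[1].content.startsWith(tail)) {
    console.warn("Expected 200-character overlap between chunk-0 and chunk-1 was not found.");
  }
}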
- Index the chunks in an in-memory document store and attach embeddings through an embedding retriever pipeline. This gives you a searchable corpus instead of trying to stuff the whole document into one prompt. A batching variant for large documents follows the indexing code below.
import { InMemoryDocumentStore } from "haystack/document-stores/in-memory";
import { Document } from "haystack/document";
import { OpenAITextEmbedder } from "haystack/components/embedders/openai-text-embedder";
import { DocumentWriter } from "haystack/components/writers/document-writer";
// Wrap each chunk in a Haystack Document, keeping the source file in metadata.
const documents = chunks.map(
  (chunk) =>
    new Document({
      id: chunk.id,
      content: chunk.content,
      meta: { source: "employee-handbook.txt" },
    })
);
const documentStore = new InMemoryDocumentStore();
const embedder = new OpenAITextEmbedder({
  apiKey: process.env.OPENAI_API_KEY!,
});
const writer = new DocumentWriter({ documentStore });
await embedder.warmup();
// Embed every chunk, attach the vectors to the documents, then write them
// to the store so the retriever can search them later.
const embeddedDocuments = await embedder.run({ texts: documents.map((d) => d.content) });
documents.forEach((doc, i) => {
  doc.embedding = embeddedDocuments.embeddings[i];
});
await writer.run({ documents });
console.log(`Indexed ${documents.length} documents`);
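Embedding every chunk in one call can hit request-size limits on long documents. Here is a sketch of simple batching, assuming the same run({ texts }) shape used above; the batch size of 50 is an arbitrary choice:

// Hypothetical batching: embed chunks 50 at a time instead of all at once.
const BATCH_SIZE = 50;
for (let i = 0; i < documents.length; i += BATCH_SIZE) {
  const batch = documents.slice(i, i + BATCH_SIZE);
  const result = await embedder.run({ texts: batch.map((d) => d.content) });
  batch.forEach((doc, j) => {
    doc.embedding = result.embeddings[j];
  });
}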
- Build a query path that embeds the user question, retrieves the top matching chunks, and sends only those chunks to the generator. This is the core pattern for long-document QA: retrieve first, generate second. A reusable wrapper for this path follows the code below.
import { OpenAIChatGenerator } from "haystack/components/generators/openai-chat-generator";
import { InMemoryEmbeddingRetriever } from "haystack/retrievers/in-memory-embedding-retriever";
const retriever = new InMemoryEmbeddingRetriever({
  documentStore,
});
const generator = new OpenAIChatGenerator({
  apiKey: process.env.OPENAI_API_KEY!,
});
await retriever.warmup();
await generator.warmup();
const question = "What is the vacation policy for new employees?";
// Embed the question with the same embedder used for the chunks, then pull
// the three closest chunks from the store.
const queryEmbeddingResult = await embedder.run({ texts: [question] });
const retrieved = await retriever.run({
  queryEmbedding: queryEmbeddingResult.embeddings[0],
  topK: 3,
});
// Concatenate only the retrieved chunks into the prompt context.
const context = retrieved.documents
  .map((doc) => doc.content)
  .join("\n\n---\n\n");
const answerResult = await generator.run({
  messages: [
    {
      role: "system",
      content:
        "Answer using only the provided context. If the answer is not in the context, say you do not know.",
    },
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion:\n${question}`,
    },
  ],
});
console.log(answerResult.replies[0].content);
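If you plan to ask more than one question, it helps to fold the query path into a single function. A sketch that reuses the embedder, retriever, and generator created above; the function name and signature are my own, not part of any Haystack API:

// Hypothetical wrapper around the retrieve-then-generate path shown above.
async function askQuestion(q: string, topK = 3): Promise<string> {
  const { embeddings } = await embedder.run({ texts: [q] });
  const { documents: hits } = await retriever.run({ queryEmbedding: embeddings[0], topK });
  const ctx = hits.map((d) => d.content).join("\n\n---\n\n");
  const result = await generator.run({
    messages: [
      {
        role: "system",
        content: "Answer using only the provided context. If the answer is not in the context, say you do not know.",
      },
      { role: "user", content: `Context:\n${ctx}\n\nQuestion:\n${q}` },
    ],
  });
  return result.replies[0].content;
}
console.log(await askQuestion(question));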
- Add a simple guardrail for very long inputs by checking chunk coverage and inspecting which sections were retrieved. In production, this is where you catch bad chunking before users do. A coverage check across several test questions follows the code below.
// Print a short preview of each retrieved chunk so you can spot-check relevance.
function printRetrievedChunks(docs: Array<{ id?: string; content?: string }>) {
  docs.forEach((doc, i) => {
    const preview = (doc.content ?? "").slice(0, 120).replace(/\s+/g, " ");
    console.log(`${i + 1}. ${doc.id ?? "unknown"} -> ${preview}...`);
  });
}
printRetrievedChunks(retrieved.documents);
if (retrieved.documents.length === 0) {
  throw new Error("No relevant chunks were retrieved. Check chunking or embeddings.");
}
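To catch chunking problems more systematically, it also helps to measure how much of the corpus your test questions actually touch. A sketch that counts the distinct chunks retrieved across several questions; the question list and the topK value are placeholders:

// Hypothetical coverage check: run a few test questions and count how many
// distinct chunks the retriever ever returns.
const testQuestions = [
  "What is the vacation policy for new employees?",
  "How are expenses reimbursed?",
  "What is the remote work policy?",
];
const seenChunkIds = new Set<string>();
for (const q of testQuestions) {
  const { embeddings } = await embedder.run({ texts: [q] });
  const { documents: hits } = await retriever.run({ queryEmbedding: embeddings[0], topK: 3 });
  hits.forEach((d) => seenChunkIds.add(d.id));
}
console.log(`Test questions touched ${seenChunkIds.size} of ${chunks.length} chunks`);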
Testing It
Run the script against a real long document with several questions that are answered in different parts of the file. You want to confirm that retrieval returns different chunks depending on the question, not just the first page every time.
Try one question whose answer appears near the beginning of the document and another near the end. If both return similar context blocks, your chunk size may be too large or your embeddings may not be discriminating enough.
Also test an out-of-scope question. The model should say it does not know rather than inventing an answer from weak context.
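A quick way to exercise all three cases is a small loop over questions drawn from different parts of the document plus one out-of-scope question, using the askQuestion sketch from earlier. The question texts here are placeholders for your own document:

// Hypothetical smoke test: the first two answers should cite different chunks,
// and the out-of-scope question should produce an "I do not know" style reply.
const smokeTestQuestions = [
  "What is the vacation policy for new employees?",
  "How is the handbook updated?",
  "What is the capital of France?",
];
for (const q of smokeTestQuestions) {
  console.log(`Q: ${q}`);
  console.log(`A: ${await askQuestion(q)}`);
}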
Next Steps
- Replace InMemoryDocumentStore with Postgres or Elasticsearch for persistent indexing
- Add metadata filters for department, policy version, or effective date
- Move from plain chunking to structure-aware splitting using headings and paragraphs (a rough sketch follows below)
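As a starting point for structure-aware splitting, you can cut the text at markdown-style headings before applying the character-based chunker. A minimal sketch in plain TypeScript; the heading pattern is an assumption about your source format:

// Hypothetical structure-aware split: start a new section at markdown-style
// headings, then chunk any oversized section with chunkText from earlier.
function splitByHeadings(text: string): string[] {
  return text
    .split(/\n(?=#{1,3}\s)/)
    .map((section) => section.trim())
    .filter((section) => section.length > 0);
}
const sections = splitByHeadings(rawText);
const structuredChunks = sections.flatMap((section, i) =>
  chunkText(section).map((c) => ({ ...c, id: `section-${i}-${c.id}` }))
);
console.log(`Created ${structuredChunks.length} structure-aware chunks`);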
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.