LangChain Tutorial (TypeScript): handling long documents for advanced developers
This tutorial shows how to ingest, split, retrieve, and answer questions over long documents in TypeScript using LangChain. You need this when a single prompt can’t hold the full source material, but you still need accurate answers grounded in the document.
What You'll Need
- Node.js 18+
- A TypeScript project with ts-node or a build step
- langchain
- @langchain/openai
- @langchain/community
- An OpenAI API key in OPENAI_API_KEY
- A long text file to test with, such as a policy document, contract, or technical spec
Install the packages (the code below also imports directly from @langchain/core, so install it explicitly):
npm install langchain @langchain/core @langchain/openai @langchain/community
npm install -D typescript ts-node @types/node
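The snippets below use top-level await, so the script needs to run as an ES module ("type": "module" in package.json). Here is a minimal sketch of matching compiler options; the exact values are an assumption, adjust them to your existing setup:
// tsconfig.json (sketch)
{
  "compilerOptions": {
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "target": "ES2022",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
With that in place you can run the examples through ts-node's ESM mode (npx ts-node --esm index.ts) or compile with tsc and run the output with node.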
Step-by-Step
- Start by loading a long document from disk. For real systems, keep raw source documents separate from your application code so you can reprocess them without redeploying; a directory-based variant is sketched after the snippet below.
import { TextLoader } from "@langchain/community/document_loaders/fs/text";
import { Document } from "@langchain/core/documents";
// TextLoader returns one Document per file, with the raw text in pageContent.
async function loadDocument(path: string): Promise<Document[]> {
  const loader = new TextLoader(path);
  return await loader.load();
}
const docs = await loadDocument("./docs/policy.txt");
console.log(`Loaded ${docs.length} document(s)`);
console.log(docs[0].pageContent.slice(0, 200));
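If the source files live in their own folder, you can load the whole directory instead of hard-coding one path, which makes reprocessing easier. A minimal sketch using DirectoryLoader; the ./docs path and the assumption that everything is plain text are mine:
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
// Every .txt file under ./docs becomes its own Document.
const dirLoader = new DirectoryLoader("./docs", {
  ".txt": (filePath) => new TextLoader(filePath),
});
const allDocs = await dirLoader.load();
console.log(`Loaded ${allDocs.length} document(s) from ./docs`);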
- Split the document into chunks before sending it to embeddings or retrieval. The important part is overlap: it preserves context across chunk boundaries and reduces answer drift on long passages. A quick way to eyeball the overlap follows the snippet.
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000, // measured in characters, not tokens
  chunkOverlap: 200, // characters shared between adjacent chunks
});
const splitDocs = await splitter.splitDocuments(docs);
console.log(`Created ${splitDocs.length} chunks`);
console.log(splitDocs[0].pageContent.slice(0, 200));
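To confirm the overlap is doing its job, print the seam between two adjacent chunks and check that the end of one reappears near the start of the next. Purely a debugging aid, not part of the pipeline:
// Eyeball the seam between chunk 0 and chunk 1.
if (splitDocs.length > 1) {
  console.log("End of chunk 0:\n", splitDocs[0].pageContent.slice(-200));
  console.log("Start of chunk 1:\n", splitDocs[1].pageContent.slice(0, 200));
}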
- Embed the chunks and store them in a vector index. For long-document QA, this is the core pattern: retrieve only the relevant chunks instead of stuffing everything into one prompt. A way to inspect similarity scores is sketched after the snippet.
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});
const vectorStore = await MemoryVectorStore.fromDocuments(splitDocs, embeddings);
const retriever = vectorStore.asRetriever(4);
const relevantDocs = await retriever.invoke("What is the cancellation policy?");
console.log(`Retrieved ${relevantDocs.length} chunks`);
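When tuning retrieval, it helps to see how similar each returned chunk actually is to the query. The vector store can return scores alongside documents; a small sketch:
// [Document, score] pairs; for MemoryVectorStore, higher means more similar.
const scored = await vectorStore.similaritySearchWithScore(
  "What is the cancellation policy?",
  4
);
for (const [doc, score] of scored) {
  console.log(score.toFixed(3), doc.pageContent.slice(0, 80));
}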
- Wire retrieval into an answer chain. Use a prompt that forces the model to stay grounded in context and say when the answer is missing from the retrieved text.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});
const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question using only the context below.
If the answer is not in the context, say "I don't know".
<context>
{context}
</context>
Question: {input}
`);
const combineDocsChain = await createStuffDocumentsChain({
  llm,
  prompt,
});
const retrievalChain = await createRetrievalChain({
  retriever,
  combineDocsChain,
});
- Run an end-to-end query and inspect both the answer and sources. In production, I always log retrieved chunks during evaluation so I can tell whether bad answers come from retrieval or generation; a way to test generation on its own follows the snippet.
const result = await retrievalChain.invoke({
  input: "What is the cancellation policy?",
});
console.log("Answer:");
console.log(result.answer);
console.log("\nSources:");
for (const doc of result.context) {
  console.log("---");
  console.log(doc.pageContent.slice(0, 300));
}
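If an answer looks wrong and you want to rule the retriever in or out, you can call the combine-documents chain directly with hand-picked chunks, taking retrieval out of the loop. A small sketch reusing the chunks retrieved earlier:
// Bypass the retriever: feed known-relevant chunks straight to the generation step.
const directAnswer = await combineDocsChain.invoke({
  input: "What is the cancellation policy?",
  context: relevantDocs,
});
console.log(directAnswer);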
Testing It
Run the script against a real long document with multiple sections and cross-references. Ask questions that are explicitly answered in one section and questions that require combining nearby paragraphs; both should work if chunking and retrieval are set up correctly.
Then ask something that does not exist in the document. The model should refuse cleanly with “I don't know” instead of inventing details.
If answers look vague or wrong, check three things first: chunk size, chunk overlap, and whether your retriever is returning enough documents. Most failures in long-document systems happen there, not in the LLM call itself.
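All three knobs sit in code you already have, and they are cheap to experiment with. A sketch of more generous settings, reusing the docs and embeddings from earlier; the specific numbers are only a starting point, not recommendations:
// Larger chunks and more overlap keep related clauses together...
const widerSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1500,
  chunkOverlap: 300,
});
// ...but changing chunk settings means re-splitting and re-embedding:
const reSplit = await widerSplitter.splitDocuments(docs);
const reIndexed = await MemoryVectorStore.fromDocuments(reSplit, embeddings);
// And ask the retriever for more candidates:
const widerRetriever = reIndexed.asRetriever(8);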
Next Steps
- Add metadata filters so you can search by document type, version, or tenant (a minimal sketch follows this list)
- Replace MemoryVectorStore with a persistent store like Pinecone, pgvector, or Weaviate
- Add evaluation scripts that compare retrieved chunks against expected answers for regression testing
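For the metadata-filter idea, MemoryVectorStore accepts a filter predicate at query time, so tagging chunks during ingestion is enough to scope a search. A minimal sketch; the docType and version fields are invented examples, not something the loaders set for you:
// Tag chunks with metadata during ingestion...
const taggedDocs = splitDocs.map(
  (doc) =>
    new Document({
      pageContent: doc.pageContent,
      metadata: { ...doc.metadata, docType: "policy", version: "2024-06" },
    })
);
const taggedStore = await MemoryVectorStore.fromDocuments(taggedDocs, embeddings);
// ...then filter at query time with a predicate over each document.
const filteredRetriever = taggedStore.asRetriever({
  k: 4,
  filter: (doc) => doc.metadata.docType === "policy",
});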
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.