LangChain Tutorial (TypeScript): handling long documents for beginners
This tutorial shows you how to take a long document, split it into manageable chunks, summarize each chunk, and then combine those summaries into one answer using LangChain in TypeScript. You need this when a document is too large to fit in a model’s context window or when you want more reliable answers from long PDFs, reports, or policy docs.
What You'll Need
- Node.js 18+
- A TypeScript project
- The langchain package
- The @langchain/openai package
- An OpenAI API key in OPENAI_API_KEY
- A long text file to test with, such as ./data/long-document.txt
Install the packages:
npm install langchain @langchain/openai
npm install -D typescript tsx @types/node
Set your environment variable:
export OPENAI_API_KEY="your-api-key"
Step-by-Step
- Create a small TypeScript script that loads your long document from disk. Keeping the source as plain text makes the example easy to run and easy to debug before moving on to PDFs or HTML.
import { readFile } from "node:fs/promises";

async function main() {
  const text = await readFile("./data/long-document.txt", "utf-8");
  console.log("Loaded characters:", text.length);
}

main().catch(console.error);
- Split the document into chunks with overlap. The overlap matters because important context often sits across chunk boundaries, and without it your summaries get brittle.
import { readFile } from "node:fs/promises";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

async function main() {
  const text = await readFile("./data/long-document.txt", "utf-8");
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1500,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(text);
  console.log("Chunks:", chunks.length);
  console.log("First chunk preview:", chunks[0].slice(0, 300));
}

main().catch(console.error);
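To build intuition for what chunkSize and chunkOverlap actually do, here is a deliberately naive fixed-window splitter. It is not RecursiveCharacterTextSplitter's real algorithm (that one prefers to break on paragraph and sentence separators), but the windowing arithmetic is the same idea:

```typescript
// Deliberately naive fixed-window splitter, to show the chunkSize /
// chunkOverlap arithmetic. NOT LangChain's actual algorithm.
function naiveSplit(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // each chunk starts `step` chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

const demo = naiveSplit("a".repeat(100), 40, 10);
console.log(demo.length, demo[0].length); // 3 chunks of up to 40 chars each
```

Each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.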
- Summarize each chunk using a chat model. This is the core pattern for long documents: process manageable pieces first, then combine the results instead of sending the whole file at once.
import { readFile } from "node:fs/promises";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

async function main() {
  const text = await readFile("./data/long-document.txt", "utf-8");
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1500,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(text);
  const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
  const prompt = PromptTemplate.fromTemplate(
    "Summarize this document chunk in 5 bullet points:\n\n{chunk}"
  );
  const summaries: string[] = [];
  // Only the first 3 chunks while testing, to keep the run fast and cheap.
  for (const chunk of chunks.slice(0, 3)) {
    const formatted = await prompt.format({ chunk });
    const result = await model.invoke(formatted);
    summaries.push(result.content.toString());
  }
  console.log(summaries.join("\n\n---\n\n"));
}

main().catch(console.error);
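The loop above calls the model one chunk at a time, which is simple but slow for large documents. A common refinement is to process chunks in small concurrent batches, so you gain throughput without hammering rate limits. The helper below is a generic sketch (the name inBatches and the batch size are my own choices, not a LangChain API):

```typescript
// Run an async worker over items in fixed-size concurrent batches.
// Each batch runs in parallel; batches run one after another.
async function inBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // await one batch fully before starting the next
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Usage with the summarization loop would look like:
// const summaries = await inBatches(chunks, 3, async (chunk) =>
//   (await model.invoke(await prompt.format({ chunk }))).content.toString()
// );
```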
- Combine all chunk summaries into one final answer. This gives you a clean “map-reduce” style pipeline: map over chunks, reduce them into one result.
import { readFile } from "node:fs/promises";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

async function main() {
  const text = await readFile("./data/long-document.txt", "utf-8");
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1500,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(text);
  const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
  const summarizePrompt = PromptTemplate.fromTemplate(
    "Summarize this chunk in bullet points:\n\n{chunk}"
  );
  const combinePrompt = PromptTemplate.fromTemplate(
    "Combine these summaries into one concise answer:\n\n{summaries}"
  );
  const partials: string[] = [];
  for (const chunk of chunks) {
    const formatted = await summarizePrompt.format({ chunk });
    const response = await model.invoke(formatted);
    partials.push(response.content.toString());
  }
  const combinedInput = partials.join("\n\n");
  const finalPrompt = await combinePrompt.format({ summaries: combinedInput });
  const finalResponse = await model.invoke(finalPrompt);
  console.log(finalResponse.content.toString());
}

main().catch(console.error);
- Wrap it into a reusable helper so you can point it at any long text file. In production, this is where you add retries, logging, token budgeting, and support for PDFs or database records.
import { readFile } from "node:fs/promises";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

async function summarizeLongDocument(path: string): Promise<string> {
  const text = await readFile(path, "utf-8");
  // Map: split and summarize each chunk (same settings as the earlier steps).
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1500, chunkOverlap: 200 });
  const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
  const summarize = PromptTemplate.fromTemplate("Summarize this chunk in bullet points:\n\n{chunk}");
  const partials: string[] = [];
  for (const chunk of await splitter.splitText(text)) {
    const response = await model.invoke(await summarize.format({ chunk }));
    partials.push(response.content.toString());
  }
  // Reduce: combine the partial summaries into one answer.
  const combine = PromptTemplate.fromTemplate("Combine these summaries into one concise answer:\n\n{summaries}");
  const final = await model.invoke(await combine.format({ summaries: partials.join("\n\n") }));
  return final.content.toString();
}

summarizeLongDocument("./data/long-document.txt").then(console.log).catch(console.error);
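As a taste of the “retries” part, here is a minimal retry-with-exponential-backoff wrapper you could put around each model call. The helper name and defaults are illustrative, not part of LangChain:

```typescript
// Retry an async operation with exponential backoff.
// e.g. const result = await withRetry(() => model.invoke(prompt));
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // wait 500ms, 1000ms, 2000ms, ... before the next attempt
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError; // all attempts failed
}
```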
Testing It
Run the script with tsx (for example, npx tsx summarize.ts, assuming you saved it as summarize.ts) and point it at a real document that has at least several thousand words. You should see the number of chunks printed first, then either per-chunk summaries or one final combined summary, depending on which step you ran.
If the model errors out, check that OPENAI_API_KEY is set in the same shell session and that your file path is correct. If the output feels repetitive, reduce overlap slightly or increase chunkSize so each summary has more distinct content.
For a better test, use a policy document or internal report and ask a question that requires details spread across multiple sections. If the final answer misses context, your chunks are probably too large or your combine prompt is too vague.
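Before a long run, it helps to predict how many model calls you are signing up for. The heuristic below estimates the chunk count from the document length and the splitter settings; it treats the splitter as a fixed window, so real counts may differ slightly since the recursive splitter breaks on separators:

```typescript
// Rough upper-bound estimate of chunk count (and so per-chunk model calls)
// for a document of `docLength` characters.
function estimateChunks(docLength: number, chunkSize: number, chunkOverlap: number): number {
  const step = chunkSize - chunkOverlap; // net new characters per chunk
  return Math.max(1, Math.ceil((docLength - chunkOverlap) / step));
}

// A ~30,000-character report with the tutorial's settings:
console.log(estimateChunks(30_000, 1500, 200)); // 23
```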
Next Steps
- Add PDF loading with PDFLoader so you can process real business documents.
- Replace the manual loop with LangChain's map-reduce summarization chain patterns.
- Add metadata per chunk so you can trace answers back to source sections later.
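Tracing answers back to source sections can start as simply as recording each chunk's character offset in the original text. A hand-rolled sketch, under the assumption that chunks appear in source order (the type and function names here are my own, not LangChain's):

```typescript
// Attach an index and character offset to each chunk so a final answer
// can cite which part of the source document it came from.
interface TaggedChunk {
  text: string;
  index: number;
  start: number; // character offset in the original document
}

function tagChunks(chunks: string[], source: string): TaggedChunk[] {
  let cursor = 0;
  return chunks.map((text, index) => {
    const start = source.indexOf(text, cursor); // locate this chunk in the original
    cursor = start + 1; // move forward so overlapping chunks still advance
    return { text, index, start };
  });
}
```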
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.