LangGraph Tutorial (TypeScript): handling long documents for beginners
This tutorial shows you how to build a LangGraph workflow in TypeScript that takes a long document, splits it into chunks, summarizes each chunk, and combines the results into one usable output. You need this when a single prompt is too large for your model's context window, or when you want more reliable processing over long PDFs, contracts, policies, or reports.
What You'll Need
- Node.js 18+
- A TypeScript project with a `tsconfig.json`
- The `@langchain/langgraph` package
- The `@langchain/openai` package
- An OpenAI API key in the `OPENAI_API_KEY` environment variable
- Basic familiarity with async/await and TypeScript types
Install the dependencies:
```shell
npm install @langchain/langgraph @langchain/openai
npm install -D typescript tsx @types/node
```
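If you are starting a project from scratch, you also need a `tsconfig.json`. A minimal configuration that works with `tsx` and modern Node.js might look like this (illustrative, not the only valid setup):

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
```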
Step-by-Step
- Start by defining the state your graph will carry. For long documents, keep the raw text, the chunk list, the per-chunk summaries, and the final summary in one typed object.
```typescript
import { Annotation, StateGraph, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const DocumentState = Annotation.Root({
  document: Annotation<string>(),
  chunks: Annotation<string[]>(),
  summaries: Annotation<string[]>(),
  finalSummary: Annotation<string>(),
});

type DocumentStateType = typeof DocumentState.State;

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});
```
- Next, add a simple chunking function. This example splits on paragraphs and groups them into chunks that are easier to send through the model without blowing up token usage.
```typescript
function chunkDocument(text: string, maxChars = 3000): string[] {
  // Split on blank lines, then greedily pack paragraphs into chunks.
  const paragraphs = text.split(/\n\s*\n/).filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    if ((current + "\n\n" + paragraph).length > maxChars && current) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = current ? `${current}\n\n${paragraph}` : paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```
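One edge case the splitter above does not handle: a single paragraph longer than `maxChars` becomes one oversized chunk. A minimal sketch of a sentence-level fallback you could bolt on (the helper name and sentence regex are illustrative, not part of the tutorial's code):

```typescript
// Split an oversized paragraph on rough sentence boundaries so no piece
// exceeds maxChars. A paragraph with no sentence breaks stays whole.
function splitOversizedParagraph(paragraph: string, maxChars = 3000): string[] {
  if (paragraph.length <= maxChars) return [paragraph];
  // Heuristic: a sentence ends with ., !, or ?, plus trailing whitespace.
  const sentences = paragraph.match(/[^.!?]+[.!?]*\s*/g) ?? [paragraph];
  const pieces: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if ((current + sentence).length > maxChars && current) {
      pieces.push(current.trim());
      current = sentence;
    } else {
      current += sentence;
    }
  }
  if (current.trim()) pieces.push(current.trim());
  return pieces;
}
```

You could call this inside `chunkDocument` for any paragraph that is itself longer than `maxChars`.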
- Now create nodes for splitting, summarizing each chunk, and combining summaries. The important pattern here is that the graph first prepares work, then processes each chunk independently, then merges the results.
```typescript
const splitNode = async (state: DocumentStateType) => {
  return { chunks: chunkDocument(state.document) };
};

const summarizeChunkNode = async (state: DocumentStateType) => {
  const summaries: string[] = [];
  for (const chunk of state.chunks) {
    const response = await model.invoke([
      ["system", "Summarize this document chunk clearly and concisely."],
      ["user", chunk],
    ]);
    summaries.push(response.content.toString());
  }
  return { summaries };
};

const combineNode = async (state: DocumentStateType) => {
  const response = await model.invoke([
    ["system", "Combine these chunk summaries into one coherent summary."],
    ["user", state.summaries.join("\n\n")],
  ]);
  return { finalSummary: response.content.toString() };
};
```
- Wire the nodes together with LangGraph. This version is intentionally simple for beginners: one path in, one path out, with no branching logic yet.
```typescript
const graph = new StateGraph(DocumentState)
  .addNode("split", splitNode)
  .addNode("summarize", summarizeChunkNode)
  .addNode("combine", combineNode)
  .addEdge(START, "split")
  .addEdge("split", "summarize")
  .addEdge("summarize", "combine")
  .addEdge("combine", END);

export const app = graph.compile();
```
- Finally, run the graph against a long document. In production you would read from a file or database; for now, use a multiline string so you can test the flow immediately.
```typescript
async function main() {
  const documentText = `
LangGraph helps you build stateful LLM applications.
When documents get long, you should not send everything in one request.
Instead, split the content into smaller parts and process them step by step.
This pattern works well for contracts, insurance policies, incident reports,
and internal knowledge base articles.
`;

  const result = await app.invoke({ document: documentText });
  console.log(result.finalSummary);
}

main().catch(console.error);
```
Testing It
Run the file with tsx:
```shell
npx tsx src/index.ts
```
You should see a final summary printed to the terminal instead of an error about context length. If you want to verify chunking behavior, log `state.chunks.length` inside `splitNode` and confirm that longer inputs produce multiple chunks. Test with a real policy or contract excerpt next; if the output stays coherent across several pages of text, your graph is doing its job.
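You can also sanity-check the splitter in isolation, without any API calls. The snippet below copies `chunkDocument` from earlier so it runs standalone; the sample inputs are made up for the test:

```typescript
// Standalone copy of chunkDocument for quick verification.
function chunkDocument(text: string, maxChars = 3000): string[] {
  const paragraphs = text.split(/\n\s*\n/).filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    if ((current + "\n\n" + paragraph).length > maxChars && current) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = current ? `${current}\n\n${paragraph}` : paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// A short input fits in one chunk; a long one should split into several.
const short = "One paragraph only.";
const long = Array.from({ length: 50 }, (_, i) => `Paragraph ${i} `.repeat(20)).join("\n\n");

console.log(chunkDocument(short).length);      // 1
console.log(chunkDocument(long, 1000).length); // more than 1
```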
Next Steps
- Add file loading so the graph can read `.txt`, `.md`, or extracted PDF text
- Replace sequential chunk summarization with parallel processing using LangGraph fan-out patterns
- Add a validation node that checks whether each summary preserves key entities like dates, amounts, and obligations
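The parallel-processing idea from the list above can be previewed without any LangGraph-specific API: summarize chunks concurrently with `Promise.all`. The `summarize` function below is a stand-in for the real model call, not LangChain code:

```typescript
// Stand-in for a real model call; swap in model.invoke in practice.
async function summarize(chunk: string): Promise<string> {
  return `summary of: ${chunk.slice(0, 20)}`;
}

// Summarize all chunks concurrently instead of one at a time.
// Promise.all preserves input order, so summaries line up with chunks.
async function summarizeAll(chunks: string[]): Promise<string[]> {
  return Promise.all(chunks.map((chunk) => summarize(chunk)));
}

async function demo() {
  const summaries = await summarizeAll(["first chunk", "second chunk"]);
  console.log(summaries.length); // 2
}
demo();
```

Note that unbounded concurrency can hit provider rate limits on large documents; LangGraph's fan-out patterns (or a concurrency cap) address that.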
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.