LangGraph Tutorial (TypeScript): handling long documents for advanced developers

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to build a LangGraph workflow in TypeScript that can ingest long documents, chunk them, summarize each chunk, and synthesize a final answer from the summaries. You need this when the document is too large for a single model call, or when you want deterministic control over document processing instead of dumping everything into one prompt.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • Packages:
    • @langchain/langgraph
    • @langchain/openai
    • @langchain/core
    • zod
    • dotenv
  • An OpenAI API key in .env:
    • OPENAI_API_KEY=...
  • A long text file or any large string source you want to process

Step-by-Step

  1. Start by defining the graph state and the basic document splitting logic. For long-document handling, keep the state explicit: original text, chunks, per-chunk summaries, and final output.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { Annotation, END, START, StateGraph } from "@langchain/langgraph";

const State = Annotation.Root({
  text: Annotation<string>(),
  chunks: Annotation<string[]>({
    default: () => [],
    reducer: (_, next) => next,
  }),
  summaries: Annotation<string[]>({
    default: () => [],
    reducer: (_, next) => next,
  }),
  finalAnswer: Annotation<string>({
    default: () => "",
    reducer: (_, next) => next,
  }),
});

function splitText(text: string, chunkSize = 2000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
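If facts often straddle chunk boundaries, a small improvement is to overlap consecutive chunks so each boundary region appears in full in at least one chunk. This is a minimal sketch; the function name `splitTextWithOverlap` and the default sizes are illustrative, not part of the tutorial's code:

```typescript
// Variant of splitText that overlaps consecutive chunks, so facts that
// straddle a chunk boundary appear in full in at least one chunk.
// chunkSize and overlap defaults here are illustrative, not tuned.
function splitTextWithOverlap(
  text: string,
  chunkSize = 2000,
  overlap = 200
): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + chunkSize));
    if (i + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Overlap trades a few extra tokens per chunk for fewer facts lost at boundaries; 5-10% of the chunk size is a common starting point.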
  2. Create a model instance and a node that chunks the input text. This keeps the graph deterministic and makes it easy to swap in smarter splitting later if you need token-aware chunking.
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const chunkNode = async (state: typeof State.State) => {
  const chunks = splitText(state.text);
  return { chunks };
};
  3. Add a summarization node that processes all chunks in parallel. This is the main pattern for long documents in LangGraph: fan out work per chunk, then reduce into a smaller representation.
const summarizeNode = async (state: typeof State.State) => {
  const summaries = await Promise.all(
    state.chunks.map(async (chunk) => {
      const response = await model.invoke([
        {
          role: "system",
          content:
            "Summarize this document chunk for downstream synthesis. Keep facts, names, dates, and decisions.",
        },
        { role: "user", content: chunk },
      ]);
      return response.content.toString();
    })
  );

  return { summaries };
};
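Note that `Promise.all` fires one request per chunk simultaneously, which can trip provider rate limits on large documents. One option is a small concurrency limiter; the helper below (`mapWithConcurrency` is a hypothetical name, not a LangGraph API) is a sketch of the idea, which libraries like `p-limit` implement more robustly:

```typescript
// Run an async mapper over items with at most `limit` calls in flight.
// Results come back in input order. Minimal sketch, no retry logic.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Each worker pulls the next unclaimed index until items run out.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Inside `summarizeNode`, you would replace `Promise.all(state.chunks.map(...))` with `mapWithConcurrency(state.chunks, 4, ...)` to cap in-flight model calls at four.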
  4. Add a synthesis node that combines the summaries into one answer. This is where you turn many small outputs into something usable for search, QA, or extraction.
const synthesizeNode = async (state: typeof State.State) => {
  const response = await model.invoke([
    {
      role: "system",
      content:
        "You combine multiple chunk summaries into one concise but complete answer.",
    },
    {
      role: "user",
      content: state.summaries.map((s, i) => `Chunk ${i + 1}: ${s}`).join("\n\n"),
    },
  ]);

  return { finalAnswer: response.content.toString() };
};
  5. Wire the graph together and compile it. The flow is simple: start with text, split it, summarize each chunk, then synthesize the result.
const graph = new StateGraph(State)
  .addNode("chunk", chunkNode)
  .addNode("summarize", summarizeNode)
  .addNode("synthesize", synthesizeNode)
  .addEdge(START, "chunk")
  .addEdge("chunk", "summarize")
  .addEdge("summarize", "synthesize")
  .addEdge("synthesize", END);

const app = graph.compile();
  6. Run it against a long document and print the output. In production, this same pattern works for contracts, policy docs, claim files, and call transcripts.
async function main() {
  const longText = `
# Contract Notes

The insured party must submit notice within thirty days.
Coverage applies only after premium payment is confirmed.
Exclusions include intentional damage and fraudulent claims.

# Claims History

Claim A was filed on January 12 and closed on February 3.
Claim B was reopened due to missing documentation.
Claim C requires legal review because liability is disputed.
`.repeat(20);

  const result = await app.invoke({ text: longText });

  console.log("Chunks:", result.chunks.length);
  console.log("Summaries:", result.summaries.length);
  console.log("Final answer:\n", result.finalAnswer);
}

main().catch(console.error);

Testing It

Run the script with your TypeScript runtime and confirm that the number of chunks is greater than one for a sufficiently large input. Then inspect the summaries to make sure they preserve important entities like dates, obligations, exclusions, or claim IDs.

If you want stronger verification, compare the final answer against known facts embedded in the source document. For bank or insurance use cases, I usually test with documents containing specific clauses and ask whether those clauses are reflected in the synthesized output.
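That kind of spot-check can be automated with a small helper. The name `missingClauses` is hypothetical, and naive substring matching is a rough proxy, not a real evaluation harness:

```typescript
// Spot-check that known facts from the source survive into the final answer.
// Case-insensitive substring matching; returns the clauses that are missing.
function missingClauses(answer: string, clauses: string[]): string[] {
  const haystack = answer.toLowerCase();
  return clauses.filter((c) => !haystack.includes(c.toLowerCase()));
}
```

After a run, `missingClauses(result.finalAnswer, ["thirty days", "intentional damage"])` should return an empty array if both clauses made it through summarization and synthesis.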

You should also test edge cases:

  • very short documents
  • documents with repeated sections
  • documents with tables or bullet lists
  • malformed inputs like empty strings
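The splitter edge cases are cheap to pin down in code. The checks below repeat the tutorial's `splitText` so the block is self-contained:

```typescript
// The naive splitter from the tutorial, repeated here for self-containment.
function splitText(text: string, chunkSize = 2000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Empty input should produce no chunks, so downstream nodes can no-op.
console.assert(splitText("").length === 0, "empty input");
// A short document should yield exactly one chunk.
console.assert(splitText("short note").length === 1, "short doc");
// Reassembling the chunks must reproduce the original text exactly.
const doc = "x".repeat(4500);
console.assert(splitText(doc).join("") === doc, "lossless split");
```

The lossless-reassembly check is the one worth keeping when you swap in a smarter splitter: overlap makes it fail by design, so adjust it to assert coverage rather than exact reassembly.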

Next Steps

  • Replace naive character splitting with token-aware splitting using LangChain text splitters
  • Add retrieval so you only summarize relevant sections before synthesis
  • Introduce checkpointing and resumability for multi-stage document pipelines
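As an intermediate step before full token-aware splitting, you can stop cutting sentences mid-word by packing whole paragraphs into chunks. This is a sketch under that assumption; the name `splitByParagraph` is hypothetical, and LangChain's text splitters handle the fallback cases this version skips:

```typescript
// Split on blank lines, then pack paragraphs greedily into chunks of roughly
// maxChars characters. Paragraphs longer than maxChars are kept whole here
// for simplicity; a real splitter would fall back to sentence-level splits.
function splitByParagraph(text: string, maxChars = 2000): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? current + "\n\n" + p : p;
    if (candidate.length > maxChars && current) {
      chunks.push(current); // current chunk is full; start a new one
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because it preserves paragraph boundaries, each chunk reads as coherent prose, which tends to produce noticeably better per-chunk summaries than fixed-width character slices.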

By Cyprian Aarons, AI Consultant at Topiax.
