LangGraph Tutorial (TypeScript): chunking large documents for beginners

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows how to build a LangGraph workflow in TypeScript that takes a large document, splits it into manageable chunks, and prepares those chunks for downstream processing. You need this when your input is too large for a single prompt, or when you want deterministic chunking before embedding, summarization, or extraction.

What You'll Need

  • Node.js 18+
  • TypeScript 5+
  • npm or pnpm
  • Packages:
    • @langchain/langgraph
    • @langchain/core
    • @langchain/textsplitters
    • zod
    • ts-node or tsx for running the script
  • An editor with TypeScript support
  • No API key is required for the chunking workflow in this tutorial
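Assuming a fresh project, setup might look like the following (package names come from the list above; the directory name is just a placeholder):

```shell
# Initialize a TypeScript project and install the tutorial's dependencies.
mkdir langgraph-chunking && cd langgraph-chunking
npm init -y
npm install @langchain/langgraph @langchain/core @langchain/textsplitters zod
npm install --save-dev typescript tsx
```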

Step-by-Step

  1. Start by defining the graph state and the input/output shape. Keep it simple: one field for the raw document and one field for the resulting chunks.
import { Annotation, StateGraph, START, END } from "@langchain/langgraph";

const GraphState = Annotation.Root({
  document: Annotation<string>(),
  chunks: Annotation<string[]>({
    default: () => [],
    reducer: (_prev, next) => next,
  }),
});

type GraphStateType = typeof GraphState.State;
  2. Add a node that splits the document into overlapping chunks. For beginners, fixed-size chunks with overlap are easier to reason about than semantic splitting.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const splitDocument = async (state: GraphStateType) => {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 400,
    chunkOverlap: 50,
  });

  const chunks = await splitter.splitText(state.document);

  return { chunks };
};
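To build intuition for what the splitter produces, here is a minimal hand-rolled sketch of fixed-size chunking with overlap. It is a deliberate simplification: the real RecursiveCharacterTextSplitter also prefers to break on separators such as paragraph and sentence boundaries rather than at exact character offsets.

```typescript
// Naive fixed-size chunking with overlap, for illustration only.
// Each step advances by (chunkSize - overlap) characters, so
// neighboring chunks share `overlap` characters.
function naiveChunk(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap;
  }
  return chunks;
}

// 1000 characters with chunkSize 400 and overlap 50 yields three
// chunks covering [0, 400), [350, 750), and [700, 1000).
const sample = Array.from({ length: 1000 }, (_, i) =>
  String.fromCharCode(97 + (i % 26))
).join("");
const pieces = naiveChunk(sample, 400, 50);
console.log(pieces.length); // 3
```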
  3. Build the LangGraph workflow and connect the start node to your chunking node. This gives you a reusable pipeline you can extend later with summarization, classification, or embedding steps.
const graph = new StateGraph(GraphState)
  .addNode("splitDocument", splitDocument)
  .addEdge(START, "splitDocument")
  .addEdge("splitDocument", END)
  .compile();
  4. Run the graph with a sample document and print the result. The top-level await here assumes an ES module context (for example, running with tsx). In production, this input would come from a file upload, database record, or OCR pipeline.
const largeDocument = `
LangGraph is useful when you need predictable control flow.
Large documents often exceed model context windows.
Chunking lets you process them in smaller pieces.
This is common in legal review, insurance claims, and bank policy analysis.
`.repeat(20);

const result = await graph.invoke({
  document: largeDocument,
});

console.log("Chunk count:", result.chunks.length);
console.log("First chunk:", result.chunks[0]);
  5. To fail fast on bad input, add basic validation before splitting. This prevents empty documents from silently producing useless output.
const validateInput = async (state: GraphStateType) => {
  if (!state.document || state.document.trim().length === 0) {
    throw new Error("document cannot be empty");
  }

  return {};
};

const validatedGraph = new StateGraph(GraphState)
  .addNode("validateInput", validateInput)
  .addNode("splitDocument", splitDocument)
  .addEdge(START, "validateInput")
  .addEdge("validateInput", "splitDocument")
  .addEdge("splitDocument", END)
  .compile();

Testing It

Run the script with tsx or compile it with tsc and execute the output with Node. You should see a chunk count greater than zero and the first chunk printed to the console.

Check that each chunk stays near your configured size and that overlap exists between neighboring chunks. If you reduce chunkSize, you should see more chunks; if you increase it, you should see fewer.
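The checks above can be sketched as a small standalone snippet. Here `chunks` is a handmade stand-in for `result.chunks`, with a deliberate overlap between the two entries; the eight-character probe is a crude heuristic, not a splitter API.

```typescript
// Sanity checks for splitter output. `chunks` stands in for
// result.chunks from the graph run above.
const chunkSize = 80;
const chunks = [
  "Chunking lets you process documents in smaller pieces before embedding.",
  "before embedding. This is common in legal review and claims analysis.",
];

// Every chunk should be at most chunkSize characters.
const withinSize = chunks.every((c) => c.length <= chunkSize);

// Each chunk's opening text should also appear near the end of its
// predecessor -- a crude but effective overlap probe.
const hasOverlap = chunks
  .slice(1)
  .every((c, i) => chunks[i].includes(c.slice(0, 8)));

console.log({ withinSize, hasOverlap }); // { withinSize: true, hasOverlap: true }
```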

Try an empty string as input and confirm the validation step throws an error. That tells you your graph fails fast instead of sending bad data downstream.
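To exercise the failure path without spinning up the whole graph, the validator's logic can be tested in isolation. This is a standalone extraction of the same emptiness check used in validateInput:

```typescript
// Same emptiness check as the validateInput node, extracted so the
// failure path is easy to test on its own.
const validate = (document: string): void => {
  if (!document || document.trim().length === 0) {
    throw new Error("document cannot be empty");
  }
};

let rejected = false;
try {
  validate("   "); // whitespace-only input should be rejected
} catch {
  rejected = true;
}
console.log("empty input rejected:", rejected); // true
```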

Next Steps

  • Add a second node that summarizes each chunk before storing it
  • Replace fixed-size splitting with metadata-aware splitting for PDFs or HTML
  • Persist chunk outputs to a vector database like pgvector or Pinecone
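As a taste of the first bullet, here is a placeholder summarizer that keeps only each chunk's first sentence. In a real pipeline this logic would live in a new graph node backed by an LLM call, with a `summaries` channel added to GraphState; the regex split below is an illustrative assumption, not a LangGraph API.

```typescript
// Placeholder "summary": the first sentence of each chunk.
// Splits after sentence-ending punctuation followed by whitespace.
const summarize = (chunks: string[]): string[] =>
  chunks.map((chunk) => chunk.split(/(?<=[.!?])\s+/)[0] ?? chunk);

const summaries = summarize([
  "First point. More detail here.",
  "Second point only.",
]);
console.log(summaries); // ["First point.", "Second point only."]
```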


By Cyprian Aarons, AI Consultant at Topiax.
