AutoGen Tutorial (TypeScript): handling long documents for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to build a TypeScript AutoGen workflow that can ingest long documents without blowing past model context limits. You need this when you want reliable summarization, extraction, or Q&A over PDFs, policy docs, contracts, or knowledge base exports that are too large to send in one prompt.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a compiled build step
  • @autogenai/autogen installed
  • dotenv for environment variables
  • An OpenAI API key in OPENAI_API_KEY
  • A long text source file, for example ./docs/policy.txt

Install the packages:

npm install @autogenai/autogen dotenv
npm install -D typescript ts-node @types/node

Step-by-Step

  1. Start by loading your document and splitting it into size-bounded chunks. The splitter below counts characters as a cheap stand-in for tokens; for long-document work, chunking is not optional; it is the core control point that keeps your agent calls predictable.
import * as fs from "node:fs";
import * as path from "node:path";

const raw = fs.readFileSync(path.join(process.cwd(), "docs/policy.txt"), "utf8");

// Naive fixed-size splitter: 4,000 characters is roughly 1,000 tokens
// for English prose, which keeps each call well inside context limits.
function chunkText(text: string, size = 4000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

const chunks = chunkText(raw);
console.log(`Loaded ${chunks.length} chunks`);
  2. Create an AutoGen assistant configured for summarization and extraction. Keep the system message narrow so each chunk gets processed consistently instead of drifting into open-ended chat.
import "dotenv/config";
import { AssistantAgent } from "@autogenai/autogen";

const assistant = new AssistantAgent({
  name: "doc_assistant",
  model: "gpt-4o-mini",
  systemMessage:
    "You summarize document chunks precisely. Return JSON with keys: summary, key_points, risks.",
});

async function summarizeChunk(chunk: string) {
  const result = await assistant.run([
    { role: "user", content: `Summarize this document chunk:\n\n${chunk}` },
  ]);
  return result.messages.at(-1)?.content ?? "";
}
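
The system message asks for JSON, but summarizeChunk returns raw text. If you want structured output downstream, a defensive parse helps. This is a minimal sketch that assumes the model usually complies and falls back to null when it does not; the regex strips code fences or stray prose around the object:

type ChunkSummary = { summary: string; key_points: string[]; risks: string[] };

function parseChunkSummary(content: string): ChunkSummary | null {
  try {
    // Models sometimes wrap JSON in code fences or add prose around it;
    // grab the first {...} span before parsing.
    const match = content.match(/\{[\s\S]*\}/);
    return match ? (JSON.parse(match[0]) as ChunkSummary) : null;
  } catch {
    return null;
  }
}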
  3. Run a map phase over every chunk and collect structured outputs. This is the part that scales; each call stays within context limits while still preserving coverage across the full document.
async function main() {
  const partials: string[] = [];

  for (let i = 0; i < chunks.length; i++) {
    const content = await summarizeChunk(chunks[i]);
    partials.push(content);
    console.log(`Processed chunk ${i + 1}/${chunks.length}`);
  }

  console.log(partials.slice(0, 2).join("\n\n"));
}

main().catch(console.error);
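
The loop above runs strictly sequentially, which is the simplest way to stay under rate limits. If you need more throughput, a small batching helper keeps concurrency bounded. This is a sketch; the batch size of 4 is an assumption to tune against your provider's limits:

// Process items in fixed-size batches so at most `batchSize`
// requests are in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  batchSize: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}

// const partials = await mapWithConcurrency(chunks, 4, summarizeChunk);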
  4. Add a reduce phase that merges the chunk summaries into one final answer. This is where you turn many small model calls into a single coherent output that can be handed to downstream systems.
async function mergeSummaries(partials: string[]) {
  const merger = new AssistantAgent({
    name: "merger",
    model: "gpt-4o-mini",
    systemMessage:
      "Merge chunk summaries into one concise report. Preserve important risks and deduplicate overlapping points.",
  });

  const result = await merger.run([
    {
      role: "user",
      content: `Combine these chunk summaries into one final report:\n\n${partials.join("\n\n")}`,
    },
  ]);

  return result.messages.at(-1)?.content ?? "";
}

async function pipeline() {
  const partials: string[] = [];
  for (const chunk of chunks) partials.push(await summarizeChunk(chunk));
  const finalReport = await mergeSummaries(partials);
  console.log(finalReport);
}

// Call pipeline() in place of the earlier main() to run the full map-reduce flow.
pipeline().catch(console.error);
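
One gap worth noting: on very large documents, the joined partial summaries can themselves exceed the merger's context window. A minimal recursive variant handles that; the group size of 10 is an assumption to tune against your model's limits:

// Merge partials in groups and recurse until one report remains,
// so no single merge call sees more than `groupSize` summaries.
async function mergeRecursively(partials: string[], groupSize = 10): Promise<string> {
  if (partials.length <= groupSize) return mergeSummaries(partials);
  const merged: string[] = [];
  for (let i = 0; i < partials.length; i += groupSize) {
    merged.push(await mergeSummaries(partials.slice(i, i + groupSize)));
  }
  return mergeRecursively(merged, groupSize);
}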
  5. If you need retrieval-style behavior instead of full-document summarization, index the chunks by metadata before sending them to AutoGen. In practice, this lets you answer targeted questions without reprocessing the whole document every time.
type ChunkRecord = { id: number; text: string };

const indexedChunks: ChunkRecord[] = chunks.map((text, id) => ({ id, text }));

// Naive keyword retrieval: a chunk is relevant if it contains any
// query term longer than two characters. Swap in embeddings later.
function findRelevantChunks(query: string): ChunkRecord[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 2);
  return indexedChunks.filter((c) => {
    const text = c.text.toLowerCase();
    return terms.some((term) => text.includes(term));
  });
}

async function answerQuestion(query: string) {
  const relevant = findRelevantChunks(query).slice(0, 3);
  const context = relevant.map((c) => `Chunk ${c.id + 1}:\n${c.text}`).join("\n\n");

  const result = await assistant.run([
    { role: "user", content: `Answer using only this context:\n\n${context}\n\nQuestion: ${query}` },
  ]);

  return result.messages.at(-1)?.content ?? "";
}
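
A quick, illustrative invocation (the question itself is hypothetical):

answerQuestion("What is the data retention policy?").then(console.log);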

Testing It

Run the script against a real long document first, not a toy sample. Verify that each chunk produces output and that the merged report does not drop major sections like definitions, exceptions, or obligations.
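
A quick way to automate that check is to scan the merged report for terms that must appear. The term list below is illustrative; derive it from your own document's section headings:

// Flag required terms that never made it into the merged report.
function checkCoverage(report: string, required: string[]): string[] {
  const lower = report.toLowerCase();
  return required.filter((term) => !lower.includes(term));
}

// e.g. inside pipeline():
//   const missing = checkCoverage(finalReport, ["definitions", "exceptions", "obligations"]);
//   if (missing.length) console.warn(`Possibly dropped: ${missing.join(", ")}`);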

Then test with a question that only appears in one part of the document and confirm your retrieval path finds it. If answers get vague, reduce chunk size or tighten the system message so the model stops overgeneralizing.

A good production check is to log token usage per call and watch for outliers. If one chunk consistently causes failures, it usually means your splitter is too coarse or your source has tables and formatting that need special handling.
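
Whether the AutoGen result object exposes real usage metadata depends on your version, so a character-based estimate is a safe starting point. This sketch assumes roughly four characters per token for English prose:

// Rough estimate; swap in a real tokenizer count (or actual usage
// metadata, if your AutoGen version exposes it) when you need accuracy.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

async function summarizeChunkLogged(chunk: string, index: number) {
  const output = await summarizeChunk(chunk);
  console.log(
    `chunk ${index + 1}: ~${estimateTokens(chunk)} tokens in, ~${estimateTokens(output)} tokens out`
  );
  return output;
}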

Next Steps

  • Replace naive character splitting with token-aware splitting using your tokenizer of choice (a sketch follows this list)
  • Add embeddings-based retrieval so question answering scales beyond keyword matching
  • Store chunk summaries in a database so repeated document analysis does not recompute everything
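
For the first item, one option is the gpt-tokenizer npm package (npm install gpt-tokenizer). This is a sketch assuming its encode/decode API; the 1,000-token limit is an example, so size it against your model's context window:

import { encode, decode } from "gpt-tokenizer";

// Split on token boundaries so chunk sizes map directly onto model
// context limits instead of approximating them with characters.
function chunkByTokens(text: string, maxTokens = 1000): string[] {
  const tokens = encode(text);
  const chunks: string[] = [];
  for (let i = 0; i < tokens.length; i += maxTokens) {
    chunks.push(decode(tokens.slice(i, i + maxTokens)));
  }
  return chunks;
}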

By Cyprian Aarons, AI Consultant at Topiax.