AutoGen Tutorial (Python): chunking large documents for advanced developers
This tutorial shows you how to split large documents into token-safe chunks, summarize each chunk with AutoGen, and combine the results into a usable final output. You need this when a single document is too large for one model call, or when you want better control over cost, context limits, and retrieval quality.
What You'll Need
- Python 3.10+
- `pyautogen` installed
- An OpenAI API key exported as `OPENAI_API_KEY`
- A document source: a plain text file, or any string you can load into memory
- Basic familiarity with AutoGen agents and `AssistantAgent`
Step-by-Step
- Start by installing the package and setting up your environment. For this pattern, keep the document handling in plain Python and let AutoGen do the summarization work.

```shell
pip install pyautogen tiktoken
export OPENAI_API_KEY="your-api-key"
```
- Define a chunking function that splits text by token count instead of raw characters. Token-based chunking is what keeps you from blowing past model limits when documents contain long words, code blocks, or dense legal language.

```python
from typing import List

import tiktoken

def chunk_text(text: str, max_tokens: int = 1200) -> List[str]:
    """Split text into chunks of at most max_tokens tokens each."""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    chunks = []
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i + max_tokens]
        chunks.append(encoding.decode(chunk_tokens))
    return chunks

sample_text = "This is a long document. " * 2000
chunks = chunk_text(sample_text, max_tokens=500)
print(f"Created {len(chunks)} chunks")
```
- Create an AutoGen assistant that will summarize each chunk consistently. Use a stable system message so every chunk gets processed with the same instructions.

```python
import os

from autogen import AssistantAgent

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
}

summarizer = AssistantAgent(
    name="summarizer",
    llm_config=llm_config,
    system_message=(
        "You summarize document chunks for downstream synthesis. "
        "Keep important entities, dates, numbers, risks, and decisions."
    ),
)
```
- Run each chunk through the agent and collect summaries. In production, this is where you would add retries, logging, and rate-limit handling.

```python
def summarize_chunks(agent: AssistantAgent, chunks: list[str]) -> list[str]:
    """Summarize each chunk with the same agent and collect the results."""
    summaries = []
    for idx, chunk in enumerate(chunks, start=1):
        prompt = (
            f"Summarize chunk {idx} of {len(chunks)}.\n\n"
            f"Focus on facts only:\n{chunk}"
        )
        response = agent.generate_reply(messages=[{"role": "user", "content": prompt}])
        # generate_reply may return a plain string or a message dict
        summaries.append(response if isinstance(response, str) else response["content"])
    return summaries

# Demo: summarize only the first three chunks to keep costs down
chunk_summaries = summarize_chunks(summarizer, chunks[:3])
for s in chunk_summaries:
    print(s[:400], "\n---")
```
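The production concerns mentioned above can be sketched with a small backoff helper. This is a generic wrapper, not part of AutoGen's API; the attempt count and delays are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], T], max_attempts: int = 4, base_delay: float = 1.0) -> T:
    """Retry a flaky callable with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # back off 1s, 2s, 4s, ... with jitter scaled to base_delay
            time.sleep(base_delay * 2 ** (attempt - 1) + random.random() * base_delay)
    raise RuntimeError("unreachable")
```

Inside `summarize_chunks` you would wrap the model call, e.g. `response = with_retries(lambda: agent.generate_reply(messages=[{"role": "user", "content": prompt}]))`.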
- Synthesize the chunk summaries into a final answer. This second pass gives you a clean result without forcing the model to read the entire source document at once.

```python
def synthesize_summary(agent: AssistantAgent, summaries: list[str]) -> str:
    joined = "\n\n".join(
        f"Chunk summary {i+1}:\n{summary}" for i, summary in enumerate(summaries)
    )
    prompt = (
        "Combine these chunk summaries into one concise executive summary.\n"
        "Preserve critical details, conflicts, and unresolved items.\n\n"
        f"{joined}"
    )
    response = agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    return response if isinstance(response, str) else response["content"]

final_summary = synthesize_summary(summarizer, chunk_summaries)
print(final_summary)
```
- If you need better quality on real enterprise documents, add overlap between chunks. Overlap reduces boundary loss when a key clause spans two adjacent chunks.

```python
def chunk_text_with_overlap(text: str, max_tokens: int = 1200, overlap: int = 100) -> list[str]:
    """Split text into token chunks where adjacent chunks share `overlap` tokens.

    Requires overlap < max_tokens so the window always advances.
    """
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(encoding.decode(tokens[start:end]))
        start += max_tokens - overlap
    return chunks

overlapped_chunks = chunk_text_with_overlap(sample_text, max_tokens=500, overlap=50)
print(f"Created {len(overlapped_chunks)} overlapped chunks")
```
Testing It
Run the script against a real document that is larger than your model’s comfortable context window. A good test is a policy PDF converted to text or a long internal memo with sections and tables.
Check three things:
- The number of chunks looks reasonable for the document size.
- Each per-chunk summary preserves names, dates, obligations, and exceptions.
- The final synthesized output reads like one coherent summary instead of disconnected notes.
If the output starts dropping details at boundaries, increase `overlap` or reduce `max_tokens`. If summaries become too vague, tighten the system message and ask for structured output like bullets or JSON.
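If you do ask for JSON, parse replies defensively, because models often wrap JSON in code fences. A minimal sketch under that assumption (the fence-stripping heuristic is illustrative, not AutoGen behavior):

```python
import json
from typing import Any

def parse_json_reply(reply: str) -> Any:
    """Parse a model reply expected to contain JSON, tolerating code fences."""
    text = reply.strip()
    if text.startswith("```"):
        # drop an opening fence such as ```json and the trailing fence line
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```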
Next Steps
- Add metadata to each chunk so you can trace summaries back to source page numbers or section headers.
- Replace pure summarization with retrieval: embed chunks first, then only summarize the top relevant ones.
- Wrap this flow in an AutoGen multi-agent pipeline with a reviewer agent that checks factual consistency across chunks.
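The metadata idea can be sketched like this. To keep the example dependency-free, the tokenizer is pluggable and defaults to whitespace splitting; in the real pipeline you would pass tiktoken's `encode`/`decode`. The `ChunkMeta` name and fields are illustrative, not a library type.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ChunkMeta:
    index: int        # position of the chunk within the document
    start_token: int  # offset of the chunk's first token
    end_token: int    # offset one past the chunk's last token
    text: str         # decoded chunk text

def chunk_with_metadata(
    text: str,
    max_tokens: int = 1200,
    encode: Callable[[str], List[str]] = str.split,  # swap in tiktoken's encode
    decode: Callable[[List[str]], str] = " ".join,   # swap in tiktoken's decode
) -> List[ChunkMeta]:
    """Chunk text while recording token offsets for traceability."""
    tokens = encode(text)
    chunks: List[ChunkMeta] = []
    for i in range(0, len(tokens), max_tokens):
        piece = tokens[i:i + max_tokens]
        chunks.append(
            ChunkMeta(index=len(chunks), start_token=i,
                      end_token=i + len(piece), text=decode(piece))
        )
    return chunks
```

The offsets let a summary cite "tokens 2400-3600", which you can map back to pages or section headers at ingestion time.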
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.