AutoGen Tutorial (Python): chunking large documents for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to split a large document into smaller chunks with Python and AutoGen, then process those chunks with an agent pipeline. You need this when a document is too large for a single model call, or when you want better retrieval, summarization, and per-section analysis.

What You'll Need

  • Python 3.10+
  • pyautogen installed
  • An OpenAI API key
  • A text file or long string you want to chunk
  • Basic familiarity with creating an AutoGen AssistantAgent
  • Optional but useful:
    • tiktoken for token-aware chunking
    • python-dotenv for loading environment variables

Install the packages:

pip install pyautogen tiktoken python-dotenv

Set your API key:

export OPENAI_API_KEY="your-api-key"
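If you installed python-dotenv, you can keep the key in a .env file instead of exporting it in your shell. A minimal sketch, assuming a .env file next to your script containing OPENAI_API_KEY=your-api-key:

from dotenv import load_dotenv

# Copy key=value pairs from .env into os.environ at startup
load_dotenv()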

Step-by-Step

  1. Start by loading your document and defining a simple chunking strategy. For beginners, character-based chunking is easier to reason about than token-based chunking, and it is good enough for most first implementations.
from pathlib import Path

def load_document(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200):
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    start = 0

    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # final chunk reached; stop instead of re-chunking the tail
        start = end - overlap

    return chunks

document = load_document("large_document.txt")
chunks = chunk_text(document)

print(f"Loaded {len(document)} characters")
print(f"Created {len(chunks)} chunks")
print(chunks[0][:300])
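Before moving on, it helps to see what overlap actually does. This toy run is purely illustrative; the sizes are chosen so the repeated characters at each boundary are easy to spot:

# Toy sizes: 4-character chunks with 1 character of overlap
for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
    print(repr(piece))

# Output: 'abcd', 'defg', 'ghij'. Each chunk starts with the last
# character of the previous one, so text cut at a boundary is not lost.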
  2. Next, configure AutoGen with your model settings and create an assistant agent. This agent will process each chunk independently, which keeps prompts smaller and makes failures easier to debug.
import os
from autogen import AssistantAgent

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
}

chunk_processor = AssistantAgent(
    name="chunk_processor",
    llm_config=llm_config,
)

print("Agent ready")
  3. Now define a function that sends each chunk to the agent and asks for structured output. Keep the instruction narrow; do not ask the model to summarize the whole document yet, only the current chunk.
def summarize_chunk(agent: AssistantAgent, chunk: str, index: int) -> str:
    prompt = f"""
You are analyzing chunk {index + 1} of a larger document.

Return:
- a 1-sentence summary
- 3 bullet points of key facts
- any important names, dates, or numbers

Chunk:
{chunk}
""".strip()

    response = agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    return response if isinstance(response, str) else response["content"]

chunk_summaries = []
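# Process only the first three chunks while testing to keep API calls cheap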
for i, chunk in enumerate(chunks[:3]):
    summary = summarize_chunk(chunk_processor, chunk, i)
    chunk_summaries.append(summary)
    print(f"\n--- Chunk {i + 1} ---\n{summary}")
  4. After that, combine the per-chunk results in one final pass. This is where AutoGen becomes useful: you let the model reason over smaller summaries instead of the full raw document.
def combine_summaries(agent: AssistantAgent, summaries: list[str]) -> str:
    joined = "\n\n".join(
        f"Chunk summary {i + 1}:\n{s}" for i, s in enumerate(summaries)
    )

    prompt = f"""
You are given summaries from multiple chunks of one document.

Create:
- a concise overall summary
- the top 5 themes across all chunks
- any contradictions or repeated ideas

Summaries:
{joined}
""".strip()

    response = agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    return response if isinstance(response, str) else response["content"]

final_summary = combine_summaries(chunk_processor, chunk_summaries)
print("\n=== Final Summary ===\n")
print(final_summary)
  5. If you want this to be more production-friendly, store metadata with each chunk. That makes it easier to trace answers back to source text later when you build retrieval or audit features.
def build_chunk_records(text: str, chunk_size: int = 2000, overlap: int = 200):
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    records = []
    start = 0
    index = 0

    while start < len(text):
        end = min(start + chunk_size, len(text))
        records.append({
            "chunk_id": index,
            "start": start,
            "end": end,
            "text": text[start:end],
        })
        index += 1
        if end == len(text):
            break  # final chunk reached; stop instead of re-chunking the tail
        start = end - overlap

    return records

records = build_chunk_records(document)
print(records[0]["chunk_id"], records[0]["start"], records[0]["end"])
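Because each record carries its start and end offsets, any summary can be traced back to the exact slice of source text. A quick sanity check that the offsets line up:

# Every record's text should equal the same slice of the original document
for record in records:
    assert record["text"] == document[record["start"]:record["end"]]

print("All chunk offsets verified")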

Testing It

Run the script against a real .txt file that is long enough to produce multiple chunks. Check that the number of chunks looks reasonable and that each summary only refers to its own section of text.

Then inspect the final combined summary for repeated themes and missing context. If the output feels noisy, reduce chunk_size, increase overlap, or tighten the prompt so each chunk produces more structured results.

A good sanity check is to search for a known phrase from the original document inside one of the generated summaries. If that phrase disappears entirely across all summaries, your chunks may be too large or your prompt may be too vague.
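That check is easy to script. A minimal sketch, where known_phrase is a hypothetical string you are sure appears in your source document:

# Replace with a phrase you know occurs in large_document.txt
known_phrase = "quarterly revenue"

hits = [i + 1 for i, s in enumerate(chunk_summaries) if known_phrase.lower() in s.lower()]
print(f"Phrase found in {len(hits)} of {len(chunk_summaries)} summaries: {hits}")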

Next Steps

  • Replace character-based splitting with token-based splitting using tiktoken (see the sketch after this list)
  • Add an embedding step so you can retrieve relevant chunks before calling AutoGen
  • Wrap this in an AutoGen multi-agent workflow with one agent summarizing and another validating outputs
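For the first item, here is a token-based variant of chunk_text as a starting point. It is a minimal sketch: cl100k_base is the encoding used by many recent OpenAI models, but check which encoding matches the model you actually call:

import tiktoken

def chunk_text_tokens(text: str, chunk_size: int = 500, overlap: int = 50):
    # Encode once, then slice the token list instead of the raw string
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)

    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap

    return chunks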

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
