AutoGen Tutorial (Python): chunking large documents for intermediate developers
This tutorial shows you how to split a large document into manageable chunks, summarize each chunk with AutoGen, and then combine those summaries into a final result. You need this when the model context window is too small for the full document, or when you want more stable extraction from long policy docs, contracts, or reports.
What You'll Need
- Python 3.10+
- `pyautogen` installed
- An OpenAI-compatible API key
- A text file or long string to process
- Basic familiarity with AutoGen agents and `AssistantAgent`
Step-by-Step
- Start by installing AutoGen and setting your API key. I’m using the OpenAI-compatible client config because it works cleanly with current AutoGen setups.

```shell
pip install pyautogen
export OPENAI_API_KEY="your-key-here"
```
- Load a long document and split it into chunks. For production, chunk by paragraph boundaries first, then cap by approximate character count so you don’t break sentences midway unless you have to.
```python
from pathlib import Path

def chunk_text(text: str, max_chars: int = 4000):
    """Greedily pack whole paragraphs into chunks of at most max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if len(candidate) <= max_chars:
            current.append(para)
        else:
            if current:
                chunks.append("\n\n".join(current))
            # A paragraph longer than max_chars becomes its own oversized chunk.
            current = [para]
    if current:
        chunks.append("\n\n".join(current))
    return chunks

document = Path("large_document.txt").read_text(encoding="utf-8")
chunks = chunk_text(document, max_chars=4000)
print(f"Loaded {len(chunks)} chunks")
```
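The 4000-character cap is really a stand-in for the model’s token budget. If you want a quick sanity check without pulling in a tokenizer, a rough rule of thumb (an assumption, not an exact count: English prose averages around four characters per token) looks like this:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # This is an estimate only; use a real tokenizer for hard limits.
    return max(1, len(text) // 4)

chunk = "word " * 800          # 4000 characters
print(approx_tokens(chunk))    # → 1000
```

So a 4000-character chunk lands near 1000 tokens, which leaves comfortable headroom even in small context windows. Swap in a real tokenizer if you need exact counts.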
- Create an AutoGen assistant that summarizes each chunk consistently. Keep the prompt strict so every chunk returns structured output you can merge later.
```python
import os

from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,  # deterministic output makes merging more stable
}

summarizer = AssistantAgent(
    name="summarizer",
    llm_config=llm_config,
    system_message=(
        "You summarize document chunks for downstream merging. "
        "Return concise bullet points covering facts, obligations, dates, risks, and named entities."
    ),
)
```
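Because the merge step depends on every chunk coming back as bullets, it is worth validating the format before accepting a summary. This is a hypothetical helper, not part of AutoGen; adjust the accepted markers to whatever your prompt asks for:

```python
def looks_like_bullets(summary: str) -> bool:
    """Cheap format check: every non-empty line should start with a bullet marker."""
    lines = [line.strip() for line in summary.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(("-", "*", "•")) for line in lines)

print(looks_like_bullets("- fact one\n- fact two"))  # → True
print(looks_like_bullets("Here is a summary."))      # → False
```

If a chunk fails the check, re-prompt it rather than letting free-form text leak into the merge.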
- Summarize each chunk one by one. In real systems this is where you’d add retries, logging, and rate-limit handling, but the core pattern stays the same.
```python
def summarize_chunk(agent: AssistantAgent, chunk: str) -> str:
    message = (
        "Summarize the following document chunk.\n"
        "Use bullets only.\n\n"
        f"{chunk}"
    )
    response = agent.generate_reply(messages=[{"role": "user", "content": message}])
    # generate_reply may return either a plain string or a message dict.
    return response if isinstance(response, str) else response["content"]

chunk_summaries = []
for i, chunk in enumerate(chunks, start=1):
    summary = summarize_chunk(summarizer, chunk)
    chunk_summaries.append(f"Chunk {i}:\n{summary}")
    print(f"Summarized chunk {i}/{len(chunks)}")
```
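The retries mentioned above can be wrapped generically. This is a sketch, not AutoGen API: it catches all exceptions for brevity, and in practice you would narrow the `except` clause to the rate-limit and timeout errors your client actually raises.

```python
import time

def with_retries(fn, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call fn with exponential backoff between failed attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # narrow this to real API errors in production
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Usage in the loop above:
# summary = with_retries(summarize_chunk, summarizer, chunk)
```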
- Merge the chunk summaries into a final answer with a second pass. This is the part people skip, but it’s what turns a pile of local summaries into something useful.
```python
merger = AssistantAgent(
    name="merger",
    llm_config=llm_config,
    system_message=(
        "You merge multiple chunk summaries into one coherent final summary. "
        "Remove duplicates, preserve important specifics, and group related points."
    ),
)

merged_input = "\n\n".join(chunk_summaries)
final_summary = merger.generate_reply(
    messages=[
        {
            "role": "user",
            "content": (
                "Merge these chunk summaries into a single executive summary.\n"
                "Keep it structured with headings:\n"
                "- Overview\n- Key Facts\n- Risks / Issues\n- Open Questions\n\n"
                f"{merged_input}"
            ),
        }
    ]
)
print(final_summary if isinstance(final_summary, str) else final_summary["content"])
```
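Exact duplicates are cheap to strip deterministically before the LLM merge, which shortens the merger’s input and leaves it only the harder near-duplicate cases. A sketch, with `dedupe_bullets` as a hypothetical helper:

```python
def dedupe_bullets(summaries: list[str]) -> list[str]:
    """Drop exact-duplicate bullet lines across chunk summaries, keeping order."""
    seen = set()
    merged = []
    for block in summaries:
        for line in block.splitlines():
            key = line.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(line.strip())
    return merged

print(dedupe_bullets(["- rent due monthly\n- term: 24 months",
                      "- term: 24 months\n- early exit fee applies"]))
```

You would join the deduplicated lines back into `merged_input` before calling the merger; the agent still handles paraphrased duplicates that exact matching misses.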
- If you need better quality on very long inputs, add overlap between adjacent chunks. That helps preserve context across section boundaries where details often get split.
```python
def overlapping_chunks(text: str, max_chars: int = 4000, overlap_chars: int = 500):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = (current + "\n\n" + para).strip() if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
                # Seed the next chunk with the tail of the previous one
                # so context carries across the boundary.
                current = current[-overlap_chars:] + "\n\n" + para
            else:
                # A single paragraph longer than max_chars: split it into
                # raw slices instead of silently dropping the remainder.
                for start in range(0, len(para), max_chars):
                    chunks.append(para[start:start + max_chars])
                current = ""
    if current:
        chunks.append(current)
    return chunks
```
Testing It
Run the script against a real document that is longer than your model’s context window. A good test file is a policy PDF converted to text or a multi-page internal report with repeated references across sections.
Check three things:
- Every chunk produces a non-empty summary
- The merged output removes duplicate points instead of repeating them
- Important details like dates, obligations, and exceptions survive the merge
If the final summary feels vague, reduce chunk size or increase overlap slightly. If it misses cross-section references, your boundaries are probably too aggressive.
Next Steps
- Add JSON schema-style output from each summarization pass so merging becomes deterministic
- Replace sequential processing with `asyncio` or worker pools for throughput on large corpora
- Add retrieval so you only summarize chunks relevant to a user query instead of every page
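The worker-pool idea from the list above can be sketched with the standard library. A thread pool is enough here because the work is I/O-bound API calls, and `pool.map` preserves input order, so chunk numbering still lines up:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_all(chunks, summarize, max_workers=4):
    """Run summarize over chunks concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, chunks))

# Stand-in summarizer for illustration; plug in summarize_chunk in practice.
print(summarize_all(["alpha", "beta"], lambda c: c.upper()))  # → ['ALPHA', 'BETA']
```

Keep `max_workers` modest: most providers rate-limit per key, so more threads can mean more retries, not more throughput.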
By Cyprian Aarons, AI Consultant at Topiax.