AutoGen Tutorial (Python): handling long documents for intermediate developers
This tutorial shows you how to build an AutoGen workflow that can ingest long documents, split them into manageable chunks, and answer questions without blowing past model context limits. You need this when a single PDF, policy manual, contract, or incident report is too large to send to the model in one shot.
What You'll Need
- Python 3.10+
- The autogen package (pip install pyautogen gives the classic 0.2-style API used below)
- An OpenAI API key
- tiktoken for token-aware chunking
- A long text document in .txt format for testing
- Basic familiarity with AutoGen agents and AssistantAgent/UserProxyAgent
Step-by-Step
- Start by setting up a local AutoGen configuration and a helper for loading text. For long documents, the key is not “send everything,” it’s “control what enters context.”
import os
from pathlib import Path

import autogen

# Fail fast if the key is missing instead of silently sending an empty string.
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first")

config_list = [
    {
        "model": "gpt-4o-mini",
        "api_key": api_key,
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0,
}

def load_document(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

document_text = load_document("long_document.txt")
print(f"Loaded {len(document_text)} characters")
- Next, split the document into chunks by token count instead of raw characters. This keeps each chunk closer to the model’s real input limit and avoids ugly truncation behavior.
import tiktoken

def chunk_text(text: str, max_tokens: int = 1200) -> list[str]:
    # cl100k_base approximates the gpt-4o tokenizer closely enough for sizing chunks.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i : i + max_tokens]
        chunks.append(enc.decode(chunk_tokens))
    return chunks

chunks = chunk_text(document_text, max_tokens=1200)
print(f"Created {len(chunks)} chunks")
print(chunks[0][:500])
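If answers later miss facts that sit right at a chunk boundary, a common refinement (not part of the original snippet) is to overlap consecutive chunks by a small token margin. A minimal sketch reusing the same encoder; the 100-token overlap is an arbitrary starting point:

def chunk_text_overlap(text: str, max_tokens: int = 1200, overlap: int = 100) -> list[str]:
    # Advance the window by (max_tokens - overlap) so each chunk repeats
    # the tail of the previous one and boundary-straddling facts survive.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = max_tokens - overlap
    return [enc.decode(tokens[i : i + max_tokens]) for i in range(0, len(tokens), step)]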
- Create a summarizer agent that processes each chunk independently. The output should be compact and structured so you can combine many summaries later without overloading context.
summarizer = autogen.AssistantAgent(
    name="summarizer",
    llm_config=llm_config,
    system_message=(
        "You summarize document chunks for later retrieval. "
        "Return concise bullet points with key facts, names, dates, obligations, risks, and definitions."
    ),
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,  # one request, one reply, no ping-pong
    code_execution_config=False,
)
def summarize_chunk(chunk: str) -> str:
    prompt = (
        "Summarize this document chunk.\n\n"
        f"{chunk}\n\n"
        "Format:\n"
        "- Key points\n"
        "- Entities\n"
        "- Dates/Numbers\n"
        "- Risks/Issues"
    )
    # initiate_chat returns a ChatResult; with the default summary_method
    # ("last_msg"), .summary holds the assistant's final reply.
    chat_result = user_proxy.initiate_chat(summarizer, message=prompt, max_turns=1)
    return chat_result.summary
- Instead of keeping every full chunk in memory during Q&A, build a lightweight summary index. For intermediate workflows, this is usually enough unless you need exact citation-level retrieval.
from dataclasses import dataclass

@dataclass
class ChunkSummary:
    index: int
    summary: str

summary_index: list[ChunkSummary] = []
for i, chunk in enumerate(chunks):
    result = summarize_chunk(chunk)
    summary_index.append(ChunkSummary(index=i, summary=result))

print(f"Stored {len(summary_index)} summaries")
- Finally, answer questions by passing only a few summaries into a second agent. The selection below is deliberately naive (the first five); see the sketches after this step for relevance-based selection. This keeps the final prompt small while still grounding the answer in the document.
qa_agent = autogen.AssistantAgent(
    name="qa_agent",
    llm_config=llm_config,
    system_message=(
        "Answer questions using only the provided summaries. "
        "If the summaries do not contain enough information, say so clearly."
    ),
)

def answer_question(question: str) -> str:
    # Naive selection: the first five summaries, regardless of the question.
    context = "\n\n".join(
        f"Chunk {item.index}:\n{item.summary}" for item in summary_index[:5]
    )
    prompt = (
        f"Question: {question}\n\n"
        f"Document summaries:\n{context}\n\n"
        "Give a direct answer and mention uncertainty if needed."
    )
    chat_result = user_proxy.initiate_chat(qa_agent, message=prompt, max_turns=1)
    return chat_result.summary

print(answer_question("What are the main contractual risks mentioned in the document?"))
Testing It
Run the script against a real long .txt file that exceeds your model’s context window. First verify that chunking produces multiple pieces and that each summary is shorter than its source chunk.
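A few assertions make that check concrete. This is a sketch that assumes the chunks and summary_index names from the steps above:

enc = tiktoken.get_encoding("cl100k_base")
assert len(chunks) > 1, "document fits in one chunk; pick a longer file"
for item in summary_index:
    source_tokens = len(enc.encode(chunks[item.index]))
    summary_tokens = len(enc.encode(item.summary))
    assert summary_tokens < source_tokens, f"chunk {item.index} did not compress"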
Then ask a question that should be answered from only one or two sections of the document. If the answer is vague or missing details, reduce chunk size or improve your summarization prompt.
For production-style validation, compare answers against known facts from the source document and log which chunks were used. If you need more precision later, replace the naive summary_index[:5] selection with embedding-based retrieval.
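A minimal sketch of that replacement, assuming the openai Python client and the text-embedding-3-small model (both assumptions; swap in whatever embedding stack you prefer):

import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

summary_vectors = embed([item.summary for item in summary_index])

def top_k_summaries(question: str, k: int = 5) -> list[ChunkSummary]:
    # Rank stored summaries by cosine similarity to the question embedding.
    q_vec = embed([question])[0]
    ranked = sorted(
        zip(summary_index, summary_vectors),
        key=lambda pair: cosine(q_vec, pair[1]),
        reverse=True,
    )
    return [item for item, _ in ranked[:k]]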
Next Steps
- Add vector search with embeddings so you can retrieve only the most relevant chunks before answering.
- Store chunk summaries and metadata in SQLite or Postgres for repeatable document processing.
- Extend this pattern to PDFs by extracting text with pymupdf or pypdf before chunking; see the sketch after this list.
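For the PDF route, a minimal sketch using pypdf (pymupdf works similarly with a different API); the file name is a placeholder:

from pypdf import PdfReader

def load_pdf(path: str) -> str:
    # Join per-page text; real PDFs often need header/footer cleanup too.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

document_text = load_pdf("long_document.pdf")
chunks = chunk_text(document_text, max_tokens=1200)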
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.