AutoGen Tutorial (Python): handling long documents for beginners
This tutorial shows you how to take a long document, split it into manageable chunks, and use AutoGen agents to answer questions over it without blowing past model context limits. You need this when your source material is too large for a single prompt: policies, contracts, claims notes, medical records, or long internal docs.
What You'll Need
- Python 3.10+
- pyautogen
- An OpenAI-compatible model endpoint or OpenAI API key
- A .env file or shell environment variable for your API key
- A long text file to test with, like policy.txt or manual.txt
Install the package:
pip install pyautogen
Set your key:
export OPENAI_API_KEY="your-key-here"
Step-by-Step
- Start by loading a long document from disk and splitting it into overlapping chunks. Overlap matters because important facts often sit near chunk boundaries.
from pathlib import Path

def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 200):
    """Split text into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        # Advance by chunk_size - overlap; the max() guard prevents an
        # infinite loop if overlap is ever set >= chunk_size.
        start = max(end - overlap, start + 1)
    return chunks

doc_path = Path("policy.txt")
document = doc_path.read_text(encoding="utf-8")
chunks = chunk_text(document)

print(f"Loaded {len(document)} characters")
print(f"Created {len(chunks)} chunks")
print(chunks[0][:300])
- Next, create a summarizer agent that will compress each chunk into a short factual summary. Keep the prompt strict so the summaries stay useful for retrieval later.
import os

from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

summarizer = AssistantAgent(
    name="summarizer",
    llm_config=llm_config,
)
def summarize_chunk(chunk: str) -> str:
    prompt = (
        "Summarize this document chunk in 5 bullet points.\n"
        "Keep names, dates, limits, obligations, and exceptions.\n\n"
        f"{chunk}"
    )
    reply = summarizer.generate_reply(messages=[{"role": "user", "content": prompt}])
    # generate_reply can return a string, a dict, or None, so
    # normalize to a plain string before storing it.
    if isinstance(reply, dict):
        return reply.get("content", "")
    return reply or ""

summary = summarize_chunk(chunks[0])
print(summary)
- Now build a simple map step over all chunks and store the summaries with their source indexes. This gives you a lightweight index you can search before asking the final question.
summaries = []
for i, chunk in enumerate(chunks):
    s = summarize_chunk(chunk)
    summaries.append({"chunk_id": i, "summary": s})

for item in summaries[:3]:
    print("=" * 40)
    print(f"Chunk {item['chunk_id']}")
    print(item["summary"])
- After that, create a helper that picks the most relevant summaries for a user question. For beginners, a keyword overlap filter is enough to prove the pattern before adding embeddings later.
import re

def score_summary(question: str, summary: str) -> int:
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", summary.lower()))
    return len(q_words & s_words)

def retrieve_top_chunks(question: str, top_k: int = 3):
    ranked = sorted(
        summaries,
        key=lambda x: score_summary(question, x["summary"]),
        reverse=True,
    )
    return ranked[:top_k]

question = "What are the claim filing deadlines?"
top_chunks = retrieve_top_chunks(question)
for item in top_chunks:
    print("=" * 40)
    print(f"Chunk {item['chunk_id']}")
    print(item["summary"])
- Finally, use a second agent to answer the user's question from only the retrieved summaries and source text. This keeps the final context small and makes the workflow usable on long documents.
answer_agent = AssistantAgent(
    name="answer_agent",
    llm_config=llm_config,
)

def answer_question(question: str) -> str:
    relevant = retrieve_top_chunks(question)
    context = "\n\n".join(
        f"[Chunk {item['chunk_id']} Summary]\n{item['summary']}"
        for item in relevant
    )
    prompt = (
        "Answer the question using only the provided chunk summaries.\n"
        "If the answer is not present, say you cannot find it in the document.\n\n"
        f"Question: {question}\n\n"
        f"Context:\n{context}"
    )
    reply = answer_agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    # Normalize the reply to a string, as in summarize_chunk.
    if isinstance(reply, dict):
        return reply.get("content", "")
    return reply or ""

print(answer_question("What are the claim filing deadlines?"))
Testing It
Run the script against a real long document and ask three types of questions: one about a known fact near the beginning, one near the end, and one that requires combining multiple sections. If your retrieval is working, the returned summaries should clearly include terms from the question before the final answer is generated.
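Here is a minimal smoke test for that, reusing retrieve_top_chunks and answer_question from above. The three questions are hypothetical placeholders; swap in facts you know appear in your own document.
test_questions = [
    "What is the effective date of the policy?",       # fact near the beginning
    "How are coverage disputes escalated?",            # fact near the end
    "How do filing deadlines interact with appeals?",  # spans multiple sections
]

for q in test_questions:
    print("=" * 60)
    print(f"Q: {q}")
    # Show which chunks retrieval picked before reading the answer.
    for item in retrieve_top_chunks(q):
        print(f"Retrieved chunk {item['chunk_id']}")
    print(answer_question(q))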
Check that answers stay grounded in the document and that missing information returns an explicit “cannot find it” response instead of hallucinated details. If results are weak, reduce chunk size slightly or increase overlap so boundary facts are preserved.
For production use, log chunk_id, selected summaries, and final answers so you can trace where each response came from.
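A minimal sketch of that tracing with Python's standard logging module follows; the file name and the JSON-lines format are illustrative choices, not requirements.
import json
import logging

logging.basicConfig(filename="qa_trace.log", level=logging.INFO)

def answer_question_traced(question: str) -> str:
    relevant = retrieve_top_chunks(question)
    answer = answer_question(question)
    # One JSON line per request ties the answer back to its sources.
    logging.info(json.dumps({
        "question": question,
        "chunk_ids": [item["chunk_id"] for item in relevant],
        "summaries": [item["summary"] for item in relevant],
        "answer": answer,
    }))
    return answer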
Next Steps
- Replace keyword scoring with embeddings-based retrieval using FAISS or Chroma (see the first sketch below).
- Add a validation agent that checks whether an answer is supported by source text (see the second sketch below).
- Extend this pattern to multi-document Q&A with per-document metadata and filters.
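For the first item, here is a minimal sketch of embeddings-based scoring using the openai client (assumed installed; the model name is one reasonable choice, not a requirement). FAISS or Chroma would replace the brute-force loop once the corpus grows.
import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Embed every summary once, then score each question against all of them.
summary_vectors = embed([item["summary"] for item in summaries])

def retrieve_top_chunks_embedded(question: str, top_k: int = 3):
    q_vec = embed([question])[0]
    ranked = sorted(
        zip(summaries, summary_vectors),
        key=lambda pair: cosine(q_vec, pair[1]),
        reverse=True,
    )
    return [item for item, _ in ranked[:top_k]]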
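For the second item, a sketch of a validation agent that reuses the same llm_config; the SUPPORTED/UNSUPPORTED reply convention is just one simple contract you could adopt.
validator = AssistantAgent(
    name="validator",
    llm_config=llm_config,
)

def is_supported(question: str, answer: str, context: str) -> bool:
    prompt = (
        "Does the context below fully support the answer to the question?\n"
        "Reply with exactly one word: SUPPORTED or UNSUPPORTED.\n\n"
        f"Question: {question}\n\nAnswer: {answer}\n\nContext:\n{context}"
    )
    reply = validator.generate_reply(messages=[{"role": "user", "content": prompt}])
    text = reply if isinstance(reply, str) else str(reply)
    # Check the negative token first so "UNSUPPORTED" is not misread
    # as containing "SUPPORTED".
    return "UNSUPPORTED" not in text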
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.