AutoGen Tutorial (Python): handling long documents for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build an AutoGen workflow that can ingest, chunk, summarize, and answer questions over long documents in Python. You need this when a document is too large for a single model context window, but you still want precise answers grounded in the source text.

What You'll Need

  • Python 3.10+
  • pyautogen installed
  • An OpenAI API key set in your environment
  • A long text document saved locally as .txt
  • Basic familiarity with AutoGen's AssistantAgent and UserProxyAgent
pip install pyautogen
export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Start by loading the document and splitting it into manageable chunks. For long documents, the key is to preserve local context while keeping each chunk small enough for reliable model processing.
from pathlib import Path

def load_text(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

def chunk_text(text: str, chunk_size: int = 3500, overlap: int = 300):
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # last chunk reached; stepping back by the overlap would loop forever
        start = end - overlap
    return chunks

document = load_text("long_document.txt")
chunks = chunk_text(document)
print(f"Loaded {len(document)} characters into {len(chunks)} chunks")
  2. Next, create a summarizer agent that turns each chunk into structured notes. Use a strict output format so later steps can merge results without guessing.
import os
from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}
    ],
}

summarizer = AssistantAgent(
    name="summarizer",
    llm_config=llm_config,
)

summary_prompt = """
You summarize one document chunk for later retrieval.

Return:
- key_points: 3-5 bullets
- entities: people, systems, dates, numbers
- risks: any ambiguity or important caveats
Keep it concise and faithful to the text.
"""
  3. Run the summarizer over every chunk and store the results. This gives you a lightweight index of the document without pushing the full text into every prompt.
def summarize_chunks(agent, chunks):
    summaries = []
    for i, chunk in enumerate(chunks, start=1):
        message = f"{summary_prompt}\n\nCHUNK {i}:\n{chunk}"
        reply = agent.generate_reply(messages=[{"role": "user", "content": message}])
        summaries.append({"chunk_id": i, "summary": reply})
    return summaries

chunk_summaries = summarize_chunks(summarizer, chunks[:3])
for item in chunk_summaries:
    print(f"\n--- Chunk {item['chunk_id']} ---\n{item['summary']}")
  4. Build a question-answering agent that sees only the summaries you select plus any raw chunk text you choose to include. This demo simply passes every summary from the run above; in production, you would swap that selection logic for keyword search or embeddings.
qa_agent = AssistantAgent(
    name="qa_agent",
    llm_config=llm_config,
)

question = "What are the main operational risks mentioned in the document?"

relevant_context = "\n\n".join(
    f"CHUNK {item['chunk_id']} SUMMARY:\n{item['summary']}"
    for item in chunk_summaries
)

qa_prompt = f"""
Answer the question using only the provided context.
If the answer is not supported by the context, say so.

QUESTION:
{question}

CONTEXT:
{relevant_context}
"""

answer = qa_agent.generate_reply(messages=[{"role": "user", "content": qa_prompt}])
print(answer)
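Summarization is the expensive half of this pipeline, so if you expect to ask several questions against the same document, persist the index and skip re-summarizing on later runs. A minimal sketch using only the standard library; summaries.json is just an example path:

import json
from pathlib import Path

index_path = Path("summaries.json")

# After summarizing once, write the index to disk.
index_path.write_text(
    json.dumps(
        [{"chunk_id": s["chunk_id"], "summary": str(s["summary"])}
         for s in chunk_summaries],
        indent=2,
    ),
    encoding="utf-8",
)

# On later runs, load it instead of calling the summarizer again.
chunk_summaries = json.loads(index_path.read_text(encoding="utf-8"))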
  5. If you want better accuracy on specific questions, add a second pass that retrieves raw chunks before answering. This pattern works well when summaries are good enough for routing but too lossy for exact wording.
def retrieve_chunks(chunks, query: str, top_k: int = 2):
    # Crude keyword scoring: count occurrences of the query's longer words.
    keywords = [w.lower() for w in query.split() if len(w) > 4]
    scored = []
    for idx, chunk in enumerate(chunks, start=1):
        text = chunk.lower()
        score = sum(text.count(k) for k in keywords)
        if score > 0:  # skip chunks with no keyword hits at all
            scored.append((score, idx, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

top_chunks = retrieve_chunks(chunks[:3], question)
raw_context = "\n\n".join(
    f"CHUNK {idx} RAW TEXT:\n{chunk}"
    for _, idx, chunk in top_chunks
)

final_prompt = f"""
Answer precisely from these raw excerpts.
Question: {question}

{raw_context}
"""

final_answer = qa_agent.generate_reply(messages=[{"role": "user", "content": final_prompt}])
print(final_answer)

Testing It

Run the script against a real long document with at least several thousand words. First confirm that chunking produces multiple segments and that each summary is non-empty.
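A couple of assertions cover those first checks, for example:

assert len(chunks) > 1, "document did not split into multiple chunks"
assert all(str(item["summary"]).strip() for item in chunk_summaries), \
    "at least one summary came back empty"
print("chunking and summarization checks passed")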

Then ask questions whose answers appear in different sections of the document so you can verify retrieval is pulling from more than one chunk. If the model starts hallucinating, tighten the prompt to require citations or add a stricter retrieval filter before generating an answer.
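One way to tighten the prompt is to demand a chunk citation after every claim; this sketch reuses the question and relevant_context variables from earlier:

cited_prompt = f"""
Answer the question using only the provided context.
After every claim, cite its source like [CHUNK 2].
If no chunk supports a claim, write "not supported by the context" instead.

QUESTION:
{question}

CONTEXT:
{relevant_context}
"""

cited_answer = qa_agent.generate_reply(
    messages=[{"role": "user", "content": cited_prompt}]
)
print(cited_answer)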

A good test is to compare answers from summaries alone versus answers from raw retrieved chunks. The raw-chunk pass should be more precise on names, dates, and policy language.

Next Steps

  • Replace keyword retrieval with embeddings using a vector store like FAISS or Chroma (see the sketch after this list)
  • Add citation tracking so every answer includes source chunk IDs
  • Turn this into a multi-agent pipeline with separate extractive and verification agents
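For the first bullet, Chroma is the quicker one to prototype with because it ships with a default local embedding function. A minimal sketch, assuming chromadb is installed (pip install chromadb); the collection name and id scheme are arbitrary:

import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient for disk-backed storage
collection = client.create_collection(name="doc_chunks")

# Index every chunk once; Chroma embeds them with its default model.
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(1, len(chunks) + 1)],
)

# Semantic retrieval replaces the keyword scoring in retrieve_chunks.
results = collection.query(query_texts=[question], n_results=2)
for chunk_id, text in zip(results["ids"][0], results["documents"][0]):
    print(chunk_id, text[:80])

The returned ids also slot directly into the citation pattern from the testing section.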
