AutoGen Tutorial (Python): handling long documents for beginners
This tutorial shows you how to take a long document, split it into manageable chunks, and use AutoGen agents to answer questions over it without blowing past model context limits. You need this when your source material is too large for a single prompt: policies, contracts, claims notes, medical records, or long internal docs.
What You'll Need
- Python 3.10+
- pyautogen
- An OpenAI-compatible model endpoint or OpenAI API key
- A .env file or shell environment variable for your API key
- A long text file to test with, like policy.txt or manual.txt
Install the package:
pip install pyautogen
Set your key:
export OPENAI_API_KEY="your-key-here"
Step-by-Step
- Start by loading a long document from disk and splitting it into overlapping chunks. Overlap matters because important facts often sit near chunk boundaries.
from pathlib import Path

def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 200):
    """Split text into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        # Advance by chunk_size - overlap; the max() guard prevents an
        # infinite loop if overlap is ever set >= chunk_size.
        start = max(end - overlap, start + 1)
    return chunks

doc_path = Path("policy.txt")
document = doc_path.read_text(encoding="utf-8")
chunks = chunk_text(document)

print(f"Loaded {len(document)} characters")
print(f"Created {len(chunks)} chunks")
print(chunks[0][:300])
- Next, create a summarizer agent that will compress each chunk into a short factual summary. Keep the prompt strict so the summaries stay useful for retrieval later.
import os

from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

summarizer = AssistantAgent(
    name="summarizer",
    llm_config=llm_config,
)
def summarize_chunk(chunk: str) -> str:
    prompt = (
        "Summarize this document chunk in 5 bullet points.\n"
        "Keep names, dates, limits, obligations, and exceptions.\n\n"
        f"{chunk}"
    )
    reply = summarizer.generate_reply(messages=[{"role": "user", "content": prompt}])
    # generate_reply can return a string, a dict, or None, so
    # normalize to a plain string before storing it.
    if isinstance(reply, dict):
        return reply.get("content", "")
    return reply or ""

summary = summarize_chunk(chunks[0])
print(summary)
- Now build a simple map step over all chunks and store the summaries with their source indexes. This gives you a lightweight index you can search before asking the final question.
summaries = []
for i, chunk in enumerate(chunks):
    s = summarize_chunk(chunk)
    summaries.append({"chunk_id": i, "summary": s})

for item in summaries[:3]:
    print("=" * 40)
    print(f"Chunk {item['chunk_id']}")
    print(item["summary"])
- After that, create a helper that picks the most relevant summaries for a user question. For beginners, a keyword overlap filter is enough to prove the pattern before adding embeddings later.
import re

def score_summary(question: str, summary: str) -> int:
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", summary.lower()))
    return len(q_words & s_words)

def retrieve_top_chunks(question: str, top_k: int = 3):
    ranked = sorted(
        summaries,
        key=lambda x: score_summary(question, x["summary"]),
        reverse=True,
    )
    return ranked[:top_k]

question = "What are the claim filing deadlines?"
top_chunks = retrieve_top_chunks(question)
for item in top_chunks:
    print("=" * 40)
    print(f"Chunk {item['chunk_id']}")
    print(item["summary"])
- Finally, use a second agent to answer the user's question from only the retrieved summaries and source text. This keeps the final context small and makes the workflow usable on long documents.
answer_agent = AssistantAgent(
    name="answer_agent",
    llm_config=llm_config,
)

def answer_question(question: str) -> str:
    relevant = retrieve_top_chunks(question)
    context = "\n\n".join(
        f"[Chunk {item['chunk_id']} Summary]\n{item['summary']}"
        for item in relevant
    )
    prompt = (
        "Answer the question using only the provided chunk summaries.\n"
        "If the answer is not present, say you cannot find it in the document.\n\n"
        f"Question: {question}\n\n"
        f"Context:\n{context}"
    )
    reply = answer_agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    # Normalize the reply to a string, as in summarize_chunk.
    if isinstance(reply, dict):
        return reply.get("content", "")
    return reply or ""

print(answer_question("What are the claim filing deadlines?"))
Testing It
Run the script against a real long document and ask three types of questions: one about a known fact near the beginning, one near the end, and one that requires combining multiple sections. If your retrieval is working, the returned summaries should clearly include terms from the question before the final answer is generated.
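Here is a minimal smoke test for that, reusing retrieve_top_chunks and answer_question from above. The three questions are hypothetical placeholders; swap in facts you know appear in your own document.
test_questions = [
    "What is the effective date of the policy?",       # fact near the beginning
    "How are coverage disputes escalated?",            # fact near the end
    "How do filing deadlines interact with appeals?",  # spans multiple sections
]

for q in test_questions:
    print("=" * 60)
    print(f"Q: {q}")
    # Show which chunks retrieval picked before reading the answer.
    for item in retrieve_top_chunks(q):
        print(f"Retrieved chunk {item['chunk_id']}")
    print(answer_question(q))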
Check that answers stay grounded in the document and that missing information returns an explicit “cannot find it” response instead of hallucinated details. If results are weak, reduce chunk size slightly or increase overlap so boundary facts are preserved.
For production use, log chunk_id, selected summaries, and final answers so you can trace where each response came from.
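A minimal sketch of that tracing with Python's standard logging module follows; the file name and the JSON-lines format are illustrative choices, not requirements.
import json
import logging

logging.basicConfig(filename="qa_trace.log", level=logging.INFO)

def answer_question_traced(question: str) -> str:
    relevant = retrieve_top_chunks(question)
    answer = answer_question(question)
    # One JSON line per request ties the answer back to its sources.
    logging.info(json.dumps({
        "question": question,
        "chunk_ids": [item["chunk_id"] for item in relevant],
        "summaries": [item["summary"] for item in relevant],
        "answer": answer,
    }))
    return answer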
Next Steps
- Replace keyword scoring with embeddings-based retrieval using FAISS or Chroma (see the first sketch below).
- Add a validation agent that checks whether an answer is supported by source text (see the second sketch below).
- Extend this pattern to multi-document Q&A with per-document metadata and filters.
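For the first item, here is a minimal sketch of embeddings-based scoring using the openai client (assumed installed; the model name is one reasonable choice, not a requirement). FAISS or Chroma would replace the brute-force loop once the corpus grows.
import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Embed every summary once, then score each question against all of them.
summary_vectors = embed([item["summary"] for item in summaries])

def retrieve_top_chunks_embedded(question: str, top_k: int = 3):
    q_vec = embed([question])[0]
    ranked = sorted(
        zip(summaries, summary_vectors),
        key=lambda pair: cosine(q_vec, pair[1]),
        reverse=True,
    )
    return [item for item, _ in ranked[:top_k]]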
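For the second item, a sketch of a validation agent that reuses the same llm_config; the SUPPORTED/UNSUPPORTED reply convention is just one simple contract you could adopt.
validator = AssistantAgent(
    name="validator",
    llm_config=llm_config,
)

def is_supported(question: str, answer: str, context: str) -> bool:
    prompt = (
        "Does the context below fully support the answer to the question?\n"
        "Reply with exactly one word: SUPPORTED or UNSUPPORTED.\n\n"
        f"Question: {question}\n\nAnswer: {answer}\n\nContext:\n{context}"
    )
    reply = validator.generate_reply(messages=[{"role": "user", "content": prompt}])
    text = reply if isinstance(reply, str) else str(reply)
    # Check the negative token first so "UNSUPPORTED" is not misread
    # as containing "SUPPORTED".
    return "UNSUPPORTED" not in text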
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.