CrewAI Tutorial (Python): handling long documents for beginners
This tutorial shows you how to take a long document, split it into manageable chunks, and process those chunks with CrewAI agents in Python. You need this when a single file is too large for one model call, or when you want a more reliable pipeline for summarizing, extracting, or answering questions from long text.
What You'll Need
- Python 3.10+
- `crewai`
- `crewai-tools`
- `langchain-openai`
- An OpenAI API key set as `OPENAI_API_KEY`
- A long text file to test with, for example `document.txt`
Install the packages:
```shell
pip install crewai crewai-tools langchain-openai
```
Step-by-Step
- Start by loading your long document from disk and splitting it into chunks. For beginners, a simple character-based splitter is enough to get the pattern right before moving to smarter chunking.
```python
from pathlib import Path

def load_document(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

def split_text(text: str, chunk_size: int = 4000) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

document = load_document("document.txt")
chunks = split_text(document)
print(f"Loaded {len(document)} characters")
print(f"Created {len(chunks)} chunks")
```
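Before wiring in agents, it is worth sanity-checking the splitter on its own. This quick sketch uses an in-memory sample string (rather than document.txt) to confirm that the chunks cover the whole document with nothing lost, and that every chunk respects the size limit:

```python
# Standalone check of the character-based splitter, using a dummy
# in-memory string instead of document.txt
def split_text(text: str, chunk_size: int = 4000) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

sample = "word " * 3000  # 15,000 characters of dummy text
chunks = split_text(sample)

assert "".join(chunks) == sample            # no characters lost or duplicated
assert all(len(c) <= 4000 for c in chunks)  # every chunk fits the size limit
print(f"{len(chunks)} chunks, largest is {max(len(c) for c in chunks)} chars")
```

If either assertion fails, fix the splitter before involving any agents: debugging string slicing is much cheaper than debugging model calls.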
- Next, define one agent that summarizes each chunk and another that combines those summaries into a final answer. This keeps each model call small and makes the workflow easier to debug.
```python
from crewai import Agent

chunk_summarizer = Agent(
    role="Document Chunk Summarizer",
    goal="Summarize a chunk of a long document accurately and concisely",
    backstory="You extract the key points from a single section of a larger document.",
    verbose=True,
)

final_summarizer = Agent(
    role="Document Synthesizer",
    goal="Combine multiple chunk summaries into one coherent final summary",
    backstory="You merge partial summaries into a clear executive summary.",
    verbose=True,
)
```
- Now create tasks for each chunk and run them as a crew. The important part here is that each task only sees one slice of the document, so you avoid context overflow.
```python
from crewai import Task, Crew, Process

tasks = []
for i, chunk in enumerate(chunks[:3]):  # keep the first 3 chunks for the demo
    tasks.append(
        Task(
            description=f"Summarize chunk {i + 1} of the document:\n\n{chunk}",
            expected_output="A concise bullet summary with the main ideas and any important facts.",
            agent=chunk_summarizer,
        )
    )

crew = Crew(
    agents=[chunk_summarizer],
    tasks=tasks,
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
# kickoff() returns only the final task's output, so collect
# every chunk summary from the per-task outputs instead
chunk_summaries = "\n\n".join(task_output.raw for task_output in result.tasks_output)
print(chunk_summaries)
```

Note the last two lines: `crew.kickoff()` returns a single result representing the last task, so you need `result.tasks_output` to gather the summary of every chunk, not just the final one.
- After that, feed the collected summaries into a second task that produces the final result. This is the simplest production pattern for long documents: map over chunks first, then reduce into one answer.
```python
from crewai import Task, Crew, Process

reduce_task = Task(
    description=f"""
Combine these chunk summaries into one final summary:

{chunk_summaries}
""".strip(),
    expected_output="A polished final summary with the main themes and any repeated points removed.",
    agent=final_summarizer,
)

reduce_crew = Crew(
    agents=[final_summarizer],
    tasks=[reduce_task],
    process=Process.sequential,
    verbose=True,
)

final_summary = reduce_crew.kickoff()
print(final_summary)
```
- If you want this to work on arbitrary documents, wrap the pipeline in a function. That gives you a reusable entry point for summarization, extraction, or later question-answering workflows.
```python
from crewai import Agent, Task, Crew, Process

def summarize_long_document(path: str) -> str:
    # Reuses load_document and split_text from the first step
    text = load_document(path)
    chunks = split_text(text)
    summarizer = Agent(
        role="Document Chunk Summarizer",
        goal="Summarize document chunks accurately",
        backstory="You work on one chunk at a time.",
        verbose=False,
    )
    tasks = [
        Task(
            description=f"Summarize this chunk:\n\n{chunk}",
            expected_output="Bullet points capturing key facts.",
            agent=summarizer,
        )
        for chunk in chunks
    ]
    crew = Crew(
        agents=[summarizer],
        tasks=tasks,
        process=Process.sequential,
        verbose=False,
    )
    result = crew.kickoff()
    # Join every chunk summary rather than returning only the last task's output
    return "\n\n".join(task_output.raw for task_output in result.tasks_output)

print(summarize_long_document("document.txt"))
```
Testing It
Run the script against a document that is clearly longer than your model’s comfortable context window. A good test file has repeated sections, headings, or multiple topics so you can see whether chunking preserves meaning.
Check that each chunk produces a summary instead of failing with token limits or truncated output. Then inspect the final combined summary and confirm it removes duplication while keeping the main points intact.
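If you don't have a suitable file handy, you can generate one. This sketch writes a multi-topic test document with headings and deliberate repetition (the topic names, filename, and section layout are arbitrary choices for the test):

```python
# Generate a synthetic multi-topic test document
# (topics, filename, and wording are arbitrary)
from pathlib import Path

sections = []
for topic in ["Billing", "Security", "Onboarding", "Support"]:
    body = f"{topic} is handled by a dedicated team. " * 40
    # Repeat each section so you can check the final summary removes duplication
    sections.append(f"# {topic}\n\n{body}\n\n# {topic} (recap)\n\n{body}")

text = "\n\n".join(sections)
Path("test_document.txt").write_text(text, encoding="utf-8")
print(f"Wrote {len(text)} characters across {len(sections)} sections")
```

A good final summary of this file should mention each of the four topics exactly once, which makes deduplication failures easy to spot by eye.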
If results feel noisy, reduce `chunk_size` or make your summarizer prompt more specific. If results miss details, add overlap between chunks when you move beyond this beginner version.
Next Steps
- Add overlapping chunks so important sentences near boundaries are not lost.
- Replace character splitting with `RecursiveCharacterTextSplitter` from LangChain for better paragraph-aware chunking.
- Extend the same pattern to document Q&A by storing chunk embeddings in a vector database and retrieving only relevant sections before calling CrewAI.
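The overlap idea from the first bullet can be sketched without any extra libraries. The function and parameter names here (`split_text_with_overlap`, `chunk_overlap`) are illustrative, not from CrewAI or LangChain; LangChain's `RecursiveCharacterTextSplitter` offers a more polished, paragraph-aware version of the same idea:

```python
# Character splitter with overlap: each chunk repeats the tail of the
# previous one, so sentences near a boundary appear whole in at least one chunk
def split_text_with_overlap(text: str, chunk_size: int = 4000,
                            chunk_overlap: int = 200) -> list[str]:
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # advance less than a full chunk each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text_with_overlap("abcdefghij" * 100, chunk_size=300, chunk_overlap=50)
# Consecutive chunks share their boundary region
assert chunks[0][-50:] == chunks[1][:50]
```

The trade-off is extra tokens: with a 200-character overlap on 4,000-character chunks, you resend roughly 5% of the document, which is usually a fair price for not cutting sentences in half.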
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.