CrewAI Tutorial (TypeScript): handling long documents for beginners
This tutorial shows you how to take a long document, split it into manageable chunks, and process those chunks with CrewAI in TypeScript. You need this when a single prompt blows past model context limits or when you want more reliable extraction from contracts, reports, policies, or claims documents.
What You'll Need
- Node.js 18+ installed
- A TypeScript project with ts-node or tsx
- CrewAI for TypeScript installed in your project
- An OpenAI API key set as an environment variable
- A long text file to test with, such as ./data/document.txt
- Basic familiarity with async/await and TypeScript types
Step-by-Step
- Install the packages and set up your project.
You need the CrewAI package, an environment-variable loader, and a TypeScript runner. We'll write the text splitter ourselves in the next step so large documents can be chunked before they reach the agents.
npm init -y
npm install @crew-ai/crew-ai dotenv
npm install -D typescript tsx @types/node
- Create a simple document loader and chunker.
The important part is not reading a file; it's splitting the text into chunks that preserve meaning without exceeding context limits.
// src/chunk-document.ts
import fs from "node:fs/promises";

export async function loadDocument(path: string): Promise<string> {
  return fs.readFile(path, "utf8");
}

// Split text into overlapping chunks so facts that span a chunk
// boundary still appear whole in at least one chunk.
export function chunkText(text: string, chunkSize = 3000, overlap = 300): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    // Step back by `overlap` characters before starting the next chunk.
    start = end - overlap;
    if (start < 0) start = 0;
    if (end === text.length) break;
  }
  return chunks.filter((chunk) => chunk.trim().length > 0);
}
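Before wiring the chunker into any agents, it helps to sanity-check its stepping behavior on synthetic input. This is a minimal standalone sketch that inlines the same chunkText function from above (so it runs on its own) and checks the boundaries you should expect: each chunk starts chunkSize - overlap characters after the previous one.

```typescript
// Standalone sanity check: same logic as chunkText above.
function chunkText(text: string, chunkSize = 3000, overlap = 300): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    start = end - overlap;
    if (start < 0) start = 0;
    if (end === text.length) break;
  }
  return chunks.filter((chunk) => chunk.trim().length > 0);
}

// 7000 characters with a 2700-character stride (3000 - 300) yields
// three chunks: [0, 3000), [2700, 5700), [5400, 7000).
const text = "a".repeat(7000);
const chunks = chunkText(text, 3000, 300);
console.log(chunks.length);    // 3
console.log(chunks[0].length); // 3000
console.log(chunks[2].length); // 1600
```

If those boundaries surprise you, remember the overlap is intentional: duplicated material across neighboring chunks is what the synthesizer agent later reconciles.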
- Define one agent for per-chunk extraction and another for final synthesis.
For long documents, don’t ask one agent to do everything at once. Use one worker agent to extract facts from each chunk, then a second agent to merge those facts into a clean summary.
// src/agents.ts
import { Agent } from "@crew-ai/crew-ai";

export const extractorAgent = new Agent({
  name: "Chunk Extractor",
  role: "Extract key facts from one document chunk",
  goal: "Return concise bullet points with important facts only",
  backstory: "You are precise and avoid adding unsupported claims.",
});

export const synthesizerAgent = new Agent({
  name: "Document Synthesizer",
  role: "Combine extracted facts into a final answer",
  goal: "Produce a short, accurate summary of the full document",
  backstory: "You reconcile overlapping points and remove duplicates.",
});
- Create tasks that map each chunk to the extractor agent, then pass all results into the synthesizer.
This pattern is what makes long-document handling work in practice: parallel-ish extraction first, then controlled consolidation second.
// src/tasks.ts
import { Task } from "@crew-ai/crew-ai";
import { extractorAgent, synthesizerAgent } from "./agents";

export function buildExtractionTasks(chunks: string[]) {
  return chunks.map(
    (chunk, index) =>
      new Task({
        name: `Extract Chunk ${index + 1}`,
        description: `Extract the most important facts from this document chunk:\n\n${chunk}`,
        agent: extractorAgent,
      })
  );
}

export function buildSynthesisTask(extractions: string[]) {
  return new Task({
    name: "Synthesize Document",
    description: `Combine these extracted notes into one final summary:\n\n${extractions.join("\n\n---\n\n")}`,
    agent: synthesizerAgent,
  });
}
- Wire everything together in a runnable script.
This script loads the file, chunks it, runs extraction over each chunk, then sends the combined outputs into the final synthesis task.
// src/index.ts
import "dotenv/config";
import { Crew } from "@crew-ai/crew-ai";
import { loadDocument, chunkText } from "./chunk-document";
import { buildExtractionTasks, buildSynthesisTask } from "./tasks";

async function main() {
  // Load and chunk the document before any agent sees it.
  const raw = await loadDocument("./data/document.txt");
  const chunks = chunkText(raw, 3000, 300);

  // First pass: extract facts from every chunk.
  const extractionTasks = buildExtractionTasks(chunks);
  const extractionCrew = new Crew({
    tasks: extractionTasks,
    verbose: true,
  });
  const extractionsResult = await extractionCrew.kickoff();

  // Normalize the result shape: kickoff may return an array of task
  // outputs or a single value depending on the task setup.
  const extractions = Array.isArray(extractionsResult)
    ? extractionsResult.map(String)
    : [String(extractionsResult)];

  // Second pass: merge all extracted notes into one summary.
  const synthesisTask = buildSynthesisTask(extractions);
  const synthesisCrew = new Crew({
    tasks: [synthesisTask],
    verbose: true,
  });
  const finalResult = await synthesisCrew.kickoff();

  console.log("\nFINAL RESULT:\n");
  console.log(String(finalResult));
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
- Add your environment variable and run it against a real document.
Keep the test document large enough to force multiple chunks; otherwise you’re not actually testing the long-document path.
# .env
OPENAI_API_KEY=your_api_key_here
# run
npx tsx src/index.ts
Testing It
Use a document that is clearly longer than one model prompt window for your chosen settings. A policy PDF converted to text or a multi-page insurance claim note works well.
Check that the console shows multiple extraction tasks before the final synthesis task runs. If you only see one chunk, reduce chunkSize until you get several pieces.
Verify that the final output does not repeat itself excessively and that it reflects details from across the whole document. If answers get vague, lower the chunk size or make the extractor task more specific about what facts to keep.
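To check up front that your settings will actually exercise the multi-chunk path, you can predict the chunk count from the document length alone. This is a small sketch whose formula matches the stepping of the chunkText function defined earlier (each chunk after the first advances by chunkSize - overlap characters); if you change the chunker, re-derive the formula.

```typescript
// Predict how many chunks chunkText will produce for a document of a
// given length, using its stride of (chunkSize - overlap) characters.
function estimateChunkCount(length: number, chunkSize = 3000, overlap = 300): number {
  if (length <= chunkSize) return 1;
  return Math.ceil((length - chunkSize) / (chunkSize - overlap)) + 1;
}

console.log(estimateChunkCount(2500));  // 1 — too small to test chunking
console.log(estimateChunkCount(7000));  // 3
console.log(estimateChunkCount(30000)); // 11
```

If the estimate comes back as 1 for your test file, either pick a longer document or lower chunkSize before drawing any conclusions from the run.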
For production use, compare summaries against known source sections so you can catch missing details early.
Next Steps
- Add metadata to each chunk so extracted notes include page numbers or section headers.
- Replace plain text splitting with token-based splitting using a tokenizer-aware library.
- Add structured outputs so each chunk returns JSON instead of free-form bullets.
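Before adopting a full tokenizer library for the token-based splitting mentioned above, a common stopgap is a characters-per-token heuristic (roughly 4 characters per token for English prose). This sketch shows the idea; the 4:1 ratio is an approximation, not a guarantee, so treat the result as a budget estimate only.

```typescript
// Rough token estimate: ~4 characters per token for English text.
// A real tokenizer-aware library gives exact counts; this heuristic
// is only for quick chunk-size budgeting.
function approxTokens(text: string, charsPerToken = 4): number {
  return Math.ceil(text.length / charsPerToken);
}

// A 3000-character chunk is roughly 750 tokens under this heuristic,
// which helps you pick a chunkSize that fits your model's context window.
console.log(approxTokens("a".repeat(3000))); // 750
```

Once this estimate starts mattering (e.g. chunks landing near the context limit), switch to an actual tokenizer so the counts match what the model sees.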
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.