CrewAI Tutorial (TypeScript): chunking large documents for beginners
This tutorial shows you how to split a large document into smaller chunks in TypeScript, send each chunk through CrewAI, and collect the results in a predictable way. You need this when a single document is too large for one model call, or when you want better control over cost, latency, and output quality.
What You'll Need
- Node.js 18+
- A TypeScript project with ts-node or a build step
- crewai installed in your project
- An OpenAI API key set as OPENAI_API_KEY
- A large text file to process, like a policy, contract, or report
- Basic familiarity with CrewAI agents, tasks, and crews
Step-by-Step
- Start by installing the packages and setting up a minimal TypeScript project. For this example, we’ll use crewai plus a simple tokenizer-free chunker based on character count so the code stays executable without extra dependencies.
npm init -y
npm install crewai dotenv
npm install -D typescript ts-node @types/node
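- Create a chunking helper that splits text into overlapping chunks. Overlap matters because important context often sits near chunk boundaries, and without it your summaries can miss links between sections. The overlap must be smaller than the chunk size, otherwise the window never moves forward and the loop never ends.
// chunk.ts
export function chunkText(
  text: string,
  chunkSize = 3000,
  overlap = 300
): string[] {
  // Reject settings that would stop the window from advancing.
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be greater than 0");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be >= 0 and smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    // Stop once the final chunk reaches the end of the text.
    if (end === text.length) break;
    // Step back by `overlap` so neighboring chunks share context.
    start = end - overlap;
  }
  return chunks;
}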
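To see what the overlap does, it helps to run the chunker on a string short enough to check by hand. The sketch below repeats the same loop so it runs on its own; the tiny chunkSize and overlap values are only for illustration.

```typescript
// Same sliding-window loop as chunk.ts, repeated so this example is self-contained.
function chunkText(text: string, chunkSize = 3000, overlap = 300): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap;
  }
  return chunks;
}

// Chunks of 4 characters that share 1 character with their neighbor.
const demo = chunkText("abcdefghij", 4, 1);
console.log(demo); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk starts with the last character of the previous one ("d", then "g"). With the tutorial defaults, the same thing happens with 300 shared characters.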
- Load the document from disk and inspect how many chunks you created. In production, I always log chunk counts and sizes before calling the model so I can catch bad inputs early.
// index.ts
import "dotenv/config";
import { readFile } from "node:fs/promises";
import { chunkText } from "./chunk";
async function main() {
  const raw = await readFile("./document.txt", "utf8");
  const chunks = chunkText(raw, 3000, 300);
  console.log(`Loaded ${raw.length} characters`);
  console.log(`Created ${chunks.length} chunks`);
  console.log(chunks.map((c, i) => ({ index: i + 1, chars: c.length })));
}
main().catch(console.error);
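Logging is enough while you are watching the terminal, but a small pre-flight check can turn "bad input" into an explicit list of problems. This is a minimal sketch: findChunkProblems and its default limit of 200 chunks are hypothetical, chosen only for illustration, so adjust both to your own workload.

```typescript
// Hypothetical pre-flight check; the function name and default limit are illustrative.
function findChunkProblems(
  raw: string,
  chunks: string[],
  maxChunks = 200
): string[] {
  const problems: string[] = [];
  if (raw.trim().length === 0) {
    problems.push("document is empty or whitespace only");
  }
  if (chunks.length > maxChunks) {
    problems.push(`${chunks.length} chunks exceeds the limit of ${maxChunks}`);
  }
  // Whitespace-only chunks cost a model call and produce nothing useful.
  const blank = chunks.filter((c) => c.trim().length === 0).length;
  if (blank > 0) {
    problems.push(`${blank} chunk(s) contain only whitespace`);
  }
  return problems;
}
```

You could call it right after chunkText and exit early if the returned array is not empty, before any model call is made.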
- Define a CrewAI agent and task that summarizes one chunk at a time. This keeps the prompt narrow and makes each output easier to validate than trying to summarize the whole document in one shot.
import { Agent, Task } from "crewai";
const chunkSummarizer = new Agent({
  role: "Document Analyst",
  goal: "Summarize one document chunk accurately",
  backstory: "You extract concise factual summaries from long business documents.",
});
function buildTask(chunk: string) {
  return new Task({
    description: `Summarize this document chunk in 5 bullet points.\n\nCHUNK:\n${chunk}`,
    expected_output: "A concise bullet-point summary of the chunk.",
    agent: chunkSummarizer,
  });
}
- Run each chunk through CrewAI sequentially and combine the results. Sequential processing is simpler for beginners and easier to debug; once it works, you can move to parallel execution if your workload needs it.
import { Crew } from "crewai";
async function summarizeChunks(chunks: string[]) {
  const summaries: string[] = [];
  for (const [index, chunk] of chunks.entries()) {
    const crew = new Crew({
      agents: [chunkSummarizer],
      tasks: [buildTask(chunk)],
      verbose: true,
    });
    const result = await crew.kickoff();
    summaries.push(`Chunk ${index + 1}\n${String(result)}`);
  }
  return summaries.join("\n\n");
}
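If you do move to parallel execution later, it is usually safer to cap how many chunks run at once than to fire every call together. The sketch below is one way to do that in plain TypeScript. mapWithLimit is a hypothetical helper, not part of CrewAI, and the worker argument stands in for whatever runs a single chunk (for example, the crew kickoff from the loop above).

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results come back in the same order as the input, regardless of finish order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each runner keeps pulling the next unclaimed index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

A limit of 2 or 3 is a reasonable starting point while you find out what your rate limits allow. Note that if any worker call rejects, the whole mapWithLimit call rejects, so wrap the worker in your own error handling if you want partial results.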
- Put it together in one runnable script and print the final combined summary. If your document is very large, this pattern becomes the base for map-reduce style workflows where you summarize chunks first and then summarize those summaries again.
import "dotenv/config";
import { readFile } from "node:fs/promises";
import { Agent, Crew, Task } from "crewai";
import { chunkText } from "./chunk";
const chunkSummarizer = new Agent({
  role: "Document Analyst",
  goal: "Summarize one document chunk accurately",
  backstory: "You extract concise factual summaries from long business documents.",
});
function buildTask(chunk: string) {
  return new Task({
    description: `Summarize this document chunk in 5 bullet points.\n\nCHUNK:\n${chunk}`,
    expected_output: "A concise bullet-point summary of the chunk.",
    agent: chunkSummarizer,
  });
}
async function main() {
  const raw = await readFile("./document.txt", "utf8");
  const chunks = chunkText(raw, 3000, 300);
  const summaries: string[] = [];
  for (const [index, chunk] of chunks.entries()) {
    const crew = new Crew({
      agents: [chunkSummarizer],
      tasks: [buildTask(chunk)],
      verbose: true,
    });
    const result = await crew.kickoff();
    summaries.push(`Chunk ${index + 1}\n${String(result)}`);
    console.log(`Finished chunk ${index + 1}/${chunks.length}`);
  }
  console.log("\n=== FINAL SUMMARY ===\n");
  console.log(summaries.join("\n\n"));
}
main().catch(console.error);
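The map-reduce idea mentioned in this step can be sketched without tying it to any particular API. In the code below, Summarize is a hypothetical function type standing in for "run one summarization call" (in this tutorial that would wrap a crew kickoff), and batchBySize, reduceSummaries, and the 3000-character budget are illustrative names and values, not part of CrewAI.

```typescript
// Hypothetical stand-in for one summarization call (e.g. a wrapper around a crew kickoff).
type Summarize = (text: string) => Promise<string>;

// Group summaries into batches whose joined length stays within maxChars.
// A single summary longer than maxChars still gets a batch of its own.
function batchBySize(parts: string[], maxChars: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let size = 0;
  for (const part of parts) {
    const sep = current.length > 0 ? 2 : 0; // "\n\n" between joined parts
    if (current.length > 0 && size + sep + part.length > maxChars) {
      batches.push(current);
      current = [part];
      size = part.length;
    } else {
      current.push(part);
      size += sep + part.length;
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Reduce step: summarize batches of summaries until only one summary remains.
async function reduceSummaries(
  summaries: string[],
  summarize: Summarize,
  maxChars = 3000
): Promise<string> {
  let level = summaries;
  while (level.length > 1) {
    const next: string[] = [];
    for (const batch of batchBySize(level, maxChars)) {
      next.push(await summarize(batch.join("\n\n")));
    }
    // If batching could not merge anything, finish in one call instead of looping forever.
    if (next.length >= level.length) return summarize(next.join("\n\n"));
    level = next;
  }
  return level[0] ?? "";
}
```

With a single input summary the function returns it unchanged and makes no extra call. The fallback branch can exceed the character budget, so treat it as a signal to raise maxChars or ask for shorter per-chunk summaries.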
Testing It
Run the script against a real text file first; don’t test with tiny lorem ipsum because small inputs hide boundary issues. Use a document that’s at least several pages long so you can see whether overlapping chunks preserve context across section breaks.
Watch for three things:
- The number of generated chunks matches your expectation.
- Each task completes without context-length errors.
- The combined output stays consistent across repeated runs.
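For the first check, you can work out the expected chunk count from the document length instead of guessing. The window advances by chunkSize minus overlap each step, so a sketch of the arithmetic looks like this. expectedChunkCount is a hypothetical helper that mirrors the character-based chunker in this tutorial and nothing else.

```typescript
// Hypothetical helper: how many chunks the sliding-window chunker should produce.
function expectedChunkCount(
  length: number,
  chunkSize = 3000,
  overlap = 300
): number {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  if (length <= 0) return 0;
  if (length <= chunkSize) return 1;
  // After the first chunk, each further chunk covers `stride` new characters.
  const stride = chunkSize - overlap;
  return 1 + Math.ceil((length - chunkSize) / stride);
}

console.log(expectedChunkCount(10000)); // 4
```

For a 10,000-character document with the defaults, the chunks cover 0 to 3000, 2700 to 5700, 5400 to 8400, and 8100 to 10000, which is where the 4 comes from. If the logged count differs from this number, check the file encoding and the arguments passed to chunkText first.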
If outputs look noisy or repetitive, reduce chunkSize, increase overlap slightly, or tighten the prompt to force more structured summaries. For regulated content like policies or claims docs, I also recommend saving each raw per-chunk output before generating any final rollup so you can audit what happened later.
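One simple way to keep that audit trail is to append one JSON line per chunk to a log file as soon as its output arrives. The sketch below assumes a JSON Lines file is acceptable for your audit needs; the record shape, toAuditLine, and appendAuditLine are hypothetical names for illustration, and only appendFile from node:fs/promises is a real library call.

```typescript
import { appendFile } from "node:fs/promises";

// Hypothetical record shape; add fields such as model name or prompt version as needed.
interface ChunkAuditRecord {
  index: number;      // 1-based chunk number
  chunkChars: number; // size of the input chunk in characters
  output: string;     // raw model output, stored unmodified
  recordedAt: string; // ISO 8601 timestamp
}

// Build one JSON Lines entry. `now` is injectable so the result is easy to test.
function toAuditLine(
  index: number,
  chunk: string,
  output: string,
  now: Date = new Date()
): string {
  const record: ChunkAuditRecord = {
    index,
    chunkChars: chunk.length,
    output,
    recordedAt: now.toISOString(),
  };
  return JSON.stringify(record) + "\n";
}

// Append the entry to the audit file, creating the file if it does not exist.
async function appendAuditLine(path: string, line: string): Promise<void> {
  await appendFile(path, line, "utf8");
}
```

Inside the chunk loop you would call appendAuditLine with a path of your choice right after each result comes back, so a crash halfway through still leaves the completed chunks on disk.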
Next Steps
- Add metadata to each chunk with page numbers or section titles before sending it to CrewAI.
- Replace sequential execution with parallel processing when you need throughput.
- Build a second-pass “summary of summaries” crew for full-document synthesis.
By Cyprian Aarons, AI Consultant at Topiax.