CrewAI Tutorial (TypeScript): handling long documents for intermediate developers
This tutorial shows you how to ingest long documents in a CrewAI TypeScript project without blowing past model context limits. You’ll split the document into chunks, summarize each chunk with an agent, then combine those summaries into a final answer that stays grounded in the source text.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with `ts-node` or `tsx`
- `crewai` installed in your project
- An LLM provider API key, such as `OPENAI_API_KEY`
- A long text file to test with, for example `./docs/policy.txt`
- Basic familiarity with:
  - CrewAI agents, tasks, and crews
  - async/await in TypeScript
Step-by-Step
- Start by installing the dependencies and setting up a small TypeScript project. For long-document work, keep your runtime simple and avoid extra orchestration libraries until the pipeline is stable.

```bash
npm init -y
npm install crewai dotenv
npm install -D typescript tsx @types/node
npx tsc --init
```
- Create a document loader and chunker. The key idea is to keep chunks small enough for reliable summarization while preserving enough context to be useful.

```typescript
// src/chunkDocument.ts
import { readFileSync } from "node:fs";

export function loadDocument(path: string): string {
  return readFileSync(path, "utf8");
}

export function chunkText(text: string, chunkSize = 3000, overlap = 300): string[] {
  // Guard against an infinite loop: if overlap >= chunkSize, start never advances.
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    // Step back by `overlap` so adjacent chunks share context.
    start = end - overlap;
  }
  return chunks;
}
```
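To see what the overlap actually does, here is a quick standalone sanity check. The chunking logic is copied inline so the snippet runs on its own; the sample text and sizes are illustrative.

```typescript
// Standalone copy of the chunker for a quick sanity check.
function chunkText(text: string, chunkSize = 3000, overlap = 300): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // step back so adjacent chunks share context
  }
  return chunks;
}

// Build a 7000-character sample with varying content so the overlap is visible.
const sample = Array.from({ length: 7000 }, (_, i) =>
  String.fromCharCode(97 + (i % 26))
).join("");

const chunks = chunkText(sample, 3000, 300);
console.log(chunks.length);                                      // → 3
console.log(chunks[0].length);                                   // → 3000
console.log(chunks[0].slice(-300) === chunks[1].slice(0, 300));  // → true
```

The last line confirms that the tail of each chunk is repeated at the head of the next, which is what keeps facts that straddle a boundary from being lost.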
- Define one agent for chunk analysis and one agent for synthesis. This pattern works well because the first agent extracts local facts and the second agent merges them into a single response.

```typescript
// src/agents.ts
import { Agent } from "crewai";

export const documentAnalyst = new Agent({
  name: "Document Analyst",
  role: "Extract structured facts from document chunks",
  goal: "Summarize each chunk accurately without inventing details",
  backstory: "You are precise and only use information present in the provided text.",
});

export const synthesisAgent = new Agent({
  name: "Synthesis Analyst",
  role: "Combine chunk summaries into a coherent final answer",
  goal: "Produce a concise answer grounded in all summaries",
  backstory: "You reconcile overlapping summaries and remove duplicates.",
});
```
- Build tasks that process each chunk independently, then create a final synthesis task. The intermediate summaries should be short and structured so the final pass has clean input.

```typescript
// src/tasks.ts
import { Task } from "crewai";
import { documentAnalyst, synthesisAgent } from "./agents";

export function buildChunkTasks(chunks: string[]) {
  return chunks.map(
    (chunk, index) =>
      new Task({
        name: `Analyze chunk ${index + 1}`,
        description: `Read this document chunk and extract key facts, entities, dates, obligations, and risks.\n\nCHUNK:\n${chunk}`,
        expectedOutput:
          "A compact bullet list of factual findings with no speculation.",
        agent: documentAnalyst,
      })
  );
}

export function buildSynthesisTask(summaries: string[]) {
  return new Task({
    name: "Synthesize document summary",
    description: `Combine these chunk summaries into one answer.\n\nSUMMARIES:\n${summaries.join(
      "\n\n---\n\n"
    )}`,
    expectedOutput:
      "A single coherent summary that preserves important details from all chunks.",
    agent: synthesisAgent,
  });
}
```
- Wire everything together in a runnable entrypoint. This version reads a local file, chunks it, summarizes each piece through CrewAI tasks, then runs a final synthesis pass. Note that each crew is given the agent its tasks reference.

```typescript
// src/index.ts
import "dotenv/config";
import { Crew } from "crewai";
import { documentAnalyst, synthesisAgent } from "./agents";
import { loadDocument, chunkText } from "./chunkDocument";
import { buildChunkTasks, buildSynthesisTask } from "./tasks";

async function main() {
  const doc = loadDocument("./docs/policy.txt");
  const chunks = chunkText(doc, 3000, 300);
  const chunkTasks = buildChunkTasks(chunks);

  // First pass: summarize every chunk independently.
  const crew = new Crew({
    agents: [documentAnalyst],
    tasks: chunkTasks,
    verbose: true,
    process: "sequential",
  });

  const results = await crew.kickoff();
  const summaries = Array.isArray(results) ? results.map(String) : [String(results)];

  // Second pass: merge the chunk summaries into one grounded answer.
  const synthesisCrew = new Crew({
    agents: [synthesisAgent],
    tasks: [buildSynthesisTask(summaries)],
    verbose: true,
    process: "sequential",
  });

  const finalResult = await synthesisCrew.kickoff();
  console.log("\nFINAL SUMMARY:\n");
  console.log(String(finalResult));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
- Run it against a real file and tune the chunk size if needed. If your source is dense legal or insurance text, smaller chunks usually produce better extraction quality than trying to maximize throughput.

```bash
mkdir -p docs src
cp /path/to/your/long-document.txt docs/policy.txt
npx tsx src/index.ts
```
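When tuning chunk size, it helps to sanity-check roughly how much of the model's context each chunk will consume. The sketch below uses the common heuristic of about four characters per token for English text; the ratio and the context limit are assumptions you should adjust for your model and tokenizer.

```typescript
// Rough token estimate per chunk (~4 characters per token is a common
// heuristic for English; real tokenizers will differ).
function estimateTokens(chunk: string): number {
  return Math.ceil(chunk.length / 4);
}

// Print how much of an assumed context window each chunk occupies.
function reportChunkBudget(chunks: string[], contextLimit = 8000): void {
  for (const [i, chunk] of chunks.entries()) {
    const tokens = estimateTokens(chunk);
    const pct = ((tokens / contextLimit) * 100).toFixed(1);
    console.log(`chunk ${i + 1}: ~${tokens} tokens (${pct}% of context)`);
  }
}

reportChunkBudget(["x".repeat(3000)]); // → chunk 1: ~750 tokens (9.4% of context)
```

If a single chunk plus the task prompt approaches the model's limit, reduce `chunkSize` before blaming the prompts.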
Testing It
Start by using a document you already know well so you can spot missing facts quickly. Check whether the final output includes details that appear across multiple sections of the source file without duplicating them excessively.
If the output feels vague, reduce `chunkSize` to around 2000 characters and increase `overlap` slightly. If it feels repetitive, make the per-chunk expected output stricter by asking for only bullets under fixed headings like Facts, Entities, and Open Questions.
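One way to enforce that structure is to encode the headings directly into the expected-output string and reuse it when building chunk tasks. This template is an illustration, not part of the original setup:

```typescript
// Stricter per-chunk output contract: bullets only, under fixed headings.
const strictExpectedOutput = [
  "Bullets only, grouped under exactly these headings:",
  "## Facts",
  "## Entities",
  "## Open Questions",
  "No prose paragraphs, no speculation, no information outside the chunk.",
].join("\n");

console.log(strictExpectedOutput);
```

Passing this as the `expectedOutput` in `buildChunkTasks` gives the synthesis pass uniform, diff-friendly input, which also makes duplicates across overlapping chunks easier to spot.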
For production use, log each intermediate summary separately so you can trace where bad answers come from. That matters when you’re processing policy documents, claims files, or contract bundles where auditability is part of the requirement.
Next Steps
- Add metadata extraction per chunk, such as page numbers or section headers.
- Replace plain text files with PDF/OCR ingestion before chunking.
- Add a retrieval layer so you only summarize chunks relevant to a given question.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.