CrewAI Tutorial (TypeScript): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a Retrieval-Augmented Generation (RAG) pipeline in TypeScript with CrewAI: ingest documents, retrieve relevant context, and generate grounded answers. You’d use this when you want your agent to answer from your own knowledge base instead of hallucinating from model memory.

What You'll Need

  • Node.js 18+
  • A TypeScript project with npm or pnpm
  • An OpenAI API key
  • CrewAI TypeScript packages:
    • @crewai/crew
    • @crewai/core
    • @crewai/tools
  • A vector store package or local retrieval layer
  • A .env file with your API keys
  • A document corpus to index, such as PDFs, markdown files, or internal policy docs
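The .env file itself stays small. A minimal sketch (OPENAI_API_KEY is the variable name the official openai package reads by default; adjust if your setup differs):

```
# .env (keep this file out of version control)
OPENAI_API_KEY=your-key-here
```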

Step-by-Step

  1. Set up the project and install dependencies. Keep the first pass simple: one retriever tool, one answerer agent, and one crew that orchestrates both.
mkdir crewai-rag-ts
cd crewai-rag-ts
npm init -y
npm install @crewai/crew @crewai/core @crewai/tools openai dotenv zod
npm install -D typescript tsx @types/node
npx tsc --init
  2. Add your environment variables and a small document store. For production work, you’d replace the in-memory array with a real vector database, but this version keeps the mechanics visible.
// src/docs.ts
export const docs = [
  {
    id: "policy-001",
    title: "Claims Policy",
    content: "Claims must be submitted within 30 days of incident date.",
  },
  {
    id: "policy-002",
    title: "Coverage Policy",
    content: "Flood damage is excluded unless explicitly added by endorsement.",
  },
];
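The sample docs above are short enough to index whole, but real corpora (PDFs, long markdown files) usually need to be split before indexing. A hypothetical chunking helper, shown as a sketch you could add alongside docs.ts:

```typescript
// src/chunk.ts (hypothetical helper, not part of any CrewAI package)
// Split long document content into fixed-size, overlapping chunks so
// each indexed unit stays small enough to retrieve precisely.
export function chunkText(text: string, size = 200, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    start += size - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.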
  3. Create a retrieval tool that searches your documents by keyword overlap. This is not semantic search yet, but it gives you a working RAG loop and a clean place to swap in embeddings later.
// src/retriever.ts
import { Tool } from "@crewai/tools";
import { docs } from "./docs";

export const retrieveDocsTool = new Tool({
  name: "retrieve_docs",
  description: "Retrieve relevant policy documents for a user question.",
  func: async (query: string) => {
    const terms = query.toLowerCase().split(/\W+/).filter(Boolean);

    // Score each document by how many query terms appear in its title
    // or content, drop non-matches, and keep the top three. Ranking by
    // overlap count (rather than matching on any single term) keeps
    // stopwords like "is" from dominating the results.
    const scored = docs
      .map((doc) => {
        const haystack = `${doc.title} ${doc.content}`.toLowerCase();
        const score = terms.filter((term) => haystack.includes(term)).length;
        return { doc, score };
      })
      .filter(({ score }) => score > 0)
      .sort((a, b) => b.score - a.score);

    return JSON.stringify(scored.slice(0, 3).map(({ doc }) => doc));
  },
});
  4. Define the agents and tasks. The retriever agent gathers context, then the answer agent produces a grounded response using only retrieved material.
// src/crew.ts
import { Agent, Task, Crew } from "@crewai/crew";
import { retrieveDocsTool } from "./retriever";

export const retrieverAgent = new Agent({
  name: "Retriever",
  role: "Document Retriever",
  goal: "Find the most relevant policy snippets for the question.",
  backstory: "You are precise and conservative about relevance.",
  tools: [retrieveDocsTool],
});

export const answerAgent = new Agent({
  name: "Answerer",
  role: "RAG Answering Agent",
  goal: "Answer strictly from retrieved context.",
  backstory: "You do not invent facts outside the provided documents.",
});

export const retrieveTask = new Task({
  description: "Retrieve relevant documents for the user question: {question}",
  expectedOutput: "A JSON string containing relevant documents.",
  agent: retrieverAgent,
});

export const answerTask = new Task({
  description:
    "Using only retrieved documents from the previous task, answer the question clearly and cite which policy snippet was used.",
  expectedOutput: "A concise grounded answer.",
  agent: answerAgent,
});

export const crew = new Crew({
  agents: [retrieverAgent, answerAgent],
  tasks: [retrieveTask, answerTask],
});
  5. Wire everything together in an executable entrypoint. This is where you pass the user question into the crew and print the final result.
// src/index.ts
import dotenv from "dotenv";
dotenv.config();

import { crew } from "./crew";

async function main() {
  const result = await crew.kickoff({
    inputs: {
      question: "Is flood damage covered?",
    },
  });

  console.log(result);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
  6. Add a dev script to your package.json and run it. If your output is grounded in the sample docs, your pipeline is working; if not, fix retrieval before touching generation.
{
  "name": "crewai-rag-ts",
  "type": "module",
  "scripts": {
    "dev": "tsx src/index.ts"
  }
}
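Because package.json declares "type": "module", your tsconfig.json should agree with it. One workable sketch (exact settings may vary with your tsx and TypeScript versions):

```json
{
  "compilerOptions": {
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "target": "ES2022",
    "strict": true
  }
}
```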

Testing It

Run npm run dev and inspect the output. For the sample question above, the system should return that flood damage is excluded unless explicitly added by endorsement.

Try at least three questions:

  • “Is flood damage covered?”
  • “How long do I have to submit a claim?”
  • “What happens if I ask about something not in the docs?”

If retrieval is working correctly, unanswered questions should produce a cautious response instead of fabricated details. That’s the core behavior you want before moving to embeddings or a vector database.
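That cautious-response behavior can also be enforced mechanically rather than left to the prompt. A sketch of a hypothetical groundedness guard (the function names and fallback message are illustrative, not part of CrewAI):

```typescript
// Hypothetical guard: accept an answer only if it cites at least one
// document ID that retrieval actually returned.
export function isGrounded(answer: string, retrievedIds: string[]): boolean {
  return retrievedIds.some((id) => answer.includes(id));
}

// Replace ungrounded answers with an explicit refusal instead of
// letting fabricated details through.
export function finalAnswer(answer: string, retrievedIds: string[]): string {
  return isGrounded(answer, retrievedIds)
    ? answer
    : "I can't answer that from the indexed documents.";
}
```

You would run this on the crew's final output before showing it to the user.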

Next Steps

  • Replace keyword matching with embeddings plus a vector store like Pinecone, Weaviate, or pgvector.
  • Add source citation formatting so every answer includes document IDs and snippets.
  • Introduce guardrails that reject answers when retrieval confidence is too low.
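When you do swap keyword matching for embeddings, retrieval reduces to ranking documents by vector similarity. The core function is small enough to sketch here; the embedding vectors themselves are assumed to come from whatever embedding API you choose:

```typescript
// Cosine similarity between two embedding vectors: the standard
// ranking function once keyword matching is replaced with embeddings.
// Returns 1 for identical directions, 0 for orthogonal vectors.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In the retriever, you would embed the query once, score every stored chunk with this function, and return the top few, exactly where the keyword scoring sits today.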

By Cyprian Aarons, AI Consultant at Topiax.