CrewAI Tutorial (Python): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a production-style RAG pipeline with CrewAI in Python: ingest documents, retrieve relevant chunks, and generate grounded answers from them. You’d use this when a plain LLM is too generic and you need answers tied to your own docs, policies, or knowledge base.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • crewai
  • crewai-tools
  • langchain-openai
  • langchain-community
  • langchain-text-splitters
  • faiss-cpu
  • python-dotenv
  • OpenAI API key in OPENAI_API_KEY
  • A small document set to index, such as .txt files in a local docs/ folder

Install the dependencies:

pip install crewai crewai-tools langchain-openai langchain-community langchain-text-splitters faiss-cpu python-dotenv

Step-by-Step

  1. Start by loading your environment variables and splitting your documents into chunks. For this tutorial, we’ll keep the ingestion simple and use local text files so you can run it end-to-end without extra infrastructure.
from dotenv import load_dotenv
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

load_dotenv()  # reads OPENAI_API_KEY from a local .env file

loader = DirectoryLoader(
    "docs",
    glob="**/*.txt",
    loader_cls=TextLoader,
    show_progress=True,
)
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} docs and created {len(chunks)} chunks")
  2. Next, embed the chunks and store them in a FAISS vector index. This gives you fast semantic retrieval without needing a database service for the first version.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

query = "What does the policy say about claim escalation?"
results = retriever.invoke(query)

for i, doc in enumerate(results, 1):
    print(f"\n--- Chunk {i} ---")
    print(doc.page_content[:500])
  3. Now define a CrewAI tool that wraps retrieval. The agent will call this tool whenever it needs source context, which is the cleanest way to keep the answer grounded in your documents.
from crewai.tools import tool

@tool("retrieve_context")
def retrieve_context(query: str) -> str:
    """Retrieve relevant context from the indexed knowledge base."""
    docs = retriever.invoke(query)
    return "\n\n".join(
        f"[Source {i+1}] {doc.page_content}" for i, doc in enumerate(docs)
    )

print(retrieve_context("claim escalation"))
  4. Create an agent that uses the retrieval tool and a task that forces grounded answers. The key here is to make the agent cite only what comes back from retrieval instead of inventing missing details.
from crewai import Agent, Task, Crew, Process

rag_agent = Agent(
    role="RAG Analyst",
    goal="Answer questions using only retrieved context",
    backstory="You are strict about grounding answers in source documents.",
    tools=[retrieve_context],
    verbose=True,
)

# The {question} placeholder is interpolated from crew.kickoff(inputs=...) below.
rag_task = Task(
    description=(
        "Answer the user's question using only the retrieved context. "
        "Question: {question} "
        "If the context does not contain the answer, say so clearly."
    ),
    expected_output="A concise answer grounded in source text.",
    agent=rag_agent,
)
  5. Finish by wiring everything into a Crew and running it against a real question. In production you would expose this behind an API route or internal chatbot; here we’ll just print the result so you can verify the full flow.
crew = Crew(
    agents=[rag_agent],
    tasks=[rag_task],
    process=Process.sequential,
)

question = "What is the escalation path for unresolved claims?"
result = crew.kickoff(inputs={"question": question})

print("\n=== Final Answer ===")
print(result)

Testing It

Run the script with a few questions that should be answerable from your docs, then compare the output to the source files. If retrieval is working properly, you should see relevant chunks returned before the final answer is generated.

Test three cases: one easy lookup, one ambiguous query, and one query that is not covered by your documents. The last case matters because a good RAG system should refuse to guess when it has no evidence.
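A minimal harness for those three cases might look like the following. The questions here are placeholders; swap in queries that match what your own corpus does and does not cover.

# Placeholder questions: adjust these to your own document set.
test_questions = [
    "What is the escalation path for unresolved claims?",  # easy lookup
    "Who is responsible for handling claims?",             # ambiguous
    "What is the office parking policy?",                  # not covered: should refuse
]

for q in test_questions:
    result = crew.kickoff(inputs={"question": q})
    print(f"\nQ: {q}\nA: {result}")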

If answers look vague or off-topic, reduce chunk size, increase overlap slightly, or raise k from 4 to 6. If retrieval is still weak, your source documents may need cleaner structure before indexing.
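Those knobs all map to code you have already written. A tuning pass might look like this; the exact numbers are starting points to experiment with, not tested values.

# Smaller chunks, slightly more overlap, and a wider retrieval window.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=150)
chunks = splitter.split_documents(documents)

# Rebuild the index and rebind the module-level retriever; the
# retrieve_context tool picks up the change because it looks up
# the name at call time.
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})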

Next Steps

  • Swap FAISS for a persistent vector store like PostgreSQL + pgvector or Pinecone
  • Add metadata filtering so agents can search by product line, jurisdiction, or document version (see the sketch after this list)
  • Wrap this CrewAI pipeline in FastAPI and add auth before exposing it internally
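As a sketch of the metadata-filtering idea: LangChain's FAISS store accepts a filter dict in search_kwargs that is matched against each chunk's metadata. The product_line tag below is a made-up example; in practice you would derive it from the file path or document frontmatter.

# Hypothetical metadata tag: derive the real value from path or frontmatter.
for chunk in chunks:
    chunk.metadata["product_line"] = "claims"

vectorstore = FAISS.from_documents(chunks, embeddings)
claims_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"product_line": "claims"}}
)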

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

