AutoGen Tutorial (Python): building a RAG pipeline for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a basic Retrieval-Augmented Generation (RAG) pipeline with AutoGen in Python: ingest documents, index them with embeddings, retrieve relevant chunks, and answer questions with an assistant agent. You need this when a plain LLM is not enough and you want answers grounded in your own docs instead of model memory.

What You'll Need

  • Python 3.10+
  • pyautogen
  • chromadb
  • openai
  • An OpenAI API key set as OPENAI_API_KEY
  • A small text corpus to test with, like policy docs, FAQs, or internal notes
  • Basic familiarity with AutoGen agents and Python scripts

Install the packages:

pip install pyautogen chromadb openai

Step-by-Step

  1. Start by creating a small document set and splitting it into chunks. For a beginner setup, keep the data local and simple so you can focus on the retrieval flow instead of infrastructure.
from pathlib import Path

docs_dir = Path("docs")
docs_dir.mkdir(exist_ok=True)

(docs_dir / "benefits.txt").write_text(
    "Employees get 20 days of annual leave.\n"
    "Health insurance starts on day one.\n"
    "Remote work is allowed three days per week."
)

(docs_dir / "it_policy.txt").write_text(
    "Password rotation is required every 90 days.\n"
    "VPN access is mandatory outside the office.\n"
    "Laptops must be encrypted."
)
  2. Next, load those files into ChromaDB and create embeddings with OpenAI. This gives you a persistent vector store that can return the most relevant chunks for any question.
import os
import chromadb
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection(name="company_docs")

def chunk_text(text, chunk_size=200):
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

doc_id = 0
for file_path in docs_dir.glob("*.txt"):
    text = file_path.read_text()
    for chunk in chunk_text(text):
        emb = client.embeddings.create(
            model="text-embedding-3-small",
            input=chunk,
        ).data[0].embedding
        collection.add(
            ids=[f"doc-{doc_id}"],
            documents=[chunk],
            embeddings=[emb],
            metadatas=[{"source": file_path.name}],
        )
        doc_id += 1
  3. Now add a retrieval function that turns a user question into an embedding and fetches the top matches. This is the core of RAG: retrieve first, then generate from retrieved context.
def retrieve(query, k=3):
    query_emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    ).data[0].embedding

    results = collection.query(
        query_embeddings=[query_emb],
        n_results=k,
    )

    chunks = results["documents"][0]
    sources = results["metadatas"][0]
    return list(zip(chunks, sources))

question = "How often do I need to rotate my password?"
matches = retrieve(question)
for chunk, meta in matches:
    print(meta["source"], "=>", chunk)
  4. Wire the retrieved context into an AutoGen assistant agent. The assistant will answer using only the context you pass it, which keeps responses grounded and makes debugging much easier.
from autogen import AssistantAgent

# AutoGen expects model settings in a config_list.
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ]
}

assistant = AssistantAgent(
    name="rag_assistant",
    llm_config=llm_config,
)

def answer_question(question):
    matches = retrieve(question)
    context = "\n\n".join(
        f"[Source: {meta['source']}]\n{chunk}"
        for chunk, meta in matches
    )

    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    response = assistant.generate_reply(messages=[{"role": "user", "content": prompt}])
    return response

print(answer_question("What are the rules for VPN access?"))
  5. Finally, wrap it in a simple loop so you can test multiple questions without restarting the script. This is enough for a beginner RAG prototype and gives you a clean path to add APIs later.
if __name__ == "__main__":
    while True:
        q = input("\nAsk a question (or 'exit'): ").strip()
        if q.lower() == "exit":
            break

        try:
            print("\n" + str(answer_question(q)))
        except Exception as e:
            print(f"Error: {e}")

Testing It

Run the script and ask questions that are clearly answered by your sample docs, like “How many leave days do employees get?” or “When does health insurance start?” You should see answers that match the stored text and mention only facts present in your documents.

Then try an out-of-scope question like “What is our vacation carryover policy?” If your docs don’t contain that answer, the model should either say it doesn’t know or give a weak answer; if it hallucinates confidently, tighten your prompt to require abstaining when context is missing.
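One way to tighten the prompt is to spell out the abstain rule directly. Here is a minimal sketch of a stricter prompt builder; the exact wording and the `build_strict_prompt` helper name are assumptions, not part of the tutorial's `answer_question`:

```python
def build_strict_prompt(context: str, question: str) -> str:
    # Instruct the model to abstain instead of guessing when the
    # retrieved context does not contain the answer.
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply exactly: "
        "\"I don't know based on the provided documents.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_strict_prompt(
    "Employees get 20 days of annual leave.",
    "What is our vacation carryover policy?",
)
```

Swap this into `answer_question` in place of the inline prompt string and re-run the out-of-scope test.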

A good sanity check is to print retrieved chunks before generation. If retrieval returns irrelevant text, fix chunking size, add better metadata, or use more documents per topic.
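If retrieval keeps returning irrelevant text, a quick experiment is to add overlap between chunks so sentences are not cut mid-thought at chunk boundaries. A minimal sketch, assuming a character-based window like the tutorial's `chunk_text` (the `overlap` parameter is an addition, not part of the original function):

```python
def chunk_text_overlap(text, chunk_size=200, overlap=50):
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - overlap so adjacent chunks share `overlap` characters.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text_overlap("a" * 500, chunk_size=200, overlap=50)
```

Overlap costs some extra embedding calls but usually improves recall for questions whose answer straddles a boundary.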

Next Steps

  • Add source citations to every answer so users can trace where the response came from.
  • Replace manual chunking with a real text splitter that respects paragraphs and headings.
  • Move from a single-agent setup to an AutoGen multi-agent workflow with one agent for retrieval and one for answering.
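As a starting point for the text-splitter item above, a paragraph-aware splitter can be sketched in plain Python. This greedy version is an illustration only; dedicated splitters (for example, LangChain's `RecursiveCharacterTextSplitter`) handle headings and edge cases more robustly:

```python
def split_paragraphs(text, max_chars=400):
    # Split on blank lines, then greedily pack whole paragraphs into
    # chunks no longer than max_chars, never cutting mid-paragraph.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Drop this in as a replacement for `chunk_text` and re-index; paragraph-aligned chunks tend to retrieve cleaner context than fixed character windows.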

By Cyprian Aarons, AI Consultant at Topiax.