AutoGen Tutorial (TypeScript): building a RAG pipeline for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a basic retrieval-augmented generation pipeline in TypeScript with AutoGen: load documents, turn them into embeddings, retrieve the most relevant chunks, and answer user questions with grounded context. You need this when a plain chat agent is not enough and you want answers tied to your own docs instead of model memory.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project initialized with npm init -y
  • These packages:
    • @autogenai/autogen
    • openai
    • dotenv
    • typescript
    • tsx or ts-node for running TypeScript directly
  • An OpenAI API key in .env
  • A small text corpus to index, like product docs, policy text, or internal FAQs
  • Basic familiarity with AutoGen agents and async/await

Install the packages:

npm install @autogenai/autogen openai dotenv
npm install -D typescript tsx @types/node

Create a .env file:

OPENAI_API_KEY=your_key_here

Step-by-Step

  1. Start by defining a tiny document store and an embedding helper. For beginners, keep the data in memory so you can see the full flow before adding a vector database.
import "dotenv/config";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

type Chunk = {
  id: string;
  text: string;
  embedding: number[];
};

const docs = [
  { id: "doc1", text: "AutoGen is a framework for building multi-agent applications." },
  { id: "doc2", text: "RAG combines retrieval with generation so answers stay grounded in source data." },
  { id: "doc3", text: "Chunk documents into smaller pieces before embedding them." },
];
  2. Next, embed each chunk. This gives you vectors you can compare against user questions later.
async function embed(text: string): Promise<number[]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

async function buildIndex(): Promise<Chunk[]> {
  const indexed: Chunk[] = [];
  for (const doc of docs) {
    indexed.push({
      id: doc.id,
      text: doc.text,
      embedding: await embed(doc.text),
    });
  }
  return indexed;
}
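
The loop above makes one embeddings request per document, which is fine for three docs but slow for a real corpus. The endpoint also accepts an array of inputs and returns one embedding per input, so a batched variant (one request for the whole corpus) is a small change; a sketch:

// Batched variant: embed the whole corpus in a single request.
// The API returns embeddings in the same order as the inputs.
async function buildIndexBatched(): Promise<Chunk[]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: docs.map((doc) => doc.text),
  });
  return docs.map((doc, i) => ({
    id: doc.id,
    text: doc.text,
    embedding: res.data[i].embedding,
  }));
}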
  3. Add a simple similarity search. Cosine similarity is enough for a beginner pipeline and keeps the code dependency-free.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }

  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

async function retrieve(query: string, index: Chunk[], topK = 2): Promise<(Chunk & { score: number })[]> {
  const queryEmbedding = await embed(query);

  return index
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
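
Before wiring retrieval into an agent, it helps to eyeball the scores so you know the ranking behaves sensibly. A quick check you can run inside any async context (the query string is just an example):

// One-off check: print each match with its similarity score.
const index = await buildIndex();
const matches = await retrieve("How do I keep answers grounded?", index);
for (const match of matches) {
  console.log(`${match.id} (score ${match.score.toFixed(3)}): ${match.text}`);
}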
  4. Now wire retrieval into an AutoGen assistant. The assistant gets only the retrieved context plus the user question, which is the core RAG pattern. (In a single-file script, move this import up with the others.)
import { AssistantAgent } from "@autogenai/autogen";

async function answerQuestion(question: string, index: Chunk[]) {
  const matches = await retrieve(question, index);

  const context = matches
    .map((chunk) => `Source ${chunk.id}: ${chunk.text}`)
    .join("\n");

  const agent = new AssistantAgent({
    name: "rag_assistant",
    systemMessage:
      "Answer only using the provided context. If the context is insufficient, say you do not know.",
    modelClientOptions: {
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4o-mini",
    },
  });

  const prompt = `Context:\n${context}\n\nQuestion:\n${question}`;
  const response = await agent.run(prompt);
  console.log(response);
}
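
The AssistantAgent options above follow this tutorial's package; AutoGen's TypeScript surface is still settling, so if your installed version exposes a different constructor, the same grounded-answer pattern works against the OpenAI client you already created. A minimal fallback sketch:

// Fallback: same RAG pattern with the plain OpenAI chat API,
// reusing the client, embed, and retrieve helpers defined above.
async function answerWithOpenAI(question: string, index: Chunk[]) {
  const matches = await retrieve(question, index);
  const context = matches
    .map((chunk) => `Source ${chunk.id}: ${chunk.text}`)
    .join("\n");

  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Answer only using the provided context. If the context is insufficient, say you do not know.",
      },
      { role: "user", content: `Context:\n${context}\n\nQuestion:\n${question}` },
    ],
  });

  console.log(completion.choices[0].message.content);
}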
  5. Put it together in one executable entrypoint. This script builds the index once, then answers a sample question using retrieved chunks.
async function main() {
  if (!process.env.OPENAI_API_KEY) {
    throw new Error("Missing OPENAI_API_KEY");
  }

  const index = await buildIndex();
  await answerQuestion("What is RAG used for?", index);
}

main().catch(console.error);
  6. Run it locally and inspect both the retrieved sources and the final answer. If you want better behavior later, replace the in-memory array with a real vector store like pgvector or Pinecone.
npx tsx src/rag.ts

Testing It

Run the script with a few different questions and check whether the retrieved chunks actually match the query. If you ask about embeddings, chunking, or grounding, you should see those source snippets appear in context before the final answer.

If the model starts hallucinating, tighten the system message and reduce topK to keep context focused. If retrieval feels weak, split long documents into smaller chunks before embedding them.
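
If you go the chunking route, a naive fixed-size splitter with overlap is enough to start; the sizes here are character counts, not tokens, and are assumptions to tune for your corpus:

// Naive chunker: fixed-size character windows with overlap so
// sentences cut at a boundary still appear whole in one chunk.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}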

A good sanity check is to ask something outside your corpus. The assistant should say it does not know instead of inventing an answer.
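
For example, swap the body of main for a small batch where the last question is deliberately outside the corpus:

async function main() {
  if (!process.env.OPENAI_API_KEY) {
    throw new Error("Missing OPENAI_API_KEY");
  }

  const index = await buildIndex();

  // The last question has no support in the corpus; the assistant
  // should say it does not know rather than invent an answer.
  const questions = [
    "What is RAG used for?",
    "Why should documents be chunked before embedding?",
    "What is the capital of France?",
  ];

  for (const question of questions) {
    await answerQuestion(question, index);
  }
}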

Next Steps

  • Replace the in-memory index with PostgreSQL + pgvector for persistent storage (a sketch follows this list)
  • Add chunking logic for long documents instead of indexing whole paragraphs
  • Introduce an AutoGen tool that fetches documents from your own API before retrieval
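
For the pgvector option, here is a rough sketch of the storage and query side using the pg client. Everything here is an assumption to adapt: the DATABASE_URL variable, the chunks table, and the 1536 dimension (which matches text-embedding-3-small):

import { Client } from "pg";

// Assumed setup, run once in your database:
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE chunks (id text PRIMARY KEY, text text, embedding vector(1536));
const pg = new Client({ connectionString: process.env.DATABASE_URL });
// Call `await pg.connect();` once at startup before querying.

// pgvector accepts vectors as "[1,2,3]"-style string literals.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

async function insertChunk(chunk: Chunk) {
  await pg.query(
    "INSERT INTO chunks (id, text, embedding) VALUES ($1, $2, $3)",
    [chunk.id, chunk.text, toVectorLiteral(chunk.embedding)]
  );
}

async function retrieveFromPg(query: string, topK = 2) {
  const queryEmbedding = await embed(query);
  // <=> is pgvector's cosine-distance operator; smaller means closer.
  const res = await pg.query(
    "SELECT id, text FROM chunks ORDER BY embedding <=> $1::vector LIMIT $2",
    [toVectorLiteral(queryEmbedding), topK]
  );
  return res.rows as { id: string; text: string }[];
}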

By Cyprian Aarons, AI Consultant at Topiax.