AutoGen Tutorial (TypeScript): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a retrieval-augmented generation pipeline in TypeScript with AutoGen: ingest documents, index them with embeddings, retrieve the right chunks, and answer user questions with grounded context. You need this when a plain chat model is not enough and you want answers tied to your own docs, policies, or knowledge base.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project initialized with npm init -y
  • Packages:
    • autogen
    • openai
    • dotenv
    • ts-node and typescript for local execution
  • An OpenAI API key in .env:
    • OPENAI_API_KEY=...
  • A folder of source documents, for example:
    • ./data/policy.txt
    • ./data/faq.txt

Step-by-Step

  1. Start by installing dependencies and setting up a minimal TypeScript config. Keep this boring and explicit; RAG pipelines fail more from bad plumbing than bad prompts. A minimal tsconfig sketch follows the install commands.
npm install autogen openai dotenv
npm install -D typescript ts-node @types/node
npx tsc --init
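npx tsc --init generates a verbose default config. One minimal baseline that works for this setup (an illustrative sketch, not the only valid combination) keeps strict mode on and targets modern Node:

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "strict": true,
    "esModuleInterop": true
  },
  "include": ["."]
}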
  2. Create a small document loader and chunker. For production, you would split by tokens, but for a working baseline, fixed-size chunks are enough to prove the retrieval path end to end; an overlapping variant is sketched after the code.
import fs from "node:fs/promises";

export async function loadAndChunk(path: string, chunkSize = 800) {
  const text = await fs.readFile(path, "utf8");
  const chunks: string[] = [];

  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }

  return chunks.map((content, index) => ({
    id: `${path}:${index}`,
    content,
    source: path,
  }));
}
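Fixed-size slicing can cut a sentence in half right where the answer lives. A common refinement is to overlap consecutive chunks so text near a boundary appears whole in at least one chunk. Here is a minimal sketch; the function name and the 100-character overlap default are illustrative choices, not part of the code above:

import fs from "node:fs/promises";

// Same shape as loadAndChunk, but consecutive chunks share `overlap` characters.
export async function loadAndChunkOverlap(path: string, chunkSize = 800, overlap = 100) {
  const text = await fs.readFile(path, "utf8");
  const step = Math.max(1, chunkSize - overlap); // guard against overlap >= chunkSize
  const chunks: string[] = [];

  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + chunkSize));
  }

  return chunks.map((content, index) => ({
    id: `${path}:${index}`,
    content,
    source: path,
  }));
}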
  3. Build an embedding index over your chunks. This example uses OpenAI embeddings directly so the retrieval layer stays simple and deterministic. A batched variant follows the code.
import "dotenv/config";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export type Chunk = { id: string; content: string; source: string };

export async function embedText(text: string) {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

export async function buildIndex(chunks: Chunk[]) {
  const indexed = [];
  for (const chunk of chunks) {
    indexed.push({ ...chunk, embedding: await embedText(chunk.content) });
  }
  return indexed;
}
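buildIndex makes one HTTP round trip per chunk, which gets slow past a few hundred chunks. The embeddings endpoint also accepts an array of inputs and returns embeddings in input order, so a batched variant cuts the build to a handful of calls. The sketch below assumes it lives in the same file as client and Chunk above; the batch size of 100 is an arbitrary safe choice, not an API limit:

export async function buildIndexBatched(chunks: Chunk[], batchSize = 100) {
  const indexed: (Chunk & { embedding: number[] })[] = [];

  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    const res = await client.embeddings.create({
      model: "text-embedding-3-small",
      input: batch.map((c) => c.content),
    });
    // Embeddings come back in input order; zip each one with its chunk.
    batch.forEach((chunk, j) => {
      indexed.push({ ...chunk, embedding: res.data[j].embedding });
    });
  }

  return indexed;
}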
  4. Add cosine similarity retrieval. This is the core of RAG: turn the question into an embedding, compare it against your chunk embeddings, and keep the top matches. A score-threshold guard is sketched after the code.
import { embedText, type Chunk } from "./embed";

function cosineSimilarity(a: number[], b: number[]) {
  let dot = 0;
  let magA = 0;
  let magB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }

  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

export async function retrieve(
  query: string,
  index: (Chunk & { embedding: number[] })[],
  k = 3
) {
  const queryEmbedding = await embedText(query);

  return index
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
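Top-k retrieval always returns k chunks, even when nothing in the corpus is actually relevant, and the assistant will then improvise from near-noise. A cheap guard is a minimum-score cutoff: with an empty context, the grounded system prompt makes the model admit it does not know. The helper name and the 0.3 default below are illustrative; calibrate the cutoff against your own corpus. It assumes it sits in the same module as retrieve:

export async function retrieveWithThreshold(
  query: string,
  index: (Chunk & { embedding: number[] })[],
  k = 3,
  minScore = 0.3 // placeholder; tune on real queries
) {
  const results = await retrieve(query, index, k);
  return results.filter((r) => r.score >= minScore);
}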
  5. Wire retrieval into an AutoGen assistant using a system prompt that forces grounded answers. The trick is to pass retrieved context into the chat call every time instead of hoping the model “remembers” anything useful. A citation helper follows the code.
import "dotenv/config";
import { AssistantAgent } from "autogen";

// Minimal agent config; model and credential wiring follow your AutoGen setup.
const assistant = new AssistantAgent({
  name: "rag_assistant",
});

export async function answerWithContext(
  question: string,
  retrievedChunks: { content: string; source: string; score: number }[]
) {
  const context = retrievedChunks
    .map((c, i) => `[#${i + 1} | ${c.source} | score=${c.score.toFixed(3)}]\n${c.content}`)
    .join("\n\n");

  const result = await assistant.generateReply([
    {
      role: "system",
      content:
        "Answer only using the provided context. If the context is insufficient, say you do not know.",
    },
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion:\n${question}`,
    },
  ]);

  return result;
}
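Since every retrieved chunk carries its source path and score, surfacing citations alongside the answer is nearly free. A small hypothetical wrapper around answerWithContext, assumed to live in the same module, returns the answer plus a deduplicated source list:

export async function answerWithCitations(
  question: string,
  retrievedChunks: { content: string; source: string; score: number }[]
) {
  const answer = await answerWithContext(question, retrievedChunks);
  // Deduplicate source paths so each file is cited once.
  const sources = [...new Set(retrievedChunks.map((c) => c.source))];
  return { answer, sources };
}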
  6. Put it together in one executable entrypoint. This script loads files, indexes them once, retrieves relevant passages per query, and prints the grounded answer. Run instructions follow the script.
import { loadAndChunk } from "./loader";
import { buildIndex } from "./embed";
import { retrieve } from "./retrieve";
import { answerWithContext } from "./agent";

async function main() {
  // Load and chunk each source file, then embed everything once up front.
  const files = ["./data/policy.txt", "./data/faq.txt"];
  const chunks = (await Promise.all(files.map((f) => loadAndChunk(f)))).flat();
  const index = await buildIndex(chunks);
  const question = "What does the refund policy say?"; // example query
  const top = await retrieve(question, index);
  console.log(await answerWithContext(question, top));
}

main().catch(console.error);
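Assuming the modules are saved under the names the imports use (loader.ts, embed.ts, retrieve.ts, agent.ts) and the entrypoint above is main.ts, run the whole pipeline with:

npx ts-node main.ts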