AutoGen Tutorial (TypeScript): building a RAG pipeline for advanced developers
This tutorial builds a production-shaped Retrieval-Augmented Generation pipeline in TypeScript using AutoGen. You’ll wire together document ingestion, embedding, vector search, and an assistant agent that answers from retrieved context instead of hallucinating from memory.
What You'll Need
- Node.js 18+
- A TypeScript project with `ts-node` or `tsx`
- `autogen-agentchat`
- `@autogen-ext/openai`
- `@langchain/community`
- `@langchain/core`
- `@langchain/openai`
- An OpenAI API key in `OPENAI_API_KEY`
- A small document corpus to index locally
- Optional: `.env` support via `dotenv` (example below)
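If you take the `dotenv` option, a one-line `.env` in the project root is all the scripts below need; the placeholder value here is obviously not a real key.

```bash
# .env, picked up by the `import "dotenv/config"` line in the ingestion script
OPENAI_API_KEY=sk-your-key-here
```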
Step-by-Step
- Start with a clean TypeScript project and install the packages you need for AutoGen, embeddings, and local vector storage. I'm using Chroma here because it is simple to run locally and good enough for development workflows.

```bash
npm init -y
npm i autogen-agentchat @autogen-ext/openai @langchain/community @langchain/core @langchain/openai dotenv
npm i -D typescript tsx @types/node
```
- Create a tiny knowledge base and split it into chunks before embedding. For real systems, this is where you'd add PDF parsing, HTML cleanup, metadata enrichment, and chunking rules tuned to your domain; a splitter sketch follows the snippet below.

```ts
import "dotenv/config";
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const docs = [
  new Document({
    pageContent:
      "Claims handlers must verify policy number, incident date, and loss description before opening a claim.",
    metadata: { source: "claims-playbook" },
  }),
  new Document({
    pageContent:
      "Fraud review is required when the claim amount exceeds the manual review threshold or the incident pattern matches known fraud indicators.",
    metadata: { source: "fraud-policy" },
  }),
];
```
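The sample documents above are short enough to embed whole. For longer sources, LangChain's `RecursiveCharacterTextSplitter` is a reasonable starting point; a minimal sketch, assuming you also install the separate `@langchain/textsplitters` package (the chunk sizes are illustrative, not tuned):

```ts
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Illustrative sizes; tune chunkSize and chunkOverlap to your domain.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});

// splitDocuments carries each parent document's metadata onto its chunks,
// so source attribution survives chunking.
const chunks = await splitter.splitDocuments(docs);
// Index `chunks` instead of `docs` in the next step.
```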
- Build the vector store and persist it locally. This gives you repeatable retrieval across runs instead of re-indexing every time your agent starts. Note that this client talks to a Chroma server, so have one listening on localhost:8000 first (for example via `chroma run` or the official Docker image).

```ts
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const vectorStore = await Chroma.fromDocuments(docs, embeddings, {
  collectionName: "insurance-rag",
  url: "http://localhost:8000",
});
```
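On later runs, reattach to the already-populated collection instead of re-embedding everything. A minimal sketch, assuming the same Chroma server is still running:

```ts
// Reattach to the persisted collection on subsequent runs (no re-indexing).
const existingStore = await Chroma.fromExistingCollection(embeddings, {
  collectionName: "insurance-rag",
  url: "http://localhost:8000",
});
```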
- Add a retrieval function that pulls back only the most relevant chunks for each user query. Keep this function deterministic; the LLM should do the reasoning, not the search layer. A score-filtered variant follows the snippet below.

```ts
async function retrieveContext(query: string) {
  const results = await vectorStore.similaritySearch(query, 3);
  return results
    .map(
      (doc, index) =>
        `[${index + 1}] ${doc.pageContent} (source: ${doc.metadata.source})`,
    )
    .join("\n");
}
```
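If you would rather drop weak matches than always return the top three, `similaritySearchWithScore` exposes the raw score so you can apply a cutoff. A sketch; `MAX_DISTANCE` is a made-up starting point (Chroma reports distances, so lower means closer) that you would calibrate on real queries:

```ts
// Variant: discard chunks whose distance exceeds a calibrated cutoff.
const MAX_DISTANCE = 0.5; // assumption: calibrate this against your own corpus

async function retrieveContextFiltered(query: string) {
  const scored = await vectorStore.similaritySearchWithScore(query, 3);
  return scored
    .filter(([, distance]) => distance <= MAX_DISTANCE)
    .map(
      ([doc], index) =>
        `[${index + 1}] ${doc.pageContent} (source: ${doc.metadata.source})`,
    )
    .join("\n");
}
```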
- Create an AutoGen assistant that answers strictly from retrieved context. The key pattern here is to inject the retrieved context into the prompt at runtime, carried in the user message, so the model sees grounded evidence before generating a response; the system message locks in the refusal behavior for thin context. A guarded variant follows the snippet below.

```ts
import { AssistantAgent } from "autogen-agentchat";
import { OpenAIChatCompletionClient } from "@autogen-ext/openai";

const modelClient = new OpenAIChatCompletionClient({
  model: "gpt-4o-mini",
});

async function answerQuestion(question: string) {
  const context = await retrieveContext(question);
  const agent = new AssistantAgent({
    name: "rag_assistant",
    modelClient,
    systemMessage:
      "Answer only from the provided context. If the context is insufficient, say so clearly.",
  });
  const result = await agent.run([
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion: ${question}`,
    },
  ]);
  return result.messages.at(-1)?.content;
}
```
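One cheap guard worth adding in front of the model call: if retrieval returns nothing, skip generation entirely instead of asking the model to notice the gap. A minimal sketch reusing the functions above (the duplicate retrieval is just to keep it short):

```ts
// Guarded variant: short-circuit when retrieval comes back empty.
async function answerQuestionGuarded(question: string) {
  const context = await retrieveContext(question);
  if (!context.trim()) {
    return "No relevant documents were retrieved for this question.";
  }
  // In real code, fold this check into answerQuestion to avoid retrieving twice.
  return answerQuestion(question);
}
```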
- Run the pipeline end to end with a real query. In production you'd wrap this behind an API route or job worker, but this single call proves retrieval and generation are wired correctly. The top-level await assumes an ESM setup (`"type": "module"` in package.json), which tsx handles without extra configuration.

```ts
const question = "When should a claim be sent for fraud review?";
const answer = await answerQuestion(question);

console.log("Question:", question);
console.log("Answer:", answer);
```
Testing It
Run your script with `npx tsx src/rag.ts` and ask questions that are clearly covered by your indexed documents. You should see responses that quote or paraphrase the retrieved policy text instead of inventing details.
Then ask something outside the corpus, like a question about payout calculations if you never indexed payout rules. The assistant should say the context is insufficient rather than guessing.
If you want stronger validation, log the retrieved chunks before calling the agent and compare them against expected sources. That catches bad chunking, weak embeddings, or overly broad retrieval settings early. A minimal version of that check follows.
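Here the expected sources are written by hand for a known probe query; the expectation below is an assumption for illustration.

```ts
// Log what retrieval returns and flag probes whose chunks miss expected sources.
const probeQuery = "When should a claim be sent for fraud review?";
const expectedSources = ["fraud-policy"]; // hand-written expectation for this probe

const retrieved = await vectorStore.similaritySearch(probeQuery, 3);
for (const doc of retrieved) {
  console.log(`retrieved [${doc.metadata.source}]:`, doc.pageContent.slice(0, 80));
}

const seen = new Set(retrieved.map((doc) => doc.metadata.source));
const missing = expectedSources.filter((source) => !seen.has(source));
if (missing.length > 0) {
  console.warn("Expected sources not retrieved:", missing);
}
```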
Next Steps
- Add metadata filters for line-of-business, jurisdiction, or document version (a filter sketch follows this list).
- Swap Chroma for pgvector when you need Postgres-backed persistence and access control.
- Add a reranker step before generation if your corpus is large or retrieval quality starts slipping.
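For the first of those, LangChain's Chroma store accepts a filter argument on `similaritySearch` that maps to a Chroma `where` clause. A sketch, assuming you have indexed a `lineOfBusiness` metadata field (that field name is hypothetical):

```ts
// Restrict retrieval to one line of business via a Chroma metadata filter.
// `lineOfBusiness` is a hypothetical field; use whatever your metadata contains.
const scoped = await vectorStore.similaritySearch(
  "fraud review threshold",
  3,
  { lineOfBusiness: "property" },
);
```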
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.