LangChain Tutorial (TypeScript): building a RAG pipeline for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a production-shaped Retrieval-Augmented Generation (RAG) pipeline in TypeScript with LangChain. You’ll wire together document loading, chunking, embeddings, vector search, retrieval, and answer generation so you can ground LLM responses in your own data.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • langchain
  • @langchain/openai
  • @langchain/community
  • @langchain/core
  • An OpenAI API key in OPENAI_API_KEY
  • A small document set to index, such as PDFs, markdown files, or plain text
  • Optional: a vector store backend like Chroma or PostgreSQL if you want persistence

Step-by-Step

  1. Set up the project and install the packages. This example uses OpenAI embeddings plus an in-memory vector store so you can run it end-to-end without extra infrastructure.
npm init -y
npm install langchain @langchain/openai @langchain/community @langchain/core dotenv
npm install -D typescript ts-node @types/node
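The later snippets use ES module imports and top-level await, so the project has to run as ESM. One way to set that up (an assumption about your toolchain, not the only option) is "type": "module" in package.json plus a tsconfig along these lines, running files with npx ts-node --esm index.ts:
{
  // Minimal assumed config for running the snippets with top-level await (ESM).
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}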
  2. Create a simple RAG index from local text files. We’ll load documents from disk, split them into chunks, embed them, and store them in memory for retrieval.
import "dotenv/config";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const loader = new DirectoryLoader("./docs", {
  ".txt": (path) => new TextLoader(path),
});

const docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 800, chunkOverlap: 120 });
const chunks = await splitter.splitDocuments(docs);

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);

console.log(`Loaded ${docs.length} docs and indexed ${chunks.length} chunks`);
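Before moving on, it is worth confirming that each chunk still carries its source file path in metadata, since that is what later lets you attach citations. A quick check, assuming the usual source field that TextLoader sets:
// Each chunk inherits metadata from its parent document, including the
// file path in metadata.source (an assumption to verify for your loaders).
for (const chunk of chunks.slice(0, 3)) {
  console.log(chunk.metadata.source, "->", chunk.pageContent.slice(0, 60));
}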
  3. Build a retriever that returns the most relevant chunks for a question. The key point here is that retrieval should be deterministic and constrained; don’t dump the entire corpus into the prompt.
const retriever = vectorStore.asRetriever(4);

const query = "What is our policy on customer data retention?";
const relevantDocs = await retriever.invoke(query);

for (const [i, doc] of relevantDocs.entries()) {
  console.log(`\n--- Chunk ${i + 1} ---`);
  console.log(doc.pageContent);
}
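If you want to see why those chunks were chosen, LangChain vector stores, including MemoryVectorStore, also expose similaritySearchWithScore, which returns [document, score] pairs. A small sketch for eyeballing relevance; treat the score scale as store-specific (for MemoryVectorStore it is a similarity, so higher is better):
// Inspect retrieval quality: higher-scoring chunks should actually answer
// the question, not just share vocabulary with it.
const scored = await vectorStore.similaritySearchWithScore(query, 4);
for (const [doc, score] of scored) {
  console.log(score.toFixed(3), doc.pageContent.slice(0, 80));
}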
  4. Connect retrieval to an LLM using a prompt that forces grounded answers. This is the actual RAG loop: retrieve context first, then answer only from that context.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Answer only using the provided context. If the answer is not in the context, say you don't know."],
  ["human", "Question: {question}\n\nContext:\n{context}"],
]);

const context = relevantDocs.map((d) => d.pageContent).join("\n\n---\n\n");
const messages = await prompt.formatMessages({ question: query, context });

const response = await model.invoke(messages);
console.log(response.content);
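The same prompt-then-model step can also be written as a LangChain Expression Language (LCEL) chain by piping the prompt into the model and an output parser. A minimal, equivalent sketch:
import { StringOutputParser } from "@langchain/core/output_parsers";

// prompt -> model -> plain string, replacing the separate formatMessages
// and invoke calls; the inputs are the same {question, context} variables.
const ragChain = prompt.pipe(model).pipe(new StringOutputParser());
const chainAnswer = await ragChain.invoke({ question: query, context });
console.log(chainAnswer);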
  5. Wrap it into one reusable function so your application can call it like any other service. In production, this is where you’d add tracing, caching, metadata filters, and persistence.
async function answerQuestion(question: string) {
  const docs = await retriever.invoke(question);
  const context = docs.map((d) => d.pageContent).join("\n\n---\n\n");

  const messages = await prompt.formatMessages({ question, context });
  const result = await model.invoke(messages);

  return result.content;
}

const answer = await answerQuestion("How long do we keep customer records?");
console.log("\nFinal answer:\n", answer);

Testing It

Run the script against a small folder like ./docs containing a few .txt files with policy or product content. Ask questions that are explicitly answered in the source material first, then ask something outside the corpus to confirm the model declines to answer instead of hallucinating.

You should see retrieved chunks printed before the final answer. If the answer ignores context or invents details, tighten your prompt and keep temperature: 0.

For a better signal, ask questions whose key terms appear across several documents. That helps you validate whether chunking and top-k retrieval are surfacing the right evidence rather than just semantically similar noise.
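As a quick smoke test, call answerQuestion once with a question the corpus covers and once with one it does not, and check that the second reply is a refusal rather than a guess. The refusal wording below is an assumption about the model's phrasing, so treat this as a manual check rather than a CI assertion:
// Hypothetical smoke test: the first answer should come from the docs,
// the second should be an "I don't know" style refusal.
const grounded = await answerQuestion("How long do we keep customer records?");
const offTopic = await answerQuestion("What is the capital of Mongolia?");

console.log("Grounded:", grounded);
console.log("Off-topic:", offTopic);
console.log("Refused off-topic:", /don't know|not in the context/i.test(String(offTopic)));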

Next Steps

  • Add metadata filtering by document type, tenant, or effective date
  • Swap MemoryVectorStore for Chroma or PostgreSQL when you need persistence
  • Add citations by returning chunk sources alongside the final answer (see the sketch after this list)
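As a starting point for the citations bullet, here is a minimal sketch that returns the distinct source files behind each answer. It assumes the source metadata field that TextLoader sets; adapt the field name to whatever your loaders emit:
// Return the answer together with the source files the context came from.
async function answerWithSources(question: string) {
  const docs = await retriever.invoke(question);
  const context = docs.map((d) => d.pageContent).join("\n\n---\n\n");

  const messages = await prompt.formatMessages({ question, context });
  const result = await model.invoke(messages);

  const sources = [...new Set(docs.map((d) => String(d.metadata.source)))];
  return { answer: result.content, sources };
}

const cited = await answerWithSources("How long do we keep customer records?");
console.log(cited.answer, "\nSources:", cited.sources);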
