LangChain Tutorial (Python): building a RAG pipeline for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a minimal Retrieval-Augmented Generation (RAG) pipeline in Python using LangChain. You’ll load a document, split it into chunks, embed the chunks into a vector store, retrieve relevant context for a question, and generate an answer with an LLM.

What You'll Need

  • Python 3.10+
  • An OpenAI API key set as OPENAI_API_KEY
  • Packages:
    • langchain
    • langchain-openai
    • langchain-community
    • faiss-cpu
  • A text file to index, for example docs/policy.txt (a sample file is sketched below if you don't have one)

Install the dependencies:

pip install langchain langchain-openai langchain-community faiss-cpu
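
If you don't have a document handy, you can create a small sample file first. The path matches what the loader uses later; the policy text itself is invented placeholder content for testing.

import os

# Create a small sample document to index (placeholder content for testing).
os.makedirs("docs", exist_ok=True)

sample_text = """Refund Policy

Customers may request a full refund within 30 days of purchase.
Refunds are processed within 5 business days.
Digital products are non-refundable once downloaded.
"""

with open("docs/policy.txt", "w", encoding="utf-8") as f:
    f.write(sample_text)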

Step-by-Step

  1. Start by loading a document and splitting it into chunks. RAG works better when the model retrieves smaller pieces of text instead of one large blob.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("docs/policy.txt", encoding="utf-8")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # characters per chunk
    chunk_overlap=100,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} document(s)")
print(f"Created {len(chunks)} chunks")
  2. Next, turn those chunks into vectors and store them in FAISS. This gives you fast similarity search over your document content.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # return the top 3 chunks
  3. Now create the prompt and retrieval chain. The prompt tells the model to answer only from the provided context, which is the core RAG pattern.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """Answer the question using only the context below.

Context:
{context}

Question: {input}

If the answer is not in the context, say "I don't know."
"""
)
  4. Wire the retriever and LLM together with LangChain’s built-in retrieval chain. This is the part that actually performs retrieval first, then generation.
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

document_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, document_chain)
  5. Run a question through the pipeline. You should get an answer grounded in your source document instead of a generic model response.
question = "What is the refund policy?"
result = rag_chain.invoke({"input": question})

print("Question:", question)
print("Answer:", result["answer"])
  6. If you want better debugging, inspect which chunks were retrieved. In production, this is how you verify whether bad answers come from retrieval or generation.
docs = retriever.invoke("What is the refund policy?")

for i, doc in enumerate(docs, start=1):
    print(f"\n--- Chunk {i} ---")
    print(doc.page_content[:500])
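
To go one level deeper, FAISS can also return a similarity score with each chunk via similarity_search_with_score. With the default index the score is a distance, so lower means more similar, which makes weak matches easy to spot:

# Scores are distances with the default FAISS setup: lower = more similar.
results = vectorstore.similarity_search_with_score("What is the refund policy?", k=3)

for doc, score in results:
    print(f"score={score:.4f}  {doc.page_content[:80]!r}")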

Testing It

Run the script against a document that contains a few clear facts, then ask a mix of answerable and unanswerable questions. For an answerable question, the output should reference content from your file; for an unanswerable one, it should return “I don't know.” If it hallucinates anyway, check whether your chunks are too large, your retriever is returning irrelevant results, or your prompt is too permissive.

A good test set includes:

  • One question with an exact answer in the text
  • One paraphrased question
  • One question not covered by the document

If retrieval looks wrong, print out the returned chunks before touching the LLM settings.
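
A minimal test loop along those lines might look like this; the questions are placeholders you should swap for ones that match your document:

# Placeholder questions: replace with ones that fit your document's content.
test_questions = [
    "What is the refund policy?",                   # exact answer in the text
    "How long do I have to ask for my money back?", # paraphrased
    "Who is the company's CEO?",                    # not covered by the document
]

for question in test_questions:
    result = rag_chain.invoke({"input": question})
    print(f"\nQ: {question}\nA: {result['answer']}")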

Next Steps

  • Add metadata filtering so you can retrieve by source, department, or document type (a minimal sketch follows this list).
  • Replace FAISS with a production vector database like pgvector or Pinecone.
  • Add citations to your answers so users can see exactly which chunks were used.
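
As a taste of the first item, the FAISS retriever accepts a metadata filter through search_kwargs. The department field below is hypothetical; it assumes you attach that metadata to your chunks before building the index:

# Attach metadata before building the index (the "department" field is hypothetical).
for chunk in chunks:
    chunk.metadata["department"] = "support"

vectorstore = FAISS.from_documents(chunks, embeddings)

# Retrieve only chunks whose metadata matches the filter.
filtered_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3, "filter": {"department": "support"}}
)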

By Cyprian Aarons, AI Consultant at Topiax.