LangChain Tutorial (Python): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a production-shaped Retrieval-Augmented Generation (RAG) pipeline in Python with LangChain. You’ll load documents, chunk them, index them in a vector store, and answer questions using retrieved context instead of relying on the model’s memory alone.

What You'll Need

  • Python 3.10+
  • An OpenAI API key set as OPENAI_API_KEY
  • These packages:
    • langchain
    • langchain-openai
    • langchain-community
    • langchain-text-splitters
    • faiss-cpu
  • A few text files to index, or you can use the sample docs below
  • Basic familiarity with Python classes, functions, and virtual environments

Install everything:

pip install langchain langchain-openai langchain-community langchain-text-splitters faiss-cpu
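
If you don't have documents handy, you can generate a small placeholder file to follow along. The path docs/policy.txt matches what the loading step below expects; the policy wording here is invented sample content, not a real policy.

from pathlib import Path

# Create a small placeholder document so the rest of the tutorial has something to index.
# The policy text below is invented sample content.
Path("docs").mkdir(exist_ok=True)
Path("docs/policy.txt").write_text(
    "Data Retention Policy\n"
    "Customer records are retained for 24 months after account closure.\n"
    "Access to customer records is limited to the support and compliance teams.\n",
    encoding="utf-8",
)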

Step-by-Step

  1. Start by loading a small document set and splitting it into chunks. For RAG, chunking matters because retrieval quality is usually better when you index smaller, semantically coherent pieces rather than whole files.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("docs/policy.txt", encoding="utf-8")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=120,
)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} document(s)")
print(f"Created {len(chunks)} chunks")
print(chunks[0].page_content[:300])
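The snippet above loads a single file. If you have a whole folder of text files instead, langchain_community's DirectoryLoader can load them in one pass; a minimal sketch, assuming your files live under docs/ and are UTF-8 text:

from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load every .txt file under docs/ using the same TextLoader as above.
loader = DirectoryLoader(
    "docs",
    glob="**/*.txt",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},
)
documents = loader.load()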
  2. Next, embed the chunks and store them in a vector database. FAISS is enough for local development and gives you a clean path to swap in a managed vector store later.
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Requires OPENAI_API_KEY to be set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before running.")

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
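Re-embedding the corpus on every run is slow and costs API calls. FAISS can persist the index to disk so ingestion and querying can run as separate processes; a minimal sketch, assuming a local folder named faiss_index:

# Persist the index after ingestion...
vectorstore.save_local("faiss_index")

# ...and reload it later without re-embedding the corpus.
restored = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # the index is pickled; only load files you created yourself
)
retriever = restored.as_retriever(search_kwargs={"k": 4})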
  3. Build the prompt and retrieval chain. The key detail here is to force the model to answer only from retrieved context; that keeps hallucinations down and makes failures easier to detect.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer only from the provided context. If the answer is not in the context, say you don't know."),
    ("human", "Context:\n{context}\n\nQuestion: {input}")
])

document_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, document_chain)
  4. Run a question through the pipeline. If your source docs are good, this should return an answer grounded in the indexed text plus the retrieved source snippets.
question = "What does the policy say about data retention?"
result = rag_chain.invoke({"input": question})

print("Answer:")
print(result["answer"])
print("\nRetrieved chunks:")
for i, doc in enumerate(result["context"], start=1):
    print(f"\nChunk {i}:")
    print(doc.page_content[:400])
  5. Wrap it in a reusable function so you can call it from an API, worker, or CLI. This is where RAG becomes useful in real systems: you keep ingestion separate from query-time execution.
def answer_question(query: str) -> dict:
    response = rag_chain.invoke({"input": query})
    return {
        "question": query,
        "answer": response["answer"],
        "sources": [doc.metadata for doc in response["context"]],
    }

if __name__ == "__main__":
    output = answer_question("Who can access customer records?")
    print(output["answer"])
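
To expose the same function over HTTP, a thin web layer is enough. This is a hypothetical sketch using FastAPI (not one of the tutorial's dependencies), assuming the answer_question function above is importable:

# pip install fastapi uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    query: str

@app.post("/ask")
def ask(request: AskRequest) -> dict:
    # Delegate to the RAG pipeline defined above.
    return answer_question(request.query)

# Run with: uvicorn app:app --reload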

Testing It

Run the script against a document you know well and ask questions whose answers are explicitly present in the text. You want to see two things: the answer should match the source content, and the retrieved chunks should contain the evidence used to generate it.

Then ask an out-of-scope question like “What was the CEO’s opinion on this policy?” if that information is not in your docs. A correct RAG setup should refuse cleanly with something like “I don’t know,” not invent details.

If results are weak, check chunk size first, then retrieval k, then your source documents. In most real projects, weak RAG output comes down to bad data or bad chunking before anything else.
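
A quick way to run both checks together is a small script that asks one in-scope and one out-of-scope question and prints each answer next to its sources; a minimal sketch reusing answer_question from above (the questions are examples, adjust them to your documents):

# Smoke test: one question the docs should answer, one they should refuse.
test_questions = [
    "What does the policy say about data retention?",  # expected: grounded answer
    "What was the CEO's opinion on this policy?",       # expected: "I don't know"
]

for q in test_questions:
    result = answer_question(q)
    print(f"\nQ: {q}")
    print(f"A: {result['answer']}")
    print(f"Sources: {result['sources']}")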

Next Steps

  • Add metadata filtering so you can scope retrieval by tenant, department, or document type (see the sketch after this list).
  • Replace FAISS with a persistent store like PostgreSQL + pgvector or Pinecone for multi-session applications.
  • Add citations to your response schema so downstream systems can show which chunks supported each answer.
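
As a starting point for metadata filtering, chunks tagged at ingestion time can be filtered at query time. A minimal sketch, assuming a hypothetical department field on each chunk's metadata (FAISS accepts simple dict filters; managed stores typically offer richer filter syntax):

# Tag chunks with metadata at ingestion time...
for chunk in chunks:
    chunk.metadata["department"] = "legal"  # hypothetical field; derive it from your own source data

vectorstore = FAISS.from_documents(chunks, embeddings)

# ...then restrict retrieval to matching chunks at query time.
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"department": "legal"}},
)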
