LangGraph Tutorial (Python): building a RAG pipeline for beginners
This tutorial builds a minimal Retrieval-Augmented Generation pipeline with LangGraph in Python. You’ll wire together document loading, chunking, embedding, retrieval, and answer generation so you can turn a plain LLM into something that answers from your own data.
What You'll Need
- Python 3.10+
- An OpenAI API key set as `OPENAI_API_KEY`
- These packages: `langgraph`, `langchain`, `langchain-openai`, `langchain-community`, `faiss-cpu`
- A small local text file to test with, for example `docs/policy.txt`
Install everything:
```bash
pip install langgraph langchain langchain-openai langchain-community faiss-cpu
```
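The OpenAI clients read the key from your environment, so set it before running any of the code below (the value shown is a placeholder):

```bash
export OPENAI_API_KEY="sk-..."  # placeholder: use your real key
```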
Step-by-Step
- Start by loading your document and splitting it into chunks. For beginners, a single text file is enough to prove the pipeline works before you add PDFs, SharePoint, or a vector database.
```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the raw file, then split it into overlapping chunks so each
# embedding captures a coherent slice of the text.
loader = TextLoader("docs/policy.txt", encoding="utf-8")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} document(s)")
print(f"Created {len(chunks)} chunks")
```
- Next, embed the chunks and store them in a retriever-backed vector index. FAISS is fine for local development and keeps the example simple.
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Embed every chunk, index the vectors in a local FAISS store,
# and expose it as a retriever that returns the top 3 matches.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Quick sanity check that retrieval returns something relevant.
query = "What does the policy say about claim deadlines?"
results = retriever.invoke(query)
for i, doc in enumerate(results, start=1):
    print(f"\nResult {i}:\n{doc.page_content[:300]}")
```
- Now define the LangGraph state and the nodes that will run your RAG flow. The graph needs one node to retrieve context and another to generate the answer from that context.
```python
from typing import TypedDict, List
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI

# The shared state that flows through the graph.
class RAGState(TypedDict):
    question: str
    context: List[Document]
    answer: str

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def retrieve(state: RAGState):
    # Fetch the chunks most relevant to the question.
    return {"context": retriever.invoke(state["question"])}

def generate(state: RAGState):
    # Stuff the retrieved chunks into the prompt and ask the model.
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_text}\n\nQuestion: {state['question']}"
    )
    response = llm.invoke(prompt)
    return {"answer": response.content}
```
- Build the graph with LangGraph and compile it into an executable app. This is the part that makes the workflow explicit and easy to extend later with routing, grading, or human review.
```python
from langgraph.graph import StateGraph, START, END

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)

# Wire a linear flow: START -> retrieve -> generate -> END.
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

app = builder.compile()
```
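If you want to verify the wiring, LangGraph can render the compiled graph as a Mermaid diagram; the drawing helpers vary a bit by version, so treat this as an optional check:

```python
# Print a Mermaid diagram of the graph: START -> retrieve -> generate -> END.
print(app.get_graph().draw_mermaid())
```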
- Run the pipeline with a real question and inspect the result. If your document contains the answer, you should get a grounded response instead of a generic LLM guess.
```python
result = app.invoke({"question": "What does the policy say about claim deadlines?"})

print("Question:", result["question"])
print("\nAnswer:\n", result["answer"])
print("\nSources:")
for doc in result["context"]:
    print("-", doc.metadata.get("source", "unknown"))
```
Testing It
Use a document where you already know the answer so you can check whether retrieval is actually pulling the right chunk. If the model answers correctly but the retrieved context is wrong, your issue is usually chunking or search quality, not generation.
A good first test is to ask three questions:
- one answerable directly from the file
- one paraphrased version of that same question
- one question that is not in the file at all
For production-style debugging, print both retrieved chunks and final output. That tells you whether failures come from retrieval drift or prompt behavior.
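Here is a minimal sketch of that harness, reusing the `app` compiled above; the three questions are placeholders to swap for ones that match your own file:

```python
# Hypothetical test questions; replace them with ones that fit your document.
test_questions = [
    "What does the policy say about claim deadlines?",  # answerable directly
    "How long do I have to submit a claim?",            # paraphrase of the same question
    "What is the policy on pet insurance?",             # not in the file at all
]

for q in test_questions:
    result = app.invoke({"question": q})
    print(f"\nQ: {q}")
    for doc in result["context"]:
        # Show the start of each retrieved chunk to spot retrieval drift.
        print("  chunk:", doc.page_content[:80].replace("\n", " "))
    print("A:", result["answer"])
```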
Next Steps
- Add metadata filters so retrieval can target specific policies, products, or regions (a sketch follows this list).
- Replace FAISS with a persistent vector store like Pinecone or pgvector.
- Add a grader node in LangGraph to reject low-confidence answers before returning them to users (see the second sketch below).
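For the metadata filters, here is a minimal sketch assuming you tag chunks yourself at index time; the `region` field and its `"EU"` value are hypothetical, not something `TextLoader` adds for you:

```python
# Tag each chunk with a hypothetical "region" field before indexing.
for chunk in chunks:
    chunk.metadata["region"] = "EU"  # hypothetical metadata derived from your data

filtered_store = FAISS.from_documents(chunks, embeddings)

# FAISS retrievers accept an equality filter on metadata via search_kwargs.
filtered_retriever = filtered_store.as_retriever(
    search_kwargs={"k": 3, "filter": {"region": "EU"}}
)
```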
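For the grader node, here is a minimal sketch of one possible shape, reusing `llm`, `retrieve`, and `generate` from earlier; the `GradedRAGState` fields, the yes/no prompt, and the fallback message are illustrative choices, not a fixed LangGraph pattern:

```python
from typing import TypedDict, List
from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END

class GradedRAGState(TypedDict):
    question: str
    context: List[Document]
    answer: str
    grounded: bool

def grade(state: GradedRAGState):
    # Ask the model whether the retrieved context actually supports the answer.
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    verdict = llm.invoke(
        "Does the context below fully support the answer? Reply only yes or no.\n\n"
        f"Context:\n{context_text}\n\nAnswer: {state['answer']}"
    )
    return {"grounded": verdict.content.strip().lower().startswith("yes")}

def reject(state: GradedRAGState):
    return {"answer": "I couldn't find a well-grounded answer in the documents."}

builder = StateGraph(GradedRAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_node("grade", grade)
builder.add_node("reject", reject)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", "grade")
# Route grounded answers straight to END; send the rest to the fallback node.
builder.add_conditional_edges(
    "grade",
    lambda state: "pass" if state["grounded"] else "fail",
    {"pass": END, "fail": "reject"},
)
builder.add_edge("reject", END)
graded_app = builder.compile()
```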
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.