LangGraph Tutorial (Python): building a RAG pipeline for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial builds a small but production-shaped Retrieval-Augmented Generation pipeline with LangGraph in Python. You’ll end up with a graph that takes a question, retrieves relevant context from a vector store, generates an answer with that context, and returns the result in a way that’s easy to extend.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain
  • langchain-openai
  • langchain-community
  • faiss-cpu
  • An OpenAI API key in OPENAI_API_KEY
  • A few local documents to index, or the sample docs below

Install the packages:

pip install langgraph langchain langchain-openai langchain-community faiss-cpu

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

1) Load documents and build a vector store

Start by turning raw text into retrievable chunks. For this tutorial, we’ll use a tiny in-memory corpus so you can run it end-to-end without external dependencies beyond the API key.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = [
    "LangGraph is useful for building stateful LLM applications with control flow.",
    "RAG combines retrieval from external data with generation from an LLM.",
    "For better answers, chunk documents into smaller passages before embedding."
]

splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.create_documents(docs)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
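
The TextLoader import above is there for when you have real files instead of inline strings. A minimal sketch of that swap, assuming a local file at docs/notes.txt (the path is a placeholder):

# Load a local file into Document objects; the path below is hypothetical.
loader = TextLoader("docs/notes.txt")
raw_docs = loader.load()

# Loaded documents are split with split_documents rather than create_documents.
chunks = splitter.split_documents(raw_docs)
vectorstore = FAISS.from_documents(chunks, embeddings)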

2) Define the graph state and helper functions

LangGraph works best when your state is explicit. Here we keep only what we need: the user question, retrieved context, and final answer.

from typing import TypedDict

class GraphState(TypedDict):
    question: str
    context: list[str]
    answer: str

def retrieve(state: GraphState):
    docs = retriever.invoke(state["question"])
    return {"context": [doc.page_content for doc in docs]}

def format_context(context: list[str]) -> str:
    return "\n\n".join(f"- {chunk}" for chunk in context)

3) Add the generation node

This node takes retrieved passages and asks the model to answer only from that context. The prompt is simple on purpose; in real systems you’ll usually add citations, refusal behavior, and stricter formatting.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context. If the context is insufficient, say so."),
    ("human", "Question: {question}\n\nContext:\n{context}")
])

def generate(state: GraphState):
    context_text = format_context(state["context"])
    chain = prompt | llm
    response = chain.invoke({"question": state["question"], "context": context_text})
    return {"answer": response.content}

4) Wire the graph together

This is the part LangGraph makes clean: retrieval and generation are separate nodes connected by explicit edges. That gives you room to insert validation, reranking, fallback logic, or human review later.

from langgraph.graph import StateGraph, START, END

builder = StateGraph(GraphState)

builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)

builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

graph = builder.compile()
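
As one example of that extensibility, a conditional edge can route empty retrievals to a fallback node instead of calling the model at all. A minimal sketch, with fallback as a hypothetical node name:

def fallback(state: GraphState):
    # Short-circuit when retrieval found nothing usable.
    return {"answer": "I couldn't find relevant context for that question."}

def route_after_retrieve(state: GraphState) -> str:
    # Route on whether retrieval produced any context.
    return "generate" if state["context"] else "fallback"

fb_builder = StateGraph(GraphState)
fb_builder.add_node("retrieve", retrieve)
fb_builder.add_node("generate", generate)
fb_builder.add_node("fallback", fallback)

fb_builder.add_edge(START, "retrieve")
fb_builder.add_conditional_edges("retrieve", route_after_retrieve, ["generate", "fallback"])
fb_builder.add_edge("generate", END)
fb_builder.add_edge("fallback", END)

graph_with_fallback = fb_builder.compile()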

5) Run a query through the pipeline

Now execute the graph with a real question. The returned state carries the intermediate retrieval context alongside the final answer, which makes debugging much easier than a single opaque LLM call.

result = graph.invoke({
    "question": "What is LangGraph used for?",
    "context": [],
    "answer": ""
})

print("Question:", result["question"])
print("Retrieved context:", result["context"])
print("Answer:", result["answer"])

If you want a cleaner API for downstream services, wrap the graph call in a function that returns only the answer.

def ask(question: str) -> str:
    result = graph.invoke({"question": question, "context": [], "answer": ""})
    return result["answer"]

print(ask("Why do we chunk documents before embedding?"))

Testing It

Run the script and ask questions that are clearly covered by your sample corpus. You should see retrieved chunks that match the question and an answer grounded in those chunks.

Try an out-of-scope question like “What is the capital of France?” If your prompt is doing its job, the model should refuse to guess or say the context is insufficient.

If retrieval looks weak, reduce chunk size or increase k in search_kwargs. If answers are too verbose or drift off-topic, tighten the system prompt and keep temperature at zero.
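
A quick smoke check covering both cases (the expected refusal wording depends on your prompt, so treat the comments as assumptions):

# In-scope: should be answered from the retrieved chunks.
print(ask("What does RAG combine?"))

# Out-of-scope: should say the context is insufficient rather than
# answer from the model's general knowledge.
print(ask("What is the capital of France?"))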

Next Steps

  • Add citations by returning document metadata alongside each chunk; see the sketch after this list.
  • Insert a reranker node between retrieval and generation for better relevance.
  • Extend the state with conversation history so follow-up questions work across turns.
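
For the first of those, the retrieval node can carry metadata through the state. A minimal sketch, assuming each document has a source key in its metadata (set it when loading):

def retrieve_with_sources(state: GraphState):
    # Keep each chunk's origin so the generation prompt can cite it.
    docs = retriever.invoke(state["question"])
    context = [
        f"{doc.page_content} (source: {doc.metadata.get('source', 'unknown')})"
        for doc in docs
    ]
    return {"context": context}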

By Cyprian Aarons, AI Consultant at Topiax.