LangGraph Tutorial (Python): building a RAG pipeline for advanced developers
This tutorial builds a production-style retrieval-augmented generation pipeline with LangGraph in Python. You’ll wire ingestion, retrieval, answer generation, and conditional fallback into a graph you can extend for real internal search, support, or compliance workflows.
What You'll Need
- Python 3.10+
- langgraph
- langchain
- langchain-openai
- langchain-community
- faiss-cpu
- An OpenAI API key set as OPENAI_API_KEY
- A small text corpus to index locally
- Basic familiarity with LangGraph nodes, edges, and state
Step-by-Step
- Start by installing dependencies and setting up the core models. For this tutorial, we’ll use OpenAI embeddings for retrieval and GPT-4o-mini for answer generation.

```bash
pip install langgraph langchain langchain-openai langchain-community faiss-cpu
```
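Before going further, make sure the API key from the prerequisites is actually visible to your process; both the embeddings client and the chat model read it from the environment. A minimal sanity check:

```python
import os

# Both OpenAIEmbeddings and ChatOpenAI read OPENAI_API_KEY from the environment.
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running the pipeline."
```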
- Build a local vector store from a few documents. In real systems, this would come from PDFs, SharePoint, Confluence, or a database sync job; the important part is that retrieval is isolated from generation.

```python
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = [
    Document(page_content="RAG combines retrieval with generation to ground answers in source data."),
    Document(page_content="LangGraph is useful when you need branching logic, retries, or stateful workflows."),
    Document(page_content="Production RAG should filter irrelevant context before calling the LLM."),
]

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
```
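It’s worth poking at the retriever on its own before wiring it into a graph. A quick check (the query string here is just an example):

```python
# Sanity check: the top hit should be the LangGraph document.
for doc in retriever.invoke("When should I use LangGraph?"):
    print(doc.page_content)
```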
- Define the graph state and node functions. The state carries the user question, retrieved context, and final answer; each node updates only what it owns.

```python
from typing import TypedDict, List

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

class RAGState(TypedDict):
    question: str
    context: List[str]
    answer: str

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def retrieve(state: RAGState) -> dict:
    docs = retriever.invoke(state["question"])
    return {"context": [doc.page_content for doc in docs]}

def generate(state: RAGState) -> dict:
    # chr(10) is a newline; f-string expressions cannot contain backslashes
    # before Python 3.12.
    prompt = (
        "Answer the question using only the context.\n\n"
        f"Question: {state['question']}\n\n"
        f"Context:\n{chr(10).join(state['context'])}"
    )
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"answer": response.content}
```
- Add a quality gate before generation. If retrieval returns weak context, route to a fallback response instead of forcing an unsupported answer.

```python
def has_useful_context(state: RAGState) -> bool:
    # Treat retrieval as useful only if at least one non-trivial chunk came back.
    return len(state["context"]) > 0 and any(len(chunk.strip()) > 20 for chunk in state["context"])

def fallback(state: RAGState) -> dict:
    return {
        "answer": "I could not find enough grounded context to answer this reliably."
    }
```
- Assemble the LangGraph workflow with conditional routing. This gives you a clean DAG now and room to add retries, classification, or human review later.

```python
from langgraph.graph import StateGraph, START, END

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_node("fallback", fallback)

builder.add_edge(START, "retrieve")
builder.add_conditional_edges(
    "retrieve",
    has_useful_context,
    {True: "generate", False: "fallback"},
)
builder.add_edge("generate", END)
builder.add_edge("fallback", END)

graph = builder.compile()
```
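To confirm the wiring matches the diagram in your head, a compiled graph can render itself; printing the Mermaid source is a lightweight way to inspect it:

```python
# Print a Mermaid diagram of the compiled graph to verify the routing.
print(graph.get_graph().draw_mermaid())
```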
- Run the graph with a real question and inspect the output. In an advanced setup you would also log retrieved chunks and latency per node for observability; see the streaming sketch after this code.

```python
result = graph.invoke({"question": "What is LangGraph used for?", "context": [], "answer": ""})
print(result["answer"])
print(result["context"])
```
Testing It
Run one question that should clearly match your documents and one that should not. The first should produce an answer grounded in retrieved text; the second should hit the fallback path. One caveat: a top-k retriever returns k documents no matter how weak the match, so the simple length-based gate above will rarely route to fallback on its own.
Check that context contains actual chunks before generation runs. For stricter behavior, lower k, add metadata filters, or require minimum similarity scores before routing forward, as in the sketch below.
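Here is a minimal sketch of a score-gated retrieve node. It assumes FAISS’s default scoring, where the returned value is an L2 distance (lower means closer); the 0.8 cutoff is an illustrative number you would tune against your own corpus and embedding model:

```python
def retrieve_with_threshold(state: RAGState, max_distance: float = 0.8) -> dict:
    # similarity_search_with_score returns (Document, distance) pairs; with the
    # default FAISS index, lower distance means a closer match.
    results = vectorstore.similarity_search_with_score(state["question"], k=2)
    context = [doc.page_content for doc, score in results if score <= max_distance]
    return {"context": context}  # an empty list now routes to the fallback node
```

Swap this in for retrieve when building the graph, and the fallback branch becomes genuinely reachable for off-topic questions.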
Also verify that changing the corpus changes the answer without touching the prompt logic. That’s the point of RAG: knowledge lives in retrieval infrastructure, not hardcoded prompts.
Next Steps
- Add document chunking with overlap before building the vector store (see the sketch after this list).
- Replace FAISS with a hosted vector database like Pinecone or pgvector.
- Add a grader node that checks whether the generated answer is supported by retrieved context before returning it.
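For the chunking item, a minimal sketch using LangChain’s RecursiveCharacterTextSplitter (it ships as a dependency of langchain; the chunk_size and chunk_overlap values are illustrative starting points, not tuned recommendations):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split documents into overlapping chunks before indexing; the overlap preserves
# context that would otherwise be cut off at chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
vectorstore = FAISS.from_documents(chunks, embeddings)
```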
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.