LangGraph Tutorial (Python): parsing structured output for beginners

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows you how to build a small LangGraph workflow in Python that takes messy model text and turns it into structured data you can reliably use in code. You need this when you want an LLM to return fields like name, priority, or summary without manually parsing free-form text.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-core
  • pydantic
  • An OpenAI API key if you want to swap the regex parser for a real model later
  • Basic familiarity with LangGraph nodes, edges, and state

Install the packages:

pip install langgraph langchain-core pydantic

Step-by-Step

  1. Start by defining the structure you want from the model.
    For beginners, Pydantic is the easiest way to make output shape explicit and validate it before your graph continues.
from typing import TypedDict, Optional
from pydantic import BaseModel, Field

class TicketInfo(BaseModel):
    customer_name: str = Field(description="Name of the customer")
    issue_type: str = Field(description="Type of issue")
    priority: str = Field(description="Priority level: low, medium, or high")
    summary: str = Field(description="Short summary of the issue")

class GraphState(TypedDict):
    raw_text: str
    parsed: Optional[TicketInfo]
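
To see what the schema buys you before wiring it into a graph, here is a minimal sketch that validates two plain dictionaries against the same TicketInfo model. The sample payloads are made up for illustration. Note that priority is declared as a plain str here, so any string is accepted at this stage.

```python
from pydantic import BaseModel, Field, ValidationError


class TicketInfo(BaseModel):
    customer_name: str = Field(description="Name of the customer")
    issue_type: str = Field(description="Type of issue")
    priority: str = Field(description="Priority level: low, medium, or high")
    summary: str = Field(description="Short summary of the issue")


# A complete payload validates and gives you attribute access.
ticket = TicketInfo.model_validate(
    {
        "customer_name": "Jane Doe",
        "issue_type": "login failure",
        "priority": "high",
        "summary": "User cannot sign in after password reset.",
    }
)
print(ticket.priority)

# A payload with missing fields is rejected with a ValidationError
# that names each missing field.
try:
    TicketInfo.model_validate({"customer_name": "Jane Doe"})
    missing = []
except ValidationError as exc:
    missing = sorted(err["loc"][0] for err in exc.errors())
print(missing)
```

The second call never produces a TicketInfo instance, which is the behavior you want: a node further down the graph cannot receive a half-filled object.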
  2. Create a parser node that converts raw text into structured data.
    In production you would usually call an LLM here, but for a beginner-friendly tutorial we’ll parse a fixed text format so the graph is executable as-is.
import re

def parse_ticket(state: GraphState) -> dict:
    text = state["raw_text"]

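    def grab(label: str) -> str:
        # Anchor the label to the start of a line and stop at the line end,
        # so an empty value cannot swallow the next line and a label that
        # appears inside another field's text is not matched.
        match = re.search(rf"^[ \t]*{re.escape(label)}:[ \t]*(.*)$", text, re.MULTILINE)
        return match.group(1).strip() if match else ""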

    parsed = TicketInfo(
        customer_name=grab("Customer"),
        issue_type=grab("Issue"),
        priority=grab("Priority"),
        summary=grab("Summary"),
    )
    return {"parsed": parsed}
  3. Add a validation node so bad output fails early.
    This is the part people skip when they first use structured output, and it’s where broken workflows usually start.
def validate_ticket(state: GraphState) -> dict:
    parsed = state["parsed"]
    if parsed is None:
        raise ValueError("No parsed ticket found")

    if parsed.priority not in {"low", "medium", "high"}:
        raise ValueError(f"Invalid priority: {parsed.priority}")

    return {}
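
As an alternative to checking priority by hand, the allowed values can be pushed into the schema itself. The sketch below is one possible variant, not the approach used in the rest of this tutorial: StrictTicketInfo and is_valid_priority are hypothetical names, and the check relies on typing.Literal, which Pydantic enforces at construction time.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictTicketInfo(BaseModel):  # hypothetical stricter variant of TicketInfo
    customer_name: str
    issue_type: str
    priority: Literal["low", "medium", "high"]
    summary: str


def is_valid_priority(priority: str) -> bool:
    """Return True if Pydantic accepts the given priority value."""
    try:
        StrictTicketInfo(
            customer_name="Jane Doe",
            issue_type="login failure",
            priority=priority,
            summary="User cannot sign in after password reset.",
        )
        return True
    except ValidationError:
        return False


print(is_valid_priority("high"))
print(is_valid_priority("urgent"))
```

The trade-off is where the failure surfaces. With a Literal field the error is raised inside the parser node as a ValidationError, while the separate validate_ticket node keeps parsing and business rules apart and raises a ValueError of your own wording.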
  4. Build the LangGraph workflow and connect the nodes.
    The graph starts with raw text, parses it, validates it, then returns the final structured object.
from langgraph.graph import StateGraph, START, END

builder = StateGraph(GraphState)
builder.add_node("parse_ticket", parse_ticket)
builder.add_node("validate_ticket", validate_ticket)

builder.add_edge(START, "parse_ticket")
builder.add_edge("parse_ticket", "validate_ticket")
builder.add_edge("validate_ticket", END)

graph = builder.compile()
  5. Run the graph with sample input and inspect the result.
    If everything is wired correctly, you should get a typed Pydantic object back instead of an unstructured string.
input_state: GraphState = {
    "raw_text": (
        "Customer: Jane Doe\n"
        "Issue: login failure\n"
        "Priority: high\n"
        "Summary: User cannot sign in after password reset."
    ),
    "parsed": None,
}

result = graph.invoke(input_state)
print(result["parsed"])
print(result["parsed"].model_dump())

Testing It

Run the script and confirm that result["parsed"] prints a TicketInfo object with all four fields populated. Then change Priority: high to something invalid like urgent and verify that validate_ticket raises a ValueError.

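You should also test missing fields. If one of the labels is absent from the input text, grab() returns an empty string, and TicketInfo accepts it because an empty string is still a valid str. A missing priority is caught by validate_ticket, but an empty customer_name, issue_type, or summary passes through silently unless you add a check for it.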
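
One way to catch empty fields is a small helper that inspects the dumped model. This is a sketch under the assumption that every field is required; empty_fields is a hypothetical name, and it works on a plain dict so you can feed it parsed.model_dump().

```python
def empty_fields(data: dict) -> list[str]:
    """Return the names of fields whose value is empty or whitespace-only."""
    return [name for name, value in data.items() if not str(value).strip()]


# Stand-in for parsed.model_dump() when the "Issue" label was missing
# and the summary contained only spaces.
sample = {
    "customer_name": "Jane Doe",
    "issue_type": "",
    "priority": "high",
    "summary": "  ",
}
print(empty_fields(sample))
```

Inside validate_ticket you could call empty_fields(parsed.model_dump()) and raise a ValueError that lists the returned names whenever the list is not empty.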

If you want to make this closer to production behavior, replace parse_ticket() with an LLM node that returns JSON, then validate that JSON with TicketInfo.model_validate_json(...) for a raw JSON string, or TicketInfo.model_validate(...) if you have already decoded it into a dict. The graph pattern stays the same; only the parsing implementation changes.
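
The following sketch shows roughly what that swap could look like, without calling a real model. The string fake_response stands in for whatever JSON text your model returns, and parse_model_output is a hypothetical helper that returns the same kind of partial state update as parse_ticket.

```python
from pydantic import BaseModel, ValidationError


class TicketInfo(BaseModel):
    customer_name: str
    issue_type: str
    priority: str
    summary: str


def parse_model_output(raw_json: str) -> dict:
    """Validate a JSON string and return a partial state update."""
    try:
        return {"parsed": TicketInfo.model_validate_json(raw_json)}
    except ValidationError:
        # Malformed JSON and missing fields both end up here.
        return {"parsed": None}


# Hard-coded stand-in for a model response (assumption for illustration).
fake_response = (
    '{"customer_name": "Jane Doe", "issue_type": "login failure", '
    '"priority": "high", "summary": "Cannot sign in."}'
)

good = parse_model_output(fake_response)
bad = parse_model_output("Sure! Here is the ticket you asked for.")
print(good["parsed"])
print(bad["parsed"])
```

Returning None on failure hands the decision to validate_ticket, which already raises when no parsed ticket is present.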

Next Steps

  • Replace the regex parser with an actual chat model using structured output or JSON mode
  • Add conditional edges so invalid parses go to a retry node instead of failing immediately
  • Store validated objects in a database or queue for downstream automation

By Cyprian Aarons, AI Consultant at Topiax.
