LangChain Tutorial (Python): parsing structured output for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to get a language model to return structured data in Python using LangChain, then parse that output into a typed schema you can trust in downstream code. You need this when you stop treating LLM output like free-form text and start feeding it into workflows, APIs, databases, or validation layers.

What You'll Need

  • Python 3.10+
  • A working OpenAI API key
  • langchain
  • langchain-openai
  • pydantic
  • Basic familiarity with ChatPromptTemplate and ChatOpenAI

Install the packages:

pip install langchain langchain-openai pydantic

Set your API key:

export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Start by defining the shape of the output you want. Use a Pydantic model so your parser has an explicit contract instead of guessing from text.
from pydantic import BaseModel, Field

class SupportTicket(BaseModel):
    customer_name: str = Field(description="Full name of the customer")
    priority: str = Field(description="Priority level: low, medium, or high")
    issue_summary: str = Field(description="Short summary of the issue")
    refund_requested: bool = Field(description="Whether the customer asked for a refund")
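You can exercise this contract directly, with no LLM in the loop, to confirm the schema behaves the way downstream code expects. A quick sanity check (the SupportTicket model is redefined here so the snippet stands alone):

```python
from pydantic import BaseModel, Field

class SupportTicket(BaseModel):
    customer_name: str = Field(description="Full name of the customer")
    priority: str = Field(description="Priority level: low, medium, or high")
    issue_summary: str = Field(description="Short summary of the issue")
    refund_requested: bool = Field(description="Whether the customer asked for a refund")

# Validate a hand-written dict the same way the parser will later
# validate model output.
ticket = SupportTicket.model_validate({
    "customer_name": "Jordan Lee",
    "priority": "high",
    "issue_summary": "Order arrived damaged",
    "refund_requested": True,
})
print(ticket.model_dump())
```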
  2. Build a prompt that tells the model to return only structured data. LangChain’s parser can inject format instructions directly into the prompt, which reduces malformed output.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=SupportTicket)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You extract structured data from customer messages."),
    ("user", "Message: {message}\n{format_instructions}")
]).partial(format_instructions=parser.get_format_instructions())
  3. Wire up the chat model and chain it with the parser. The important part is that the model returns text, then the parser converts that text into a real SupportTicket object.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | parser

message = (
    "Hi, I'm Jordan Lee. My order arrived damaged and I want a refund. "
    "This is urgent because I need it before Friday."
)

ticket = chain.invoke({"message": message})
print(ticket)
print(ticket.customer_name)
  4. Handle parsing failures explicitly. In production, malformed output happens, so you should catch exceptions and decide whether to retry, log, or route to a fallback path.
from langchain_core.exceptions import OutputParserException

bad_message = "The package was broken and I need help."

try:
    parsed = chain.invoke({"message": bad_message})
    print(parsed.model_dump())
except OutputParserException as e:
    print("Parsing failed:")
    print(str(e))
  5. If you want more control over validation, keep business rules in your schema. For example, you can constrain fields with literals or add custom validators when “priority” must match a fixed set.
from typing import Literal

class StrictSupportTicket(BaseModel):
    customer_name: str
    priority: Literal["low", "medium", "high"]
    issue_summary: str
    refund_requested: bool

strict_parser = PydanticOutputParser(pydantic_object=StrictSupportTicket)

strict_prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract support ticket data exactly."),
    ("user", "{message}\n{format_instructions}")
]).partial(format_instructions=strict_parser.get_format_instructions())

strict_chain = strict_prompt | llm | strict_parser
result = strict_chain.invoke({
    "message": "I'm Ava Chen. Priority is high. My item broke and I want a refund."
})
print(result.priority)
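If the model emits near-miss values like "urgent", a before-mode validator can normalize them onto the allowed set before the Literal check runs. A minimal sketch, assuming Pydantic v2; NormalizedTicket and the alias map are illustrative names, not part of the tutorial's schema:

```python
from typing import Literal
from pydantic import BaseModel, field_validator

class NormalizedTicket(BaseModel):
    customer_name: str
    priority: Literal["low", "medium", "high"]
    issue_summary: str
    refund_requested: bool

    @field_validator("priority", mode="before")
    @classmethod
    def normalize_priority(cls, value):
        # Map common synonyms onto the allowed set before the
        # Literal constraint is enforced.
        aliases = {"urgent": "high", "critical": "high", "normal": "medium"}
        if isinstance(value, str):
            value = value.strip().lower()
            return aliases.get(value, value)
        return value

ticket = NormalizedTicket(
    customer_name="Ava Chen",
    priority="Urgent",
    issue_summary="Item broke",
    refund_requested=True,
)
print(ticket.priority)  # "high"
```

Because the validator runs in "before" mode, raw strings are cleaned up first, so the Literal constraint still rejects anything that cannot be mapped.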

Testing It

Run the script with a few different customer messages and inspect both valid parses and failures. You want to confirm that required fields are always present, booleans are parsed correctly, and invalid priorities are rejected by the schema.

Try edge cases like missing names, ambiguous refund requests, or messages with multiple issues in one paragraph. If you’re using this in an app, add unit tests around the Pydantic model first, then integration tests around the LangChain chain second.

A good smoke test is to call .model_dump() on the parsed object and verify it matches what your downstream code expects. If your pipeline depends on exact field names or enums, treat schema changes like breaking API changes.
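Those schema-level unit tests can be sketched like this (pytest-style, assuming the StrictSupportTicket model from step 5, redefined here so the file runs on its own; no LLM calls involved):

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class StrictSupportTicket(BaseModel):
    customer_name: str
    priority: Literal["low", "medium", "high"]
    issue_summary: str
    refund_requested: bool

def test_valid_ticket_round_trips():
    data = {
        "customer_name": "Ava Chen",
        "priority": "high",
        "issue_summary": "Item broke",
        "refund_requested": True,
    }
    # model_dump() should return exactly the fields downstream code expects.
    assert StrictSupportTicket.model_validate(data).model_dump() == data

def test_invalid_priority_is_rejected():
    try:
        StrictSupportTicket.model_validate({
            "customer_name": "Ava Chen",
            "priority": "urgent",  # not in the allowed set
            "issue_summary": "Item broke",
            "refund_requested": False,
        })
        assert False, "expected a ValidationError"
    except ValidationError:
        pass
```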

Next Steps

  • Learn with_structured_output() for cases where you want LangChain to handle structured generation more directly.
  • Add retries with fallback parsers for noisy inputs and production-grade resilience.
  • Combine structured parsing with tool calling when extracted fields need to trigger actions like ticket creation or CRM updates.
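For the first bullet, with_structured_output() replaces the prompt-plus-parser pipeline with a single call. A minimal sketch, assuming the same gpt-4o-mini model and an OPENAI_API_KEY in the environment (this one does hit the API, so it is not unit-testable offline):

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class SupportTicket(BaseModel):
    customer_name: str
    priority: str
    issue_summary: str
    refund_requested: bool

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Bind the schema to the model; LangChain handles format
# instructions and parsing internally.
structured_llm = llm.with_structured_output(SupportTicket)

ticket = structured_llm.invoke(
    "Hi, I'm Jordan Lee. My order arrived damaged and I want a refund."
)
print(type(ticket))  # a SupportTicket instance, no separate parser step
```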


By Cyprian Aarons, AI Consultant at Topiax.
