LangChain Tutorial (Python): parsing structured output for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to make a LangChain chat model return structured Python objects instead of brittle free-text. You need this when downstream code depends on exact fields, validation, and predictable parsing for things like claims triage, KYC extraction, or policy document classification.

What You'll Need

  • Python 3.10+
  • An OpenAI API key set as OPENAI_API_KEY
  • langchain
  • langchain-openai
  • pydantic
  • Basic familiarity with LangChain chat models and prompt templates

Install the packages:

pip install langchain langchain-openai pydantic

Step-by-Step

  1. Define the schema first.
    Structured output starts with a strict contract, and Pydantic is the cleanest way to express it for production code.
from pydantic import BaseModel, Field
from typing import Literal

class RiskAssessment(BaseModel):
    customer_name: str = Field(..., description="Full name of the customer")
    risk_level: Literal["low", "medium", "high"]
    reason: str = Field(..., description="Short explanation for the risk score")
  2. Build the model with structured output enabled.
    with_structured_output() tells LangChain to parse the model response into your schema instead of leaving you with raw text.
import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"],
)

structured_llm = llm.with_structured_output(RiskAssessment)
  3. Send a prompt that contains enough signal for the parser to work reliably.
    Keep the task narrow and make the output contract explicit; don’t ask for extra prose if you want a clean object back.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You extract structured risk assessments from customer notes."),
    ("user", "Customer: Jane Doe\nNotes: Multiple failed login attempts from two countries in 24 hours.")
])

chain = prompt | structured_llm
result = chain.invoke({})
print(result)
print(type(result))
  4. Access the parsed object like normal Python data.
    This is the main win: you can pass validated fields straight into business logic without regexes or ad hoc JSON parsing.
assessment = result  # reuse the object parsed above; no need for a second LLM call

if assessment.risk_level == "high":
    action = "route_to_manual_review"
elif assessment.risk_level == "medium":
    action = "request_more_context"
else:
    action = "auto_approve"

print({
    "customer_name": assessment.customer_name,
    "risk_level": assessment.risk_level,
    "action": action,
})
  5. Add explicit validation and failure handling around malformed outputs.
    In production, treat schema violations as normal control flow and decide whether to retry, fall back, or escalate.
from langchain_core.exceptions import OutputParserException
from pydantic import ValidationError

try:
    assessment = chain.invoke({})
    print(assessment.model_dump())
except (ValidationError, OutputParserException) as e:
    # Schema violations can surface as either a Pydantic error or a
    # LangChain parsing error depending on the structured-output method
    print("Schema validation failed:")
    print(e)
except Exception as e:
    print("LLM call failed:")
    print(e)
  6. If you need multiple records, use a list wrapper schema.
    This is common for extraction jobs where one document contains several entities, such as claims, beneficiaries, or transactions.
from typing import List

class BatchRiskAssessments(BaseModel):
    items: List[RiskAssessment]

batch_llm = llm.with_structured_output(BatchRiskAssessments)

batch_prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract all customer risk assessments from the text."),
    ("user", """
Customer: Jane Doe | Notes: Multiple failed login attempts from two countries in 24 hours.
Customer: John Smith | Notes: Normal login pattern and verified device.
""")
])

batch_chain = batch_prompt | batch_llm
batch_result = batch_chain.invoke({})
print(batch_result.items[0].model_dump())
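Before wiring in the model at all, the schema contract from step 1 can be exercised locally with Pydantic alone; a minimal sketch, no API key or network call needed:

```python
# Local sanity check of the RiskAssessment contract -- pure Pydantic,
# no LangChain or model call involved.
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class RiskAssessment(BaseModel):
    customer_name: str = Field(..., description="Full name of the customer")
    risk_level: Literal["low", "medium", "high"]
    reason: str = Field(..., description="Short explanation for the risk score")

# A well-formed payload parses cleanly into a typed object.
ok = RiskAssessment.model_validate({
    "customer_name": "Jane Doe",
    "risk_level": "high",
    "reason": "Logins from two countries in 24 hours",
})
print(ok.risk_level)  # high

# An out-of-contract value is rejected before it reaches business logic.
try:
    RiskAssessment.model_validate({
        "customer_name": "John Smith",
        "risk_level": "critical",  # not in the Literal
        "reason": "n/a",
    })
except ValidationError as e:
    print("rejected:", e.error_count(), "error(s)")
```

This is the same validation the structured-output chain applies to model responses, so it doubles as a cheap regression test for schema changes.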

Testing It

Run the script and confirm that print(result) shows a RiskAssessment object, not a plain string or dict full of untrusted text. Then check that each field matches the schema types you defined in Pydantic. If you intentionally change the prompt to be ambiguous, you should see either validation issues or lower-quality output, which is exactly why tight prompts matter.

A good test is to feed in edge cases like missing names, conflicting risk signals, or extra irrelevant text. Your code should still either parse cleanly or fail in a controlled way that your application can handle.
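One way to lock in that controlled-failure behavior is to unit-test the post-parse routing separately from the model. A minimal sketch using only the standard library; route_assessment is a hypothetical helper wrapping the if/elif chain from step 4, not part of the tutorial code:

```python
# Unit tests for the routing logic, exercised in isolation from the
# model. route_assessment is an illustrative wrapper around the
# risk_level -> action mapping shown earlier.
def route_assessment(risk_level: str) -> str:
    if risk_level == "high":
        return "route_to_manual_review"
    if risk_level == "medium":
        return "request_more_context"
    return "auto_approve"

# The schema's Literal type guarantees only these three values ever
# reach routing, so covering the three branches covers the contract.
cases = {
    "high": "route_to_manual_review",
    "medium": "request_more_context",
    "low": "auto_approve",
}
for level, expected in cases.items():
    assert route_assessment(level) == expected

print("all routing cases pass")
```

Because the schema constrains risk_level upstream, these three branches are exhaustive, which keeps the business-logic tests small and deterministic.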

Next Steps

  • Learn PydanticOutputParser for cases where you want manual parsing control instead of model-native structured output.
  • Combine structured output with retries using RunnableRetry when extraction quality matters under noisy inputs.
  • Add JSON schema versioning so downstream services can evolve without breaking old consumers.

By Cyprian Aarons, AI Consultant at Topiax.
