LangChain Tutorial (Python): parsing structured output for beginners

By Cyprian Aarons, updated 2026-04-21

This tutorial shows you how to make a LangChain chat model return structured data you can reliably parse in Python. You need this when plain text is too messy for downstream code and you want fields like name, email, or risk_level instead of scraping prose.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • These packages:
    • langchain
    • langchain-openai
    • pydantic
    • python-dotenv
  • An OpenAI API key, exported as the OPENAI_API_KEY environment variable or stored in a .env file
  • Basic familiarity with LangChain chat models and prompts

Install the dependencies:

pip install langchain langchain-openai pydantic python-dotenv

Step-by-Step

  1. Start by defining the structure you want back from the model. For beginners, Pydantic is the cleanest way to enforce field names and types.
from pydantic import BaseModel, Field

class CustomerLead(BaseModel):
    name: str = Field(description="Full name of the lead")
    email: str = Field(description="Email address")
    company: str = Field(description="Company name")
    budget_usd: int = Field(description="Estimated budget in USD")
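Before wiring in a model, you can check what the schema enforces on its own. The following is a minimal offline sketch that assumes Pydantic v2 (for model_validate and model_json_schema). The dictionaries are made-up stand-ins for model output. model_json_schema() returns the JSON Schema Pydantic derives from the class, field descriptions included, which is roughly the information a structured-output call passes to the model.

```python
# Assumes Pydantic v2 (model_validate / model_json_schema).
from pydantic import BaseModel, Field, ValidationError


class CustomerLead(BaseModel):
    name: str = Field(description="Full name of the lead")
    email: str = Field(description="Email address")
    company: str = Field(description="Company name")
    budget_usd: int = Field(description="Estimated budget in USD")


# The JSON Schema generated from the class, including field descriptions.
schema = CustomerLead.model_json_schema()
print(schema["required"])  # ['name', 'email', 'company', 'budget_usd']

# Simulated model output: Pydantic's default (lax) mode coerces "15000" to an int.
lead = CustomerLead.model_validate({
    "name": "Jane Doe",
    "email": "jane.doe@acme.com",
    "company": "Acme Corp",
    "budget_usd": "15000",
})
print(lead.budget_usd + 1)  # 15001

# A value that cannot be read as an integer is rejected with a ValidationError.
try:
    CustomerLead.model_validate({
        "name": "Jane Doe",
        "email": "jane.doe@acme.com",
        "company": "Acme Corp",
        "budget_usd": "about fifteen grand",
    })
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```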
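  2. Next, load your API key and create a chat model. Use a model that supports structured output (through tool calling or JSON schema responses); the example below uses OpenAI's gpt-4o-mini through LangChain, with temperature=0 to keep extraction results as consistent as possible.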
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()

if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY is not set")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
  3. Now wrap the model with with_structured_output(). This is the key step: LangChain will ask the model to produce output that matches your schema and parse it into your Pydantic object.
structured_llm = llm.with_structured_output(CustomerLead)

result = structured_llm.invoke(
    "Extract lead details from this note: "
    "Jane Doe from Acme Corp can be reached at jane.doe@acme.com. "
    "They are evaluating our platform and have a budget of 15000 USD."
)

print(result)
print(type(result))
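Because result is a CustomerLead instance rather than a string, downstream code can read typed attributes directly or serialize the object. This sketch assumes Pydantic v2 and builds the object by hand, with values mirroring the note above, as a stand-in for the model's reply so it runs without an API key.

```python
# Assumes Pydantic v2. The hand-built object stands in for what
# structured_llm.invoke(...) returns, so this runs offline.
import json

from pydantic import BaseModel, Field


class CustomerLead(BaseModel):
    name: str = Field(description="Full name of the lead")
    email: str = Field(description="Email address")
    company: str = Field(description="Company name")
    budget_usd: int = Field(description="Estimated budget in USD")


result = CustomerLead(
    name="Jane Doe",
    email="jane.doe@acme.com",
    company="Acme Corp",
    budget_usd=15000,
)

# Attributes are typed: budget_usd is already an int, no string parsing needed.
print(result.budget_usd * 2)  # 30000

# Convert to plain Python data or JSON for storage, queues, or HTTP APIs.
as_dict = result.model_dump()
as_json = result.model_dump_json()
print(as_dict)
print(json.loads(as_json) == as_dict)  # True
```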
  4. If you want to use a prompt template, keep it explicit about what should be extracted. This helps when your input text is noisy or contains extra context.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract customer lead information from the user text."),
    ("user", "{text}")
])

chain = prompt | structured_llm

lead = chain.invoke({
    "text": "Mark Spencer at Blue River Labs, mark@blueriver.io, budget around 25000 USD."
})

print(lead.name)
print(lead.email)
print(lead.company)
print(lead.budget_usd)
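  5. In real apps, validate the parsed object before using it downstream. Pydantic already checks field types, so output with the wrong types fails at parse time instead of leaking into your database or CRM. Type checks do not cover business rules, though: a plain str field accepts any text as an email.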
def save_lead_to_crm(lead: CustomerLead) -> None:
    # Placeholder: swap this print for a call to your CRM or database.
    print(f"Saving {lead.name} <{lead.email}> from {lead.company} with budget ${lead.budget_usd}")

sample_texts = [
    "Sarah Kim at Northstar Insurance, sarah@northstar.com, budget 40000 USD.",
    "Tom Reed from FinEdge can be contacted at tom@finedge.ai with a budget of 12000 USD."
]

# Each invoke returns a parsed CustomerLead, so no string handling is needed here.
for text in sample_texts:
    lead = chain.invoke({"text": text})
    save_lead_to_crm(lead)
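One way to add business rules on top of type checks is a Pydantic field_validator on the schema class. The sketch below assumes Pydantic v2; the two validators (email_has_at_sign, budget_not_negative) and the sample dictionaries are hypothetical, and the second record is invented to show the failure path. Since LangChain instantiates your Pydantic class from the model's reply, validators like these would typically also run on chain.invoke() results, but confirm that against your LangChain version.

```python
# Assumes Pydantic v2. The validators and raw_outputs data are hypothetical
# examples, not part of the tutorial's original schema.
from pydantic import BaseModel, Field, ValidationError, field_validator


class CustomerLead(BaseModel):
    name: str = Field(description="Full name of the lead")
    email: str = Field(description="Email address")
    company: str = Field(description="Company name")
    budget_usd: int = Field(description="Estimated budget in USD")

    @field_validator("email")
    @classmethod
    def email_has_at_sign(cls, value: str) -> str:
        # Deliberately loose check: a plain str field would accept any text.
        if "@" not in value:
            raise ValueError("email must contain '@'")
        return value

    @field_validator("budget_usd")
    @classmethod
    def budget_not_negative(cls, value: int) -> int:
        # Business rule layered on top of the int type check.
        if value < 0:
            raise ValueError("budget_usd must be zero or more")
        return value


# Dictionaries standing in for what a model might return.
raw_outputs = [
    {"name": "Sarah Kim", "email": "sarah@northstar.com",
     "company": "Northstar Insurance", "budget_usd": 40000},
    {"name": "Tom Reed", "email": "not provided",
     "company": "FinEdge", "budget_usd": 12000},
]

saved: list[CustomerLead] = []
needs_review: list[tuple[dict, str]] = []

for raw in raw_outputs:
    try:
        saved.append(CustomerLead.model_validate(raw))
    except ValidationError as exc:
        # Keep the bad record and the reason instead of writing it to the CRM.
        needs_review.append((raw, str(exc)))

print([lead.name for lead in saved])  # ['Sarah Kim']
print(len(needs_review))  # 1
```

Routing failures to a review list instead of stopping the whole batch keeps one bad record from blocking the good ones.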

Testing It

Run the script and confirm that each call returns a CustomerLead object instead of a raw string. You should see values printed for name, email, company, and budget_usd without manual parsing.

If the model returns malformed data, the call raises an exception (typically a parsing or validation error) instead of silently giving you junk. That is what you want in production: fail fast, then retry or log the input for review.
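A minimal sketch of "retry, then log" using only the standard library follows. invoke_with_retry and flaky_invoke are hypothetical helpers; with the tutorial's code you would pass chain.invoke as the first argument. The default retry_on=(Exception,) retries everything, so in practice you would narrow it to the parsing or validation errors you actually see. With temperature=0, a retry may return the same malformed output, which is why each failure is logged with its input. LangChain runnables also offer a built-in alternative, for example chain.with_retry(stop_after_attempt=3).

```python
import logging

logger = logging.getLogger("lead_extraction")


def invoke_with_retry(invoke, payload, attempts=3, retry_on=(Exception,)):
    """Call invoke(payload) up to `attempts` times.

    Each failure is logged together with the input so it can be reviewed later.
    If every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return invoke(payload)
        except retry_on as exc:
            logger.warning("attempt %d/%d failed for %r: %s", attempt, attempts, payload, exc)
            if attempt == attempts:
                raise


# Hypothetical stand-in for chain.invoke: fails twice, then succeeds.
calls = {"count": 0}


def flaky_invoke(payload):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("model returned malformed output")
    return {"name": "Jane Doe", "source": payload["text"]}


result = invoke_with_retry(flaky_invoke, {"text": "Jane Doe from Acme Corp"})
print(result, calls["count"])
```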

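Try changing the input text so one field is missing, such as the budget or the email. Because every field in CustomerLead is required, the model may either fail validation or fill the gap with a guessed value, so inspect what comes back. Then decide whether to make fields optional in your schema or keep them required and reject incomplete records.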

Next Steps

  • Learn how to use optional fields and nested Pydantic models for more complex extraction jobs.
  • Add retry logic with LangChain output parsers for cases where the model returns invalid data.
  • Combine structured output with tools so your agent can extract data and then act on it directly.

By Cyprian Aarons, AI Consultant at Topiax.
