CrewAI Tutorial (Python): parsing structured output for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to make a CrewAI agent return structured data you can reliably parse in Python, instead of scraping text out of free-form replies. You need this when the output feeds another system — a database, workflow engine, validator, or API contract — and “looks right” is not good enough.

What You'll Need

  • Python 3.10+
  • crewai
  • openai
  • An OpenAI API key set as OPENAI_API_KEY
  • Basic familiarity with CrewAI agents, tasks, and crews
  • A terminal and a virtual environment

Install the packages:

pip install crewai openai

Step-by-Step

  1. Start by defining the exact shape of the output you want. For advanced systems, don’t rely on prompt wording alone; define a schema with Pydantic so the agent’s response can be validated before your app uses it.
from pydantic import BaseModel, Field
from typing import List


class PolicySummary(BaseModel):
    policy_number: str = Field(..., description="Policy identifier")
    customer_name: str = Field(..., description="Insured customer's full name")
    risk_level: str = Field(..., description="Low, medium, or high")
    issues_found: List[str] = Field(default_factory=list)
  2. Create an agent whose role and backstory explicitly instruct it to produce structured output. The schema itself is attached to the task in the next step; the agent's instructions reinforce that only valid structured data is acceptable.
import os
from crewai import Agent

# Fail fast if the API key is missing rather than erroring mid-run.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")

analyst = Agent(
    role="Insurance policy analyst",
    goal="Extract structured policy review data",
    backstory=(
        "You review policy notes and return only valid structured data "
        "matching the requested schema."
    ),
    verbose=True,
    allow_delegation=False,
)
  3. Define a task that gives the agent realistic source text to analyze. The key detail is output_json=PolicySummary, which tells CrewAI to format and parse the response into your schema instead of leaving you with raw text. In production, the source text would come from an intake form, claims note, call transcript, or document extraction pipeline.
from crewai import Task

policy_text = """
Policy number: POL-88421
Customer: Amina Okafor
Review notes:
- Coverage appears active.
- Address mismatch between application and billing record.
- No recent claims.
"""

task = Task(
    description=(
        "Read the following policy notes and extract the structured summary.\n\n"
        f"{policy_text}"
    ),
    expected_output="A JSON object matching the PolicySummary schema.",
    agent=analyst,
    output_json=PolicySummary,
)
  4. Run the crew and parse the result as a typed object. With output_json in place, CrewAI returns data you can convert into your Pydantic model without brittle string parsing.
from crewai import Crew, Process

crew = Crew(
    agents=[analyst],
    tasks=[task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()

print(type(result))
print(result)
  5. Convert the returned payload into your model and use it safely downstream. This is where structured output pays off: validation happens before anything touches your business logic.
from pprint import pprint

summary = PolicySummary.model_validate(result.json_dict)

pprint(summary.model_dump())

if summary.risk_level.lower() == "high":
    print("Route to senior reviewer")
else:
    print("Auto-process allowed")
  6. If you want stricter control over downstream parsing, add explicit validation logic around required values. This is useful when integrating with underwriting rules or claims triage where malformed output should fail fast.
allowed_risks = {"low", "medium", "high"}

if summary.risk_level.lower() not in allowed_risks:
    raise ValueError(f"Invalid risk level: {summary.risk_level}")

if not summary.policy_number.startswith("POL-"):
    raise ValueError("Unexpected policy number format")

print("Validated structured output is ready for downstream use.")

Testing It

Run the script from your terminal after exporting OPENAI_API_KEY. If everything is wired correctly, CrewAI should print a parsed response that maps cleanly into PolicySummary, not a blob of prose.

Check three things:

  • The returned object includes all required fields
  • summary.model_dump() produces valid Python data
  • Invalid input causes either schema validation failure or business-rule rejection
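The first and third checks can be exercised offline, with no API call, by validating sample payloads directly against the schema. This is a minimal sketch that redefines PolicySummary from Step 1 so it runs standalone; the payload values are illustrative:

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError


# Same schema as Step 1, repeated here so the check runs standalone.
class PolicySummary(BaseModel):
    policy_number: str = Field(..., description="Policy identifier")
    customer_name: str = Field(..., description="Insured customer's full name")
    risk_level: str = Field(..., description="Low, medium, or high")
    issues_found: List[str] = Field(default_factory=list)


# A well-formed payload validates and round-trips through model_dump().
good = {
    "policy_number": "POL-88421",
    "customer_name": "Amina Okafor",
    "risk_level": "medium",
    "issues_found": ["Address mismatch"],
}
summary = PolicySummary.model_validate(good)
assert summary.model_dump() == good

# A payload missing required fields fails fast with ValidationError.
try:
    PolicySummary.model_validate({"policy_number": "POL-88421"})
except ValidationError as exc:
    print(f"Rejected malformed payload: {exc.error_count()} errors")
```

Running this kind of smoke test in CI catches schema drift before it reaches a live agent run.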

If parsing fails intermittently, tighten the task wording and keep the schema small. Most production issues come from asking for too many fields at once or leaving ambiguous definitions like “severity” without allowed values.
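One way to remove that ambiguity is to encode the allowed values in the schema itself, so Pydantic rejects anything outside the set before your code sees it. A sketch using typing.Literal and a field pattern (StrictPolicySummary is an illustrative name, not part of the tutorial code above):

```python
from typing import List, Literal
from pydantic import BaseModel, Field, ValidationError


# Hypothetical tightened variant of PolicySummary: risk_level is
# constrained to three literal values instead of free-form text.
class StrictPolicySummary(BaseModel):
    policy_number: str = Field(..., pattern=r"^POL-\d+$")
    customer_name: str
    risk_level: Literal["low", "medium", "high"]
    issues_found: List[str] = Field(default_factory=list)


ok = StrictPolicySummary(
    policy_number="POL-88421",
    customer_name="Amina Okafor",
    risk_level="high",
)
print(ok.risk_level)  # high

# "severe" is not in the allowed set, so validation fails instead of
# leaking an ambiguous value downstream.
try:
    StrictPolicySummary(
        policy_number="POL-88421",
        customer_name="Amina Okafor",
        risk_level="severe",
    )
except ValidationError:
    print("rejected ambiguous risk level")
```

With the constraint in the schema, the manual allowed_risks check in Step 6 becomes redundant for this field.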

Next Steps

  • Add nested schemas for line items like claims, beneficiaries, or coverage clauses
  • Use output_pydantic patterns for stricter typed handling across multi-agent workflows
  • Add retries and fallback parsing for malformed model output in production pipelines
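The first bullet can be sketched as follows, assuming a nested Pydantic model; ClaimItem and DetailedPolicySummary are hypothetical names for illustration, and the payload data is invented:

```python
from typing import List
from pydantic import BaseModel, Field


# Each claim becomes its own validated object, so line items get the
# same schema guarantees as the top-level summary.
class ClaimItem(BaseModel):
    claim_id: str
    amount: float = Field(..., ge=0, description="Claim amount in USD")
    status: str


class DetailedPolicySummary(BaseModel):
    policy_number: str
    customer_name: str
    risk_level: str
    claims: List[ClaimItem] = Field(default_factory=list)


payload = {
    "policy_number": "POL-88421",
    "customer_name": "Amina Okafor",
    "risk_level": "low",
    "claims": [
        {"claim_id": "CLM-001", "amount": 1250.0, "status": "closed"},
    ],
}
detailed = DetailedPolicySummary.model_validate(payload)
print(detailed.claims[0].amount)  # 1250.0
```

The nested model plugs into a task the same way as the flat one, and validation errors pinpoint the offending line item.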


By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

