AutoGen Tutorial (Python): parsing structured output for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to make an AutoGen agent return structured data you can reliably parse in Python. You need this when free-form text is too brittle for downstream code, and you want predictable fields like name, amount, or risk_level instead of scraping paragraphs.

What You'll Need

  • Python 3.10+
  • autogen-agentchat
  • autogen-ext
  • pydantic
  • An OpenAI API key set as OPENAI_API_KEY
  • Basic familiarity with AutoGen agents and model clients

Install the packages:

pip install "autogen-agentchat" "autogen-ext[openai]" pydantic

Step-by-Step

  1. Start by defining the shape of the output you want. For beginners, Pydantic is the cleanest way to validate and parse structured responses because it gives you typed fields and clear errors.
from pydantic import BaseModel, Field


class ClaimSummary(BaseModel):
    claim_id: str = Field(..., description="Unique claim identifier")
    customer_name: str = Field(..., description="Full name of the customer")
    amount_usd: float = Field(..., description="Claim amount in USD")
    risk_level: str = Field(..., description="Low, medium, or high risk")
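Before wiring the schema into an agent, you can sanity-check it against a plain dict. The sample values below are made up for illustration, and the schema is repeated so the snippet runs on its own:

```python
from pydantic import BaseModel, Field, ValidationError


class ClaimSummary(BaseModel):
    claim_id: str = Field(..., description="Unique claim identifier")
    customer_name: str = Field(..., description="Full name of the customer")
    amount_usd: float = Field(..., description="Claim amount in USD")
    risk_level: str = Field(..., description="Low, medium, or high risk")


# A well-formed dict validates cleanly into a typed object.
claim = ClaimSummary.model_validate({
    "claim_id": "CLM-0001",
    "customer_name": "Test User",
    "amount_usd": 100.0,
    "risk_level": "low",
})
print(claim.amount_usd)  # 100.0

# Missing fields fail fast with a clear error instead of silently passing.
try:
    ClaimSummary.model_validate({"claim_id": "CLM-0002"})
except ValidationError as e:
    print(f"{len(e.errors())} missing fields")  # 3 missing fields
```

This is the same validation path the agent's output will go through later, just without the model in the loop.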
  2. Next, create a model client and a simple assistant agent. This example uses OpenAI through AutoGen's chat-completion client, which is sufficient for a beginner's structured-output workflow.
import asyncio
import os

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

agent = AssistantAgent(
    name="claims_agent",
    model_client=client,
    system_message="You extract claim data and return only valid JSON that matches the requested schema.",
)
  3. Now ask the agent for structured output and parse it into your Pydantic model. The key idea is to tell the model exactly what schema you want, then validate the response before your app uses it.
async def main():
    prompt = """
Extract this insurance claim into JSON with exactly these keys:
claim_id, customer_name, amount_usd, risk_level.

Claim ID: CLM-10291
Customer: Sarah Johnson
Amount: 1840.50 USD
Risk: medium
"""

    result = await agent.run(task=prompt)

    raw_text = result.messages[-1].content
    print("RAW RESPONSE:")
    print(raw_text)

    parsed = ClaimSummary.model_validate_json(raw_text)
    print("\nPARSED OBJECT:")
    print(parsed)
  4. Wrap it in a runnable script so you can test it locally. If the model returns invalid JSON or misses a field, Pydantic will fail fast instead of letting bad data slip into your pipeline.
if __name__ == "__main__":
    asyncio.run(main())
  5. If you want more reliability, add a small retry loop that re-prompts on validation failure. In production systems, this is usually better than trusting one shot from the model.
from pydantic import ValidationError


async def extract_claim(prompt: str) -> ClaimSummary:
    result = await agent.run(task=prompt)
    raw_text = result.messages[-1].content

    try:
        return ClaimSummary.model_validate_json(raw_text)
    except ValidationError as e:
        # Catch only validation failures (invalid JSON or schema mismatch),
        # then re-prompt once with the error so the model can self-correct.
        repair_prompt = f"""
The previous response was invalid JSON or did not match the schema.
Return ONLY valid JSON for this input:

{prompt}

Error: {e}
"""
        retry_result = await agent.run(task=repair_prompt)
        retry_text = retry_result.messages[-1].content
        return ClaimSummary.model_validate_json(retry_text)

Testing It

Run the script with your API key set in the environment. If everything is wired correctly, you should see two outputs: the raw model response and the parsed ClaimSummary object.

Check that the raw response is valid JSON and that each field maps to the right type. If parsing fails, inspect whether the model added markdown fences like ```json or extra commentary around the payload.
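If fences do show up, a small helper can strip them before validation. This is not part of AutoGen; it is a sketch of one common workaround:

```python
def strip_fences(text: str) -> str:
    """Remove markdown code fences (e.g. ```json ... ```) around a payload."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.splitlines()
        # Drop the opening fence line (which may carry a language tag) and,
        # if present, the closing fence line.
        if lines and lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        cleaned = "\n".join(lines).strip()
    return cleaned


raw = '```json\n{"claim_id": "CLM-10291"}\n```'
print(strip_fences(raw))  # {"claim_id": "CLM-10291"}
```

Call strip_fences(raw_text) before ClaimSummary.model_validate_json and the fenced case parses like any other response.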

A good test is to change one field in the prompt, such as making Amount a non-numeric string or omitting Risk, and confirm that Pydantic catches it. (Note that Pydantic's default lax mode will coerce a numeric string like "1840.50" into a float, so use a value that cannot be coerced.) That tells you your validation layer is doing real work instead of just formatting strings.
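You can simulate those failures without spending an API call. The JSON strings here are hand-written stand-ins for a bad model response, and the schema is repeated so the snippet is self-contained:

```python
import json

from pydantic import BaseModel, ValidationError


class ClaimSummary(BaseModel):
    claim_id: str
    customer_name: str
    amount_usd: float
    risk_level: str


# An amount that cannot be coerced to float: rejected.
bad_amount = json.dumps({
    "claim_id": "CLM-10291",
    "customer_name": "Sarah Johnson",
    "amount_usd": "not a number",
    "risk_level": "medium",
})
try:
    ClaimSummary.model_validate_json(bad_amount)
except ValidationError:
    print("caught non-numeric amount")

# A payload with risk_level omitted entirely: rejected.
missing_risk = json.dumps({
    "claim_id": "CLM-10291",
    "customer_name": "Sarah Johnson",
    "amount_usd": 1840.50,
})
try:
    ClaimSummary.model_validate_json(missing_risk)
except ValidationError:
    print("caught missing field")
```

Both cases raise ValidationError, which is exactly what the retry loop in step 5 is there to handle.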

Next Steps

  • Learn AutoGen tool calling so agents can fetch data before structuring it.
  • Add stricter schemas with enums for fields like risk_level.
  • Store parsed objects directly in your database layer instead of passing raw text around.
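The stricter-schema idea from the list above can be sketched with typing.Literal; an Enum subclass works too, this is just one option:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictClaimSummary(BaseModel):
    claim_id: str
    customer_name: str
    amount_usd: float
    # Only these three values are accepted; anything else fails validation.
    risk_level: Literal["low", "medium", "high"]


ok = StrictClaimSummary(
    claim_id="CLM-10291",
    customer_name="Sarah Johnson",
    amount_usd=1840.50,
    risk_level="medium",
)

try:
    StrictClaimSummary(
        claim_id="CLM-10292",
        customer_name="Sarah Johnson",
        amount_usd=10.0,
        risk_level="urgent",  # not in the Literal: raises ValidationError
    )
except ValidationError:
    print("rejected unknown risk level")
```

With this in place, a model that invents a new risk category gets caught at the validation boundary instead of deep in your pipeline.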

By Cyprian Aarons, AI Consultant at Topiax.