LlamaIndex Tutorial (Python): parsing structured output for beginners

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to make LlamaIndex return structured Python objects instead of loose text. You need this when you want predictable outputs for downstream code, like extracting invoice fields, support ticket metadata, or insurance claim details.

What You'll Need

  • Python 3.10+
  • llama-index
  • An OpenAI API key
  • A terminal and a virtual environment
  • Basic familiarity with llama_index.core and Settings

Install the package:

pip install llama-index

Set your API key:

export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Start by defining the schema you want the LLM output to match. For beginners, Pydantic is the cleanest way to describe the shape of the data you expect back.

from pydantic import BaseModel, Field


class InvoiceData(BaseModel):
    vendor_name: str = Field(description="Name of the vendor")
    invoice_number: str = Field(description="Invoice identifier")
    total_amount: float = Field(description="Total amount on the invoice")
    due_date: str = Field(description="Due date in YYYY-MM-DD format")
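
To see what the schema gives you before any model call is involved, the sketch below feeds hand-written JSON into InvoiceData. It assumes Pydantic v2 (for `model_validate_json`), and the two JSON strings are made-up stand-ins, not real model output.

```python
from pydantic import BaseModel, Field, ValidationError


class InvoiceData(BaseModel):
    vendor_name: str = Field(description="Name of the vendor")
    invoice_number: str = Field(description="Invoice identifier")
    total_amount: float = Field(description="Total amount on the invoice")
    due_date: str = Field(description="Due date in YYYY-MM-DD format")


# Hand-written stand-in for what a model might return (assumption).
good_json = (
    '{"vendor_name": "Acme Supplies", "invoice_number": "INV-1042", '
    '"total_amount": 189.5, "due_date": "2024-09-30"}'
)
invoice = InvoiceData.model_validate_json(good_json)
print(type(invoice).__name__, invoice.total_amount)

# A payload with missing fields and a non-numeric amount is rejected.
bad_json = '{"vendor_name": "Acme Supplies", "total_amount": "unknown"}'
try:
    InvoiceData.model_validate_json(bad_json)
    rejected_fields = []
except ValidationError as exc:
    # Each error records the field it applies to in its "loc" tuple.
    rejected_fields = sorted(err["loc"][0] for err in exc.errors())
print(rejected_fields)
```

The point of the second payload is that a schema failure surfaces as an exception naming the offending fields, rather than as a half-filled object.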

  2. Next, create a small piece of source text and build an extractor around it. LlamaIndex will use your schema to parse the text into a typed object.

from llama_index.core import Settings
from llama_index.core.program import LLMTextCompletionProgram
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini")

invoice_text = """
Invoice from Acme Supplies
Invoice Number: INV-1042
Total Due: $189.50
Due Date: 2024-09-30
"""

program = LLMTextCompletionProgram.from_defaults(
    output_cls=InvoiceData,
    prompt_template_str=(
        "Extract the invoice fields from this text.\n"
        "Return only valid structured data.\n\n"
        "Text:\n{input}"
    ),
)

  3. Run the program and inspect the parsed result. You should get an InvoiceData instance, not a raw JSON string.

result = program(input=invoice_text)

print(type(result))
print(result)
print(result.vendor_name)
print(result.total_amount)

  4. If your documents are already loaded into a LlamaIndex index, you can pass the same schema to the query engine through output_cls. This pattern is useful when you want consistent extraction from retrieved context.

from llama_index.core import VectorStoreIndex, Document

docs = [Document(text=invoice_text)]
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini"),
    output_cls=InvoiceData,  # without this, the engine returns free text
    response_mode="compact",
)

response = query_engine.query(
    "Extract vendor name, invoice number, total amount, and due date as structured data."
)

# response.response holds the parsed InvoiceData instance
print(response.response)
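
  5. For production code, validate the parsed object before using it downstream. A schema match only guarantees the types, so check the values as well. That keeps bad model output from slipping into billing pipelines or database writes.

from datetime import date


def process_invoice(data: InvoiceData) -> None:
    # Reject values that parsed cleanly but make no business sense.
    if data.total_amount <= 0:
        raise ValueError(f"Unexpected total amount: {data.total_amount}")
    date.fromisoformat(data.due_date)  # raises ValueError if not a valid ISO date

    print(f"Vendor: {data.vendor_name}")
    print(f"Invoice #: {data.invoice_number}")
    print(f"Amount: {data.total_amount}")
    print(f"Due date: {data.due_date}")

parsed = program(input=invoice_text)
process_invoice(parsed)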

Testing It

Run the script and confirm that type(result) prints your Pydantic model class, not str or dict. The printed fields should match the values in your source text, with amounts converted to numeric types where possible.
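
One way to automate that comparison is a small helper that reports which fields differ from the values you expect. The sketch below is only an illustration: `field_mismatches` is a hypothetical helper, and a `SimpleNamespace` stands in for the object returned by `program(input=invoice_text)` so the snippet runs without an API key.

```python
import math
from types import SimpleNamespace


def field_mismatches(result, expected: dict) -> dict:
    """Return {field: (actual, expected)} for every field that differs."""
    mismatches = {}
    for name, want in expected.items():
        got = getattr(result, name, None)
        if isinstance(want, float) and isinstance(got, (int, float)):
            # Compare amounts with a small tolerance instead of exact equality.
            same = math.isclose(got, want, abs_tol=0.005)
        else:
            same = got == want
        if not same:
            mismatches[name] = (got, want)
    return mismatches


expected = {
    "vendor_name": "Acme Supplies",
    "invoice_number": "INV-1042",
    "total_amount": 189.50,
    "due_date": "2024-09-30",
}

# Stand-in for a parsed result; the invoice number has lost its prefix on
# purpose to show how a slip is reported.
fake_result = SimpleNamespace(
    vendor_name="Acme Supplies",
    invoice_number="1042",
    total_amount=189.5,
    due_date="2024-09-30",
)
print(field_mismatches(fake_result, expected))
```

An empty dictionary means every expected field matched; anything else tells you which field to look at first.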

If extraction fails or returns malformed values, tighten the field descriptions in your schema first. In practice, precise field descriptions often improve structured output more than a longer prompt does.
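
As a rough illustration of what "tighter" can mean, the sketch below rewrites the four descriptions to spell out format and edge cases. The class name `InvoiceDataStrict` and the wording of each description are assumptions for this example, and it relies on Pydantic v2's `model_json_schema`. Printing the schema shows that the descriptions travel with it, which is the information a schema-driven prompt can hand to the model.

```python
from pydantic import BaseModel, Field


class InvoiceDataStrict(BaseModel):
    vendor_name: str = Field(
        description="Name of the company that issued the invoice"
    )
    invoice_number: str = Field(
        description="Invoice identifier exactly as printed, including any prefix such as 'INV-'"
    )
    total_amount: float = Field(
        description="Total amount due as a plain number, without currency symbols"
    )
    due_date: str = Field(
        description="Payment due date in YYYY-MM-DD format; convert other date styles"
    )


# The descriptions are carried in the JSON schema that Pydantic generates.
schema = InvoiceDataStrict.model_json_schema()
for name, spec in schema["properties"].items():
    print(f"{name}: {spec['description']}")
```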

Also test with slightly messy input, like missing labels or extra whitespace. That tells you whether your extraction setup is robust enough for real documents instead of just clean examples.
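
A simple way to get messy input is to derive it from the clean example. The sketch below builds three degraded copies of the invoice text; `messy_variants` and the variant labels are hypothetical names chosen for this example, and the call to the extraction program is left as a comment so the snippet runs offline.

```python
import re


def messy_variants(text: str) -> dict:
    """Build a few degraded copies of `text`, keyed by a short label."""
    lines = text.strip().splitlines()
    return {
        # Pad every line and double the line spacing.
        "extra_whitespace": "\n\n".join(f"   {line}   " for line in lines),
        # Drop everything up to and including the first colon on each line.
        "missing_labels": "\n".join(
            re.sub(r"^[^:]*:\s*", "", line) for line in lines
        ),
        # Collapse the document onto one line.
        "single_line": " ".join(lines),
    }


invoice_text = """
Invoice from Acme Supplies
Invoice Number: INV-1042
Total Due: $189.50
Due Date: 2024-09-30
"""

variants = messy_variants(invoice_text)
for label, variant in variants.items():
    print(f"--- {label} ---")
    print(variant)
    # With the program from step 2 you would call: program(input=variant)
```

Each variant should still yield the same four field values; a variant that does not is a concrete case to fix before moving to real documents.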

Next Steps

  • Learn about StructuredLLM in LlamaIndex for stricter schema-bound generation.
  • Add retry logic and validation for partial or invalid parses.
  • Connect this pattern to document loaders for PDFs, emails, or OCR text.
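
For the retry item, a minimal sketch might look like the following. It is not a LlamaIndex feature: `call_with_retries` is a hypothetical helper that wraps any zero-argument callable, and the demo uses a fake extractor that fails twice so nothing here calls a model.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    extract: Callable[[], T],
    is_valid: Callable[[T], bool],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call `extract` until it returns a value that passes `is_valid`."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            value = extract()
            if is_valid(value):
                return value
            last_error = ValueError(f"attempt {attempt} returned an invalid result")
        except Exception as exc:  # broad on purpose here; narrow it in real code
            last_error = exc
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"extraction failed after {max_attempts} attempts") from last_error


# Demo: a fake extractor that raises twice, then returns a usable value.
attempts = {"count": 0}


def flaky_extract() -> dict:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ValueError("malformed output")
    return {"total_amount": 189.5}


result = call_with_retries(flaky_extract, lambda r: r["total_amount"] > 0)
print(result, attempts["count"])
```

With the program from this tutorial, the call would be along the lines of `call_with_retries(lambda: program(input=invoice_text), lambda r: r.total_amount > 0)`.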

By Cyprian Aarons, AI Consultant at Topiax.
