How to Fix 'output parsing error when scaling' in LlamaIndex (Python)
What the error means
An 'output parsing error when scaling' usually means LlamaIndex tried to parse an LLM response into a structured object and the model returned something that did not match the expected schema. In practice, this shows up when you use an agent, query engine, or structured output parser and the response contains extra text, malformed JSON, or the wrong shape entirely.
It often appears during load or “scaling” scenarios because concurrency makes flaky prompts and inconsistent model output show up more often. The underlying problem is still parsing, not scaling.
The Most Common Cause
The #1 cause is asking an LLM to return structured data, then letting it answer in free-form text.
In LlamaIndex, this usually happens with StructuredOutputParser, PydanticProgram, FunctionCallingProgram, or an agent tool that expects JSON-like output. If the prompt is even slightly loose, you get errors like:
- `ValueError: Could not parse output`
- `OutputParserException: Failed to parse response`
- `ValidationError` from Pydantic
- `output parsing error`
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Model returns prose | Model is forced to emit valid structured output |
| No schema enforcement | Explicit Pydantic model or function-calling program |
| Prompt says “return JSON” only | Prompt includes exact field constraints |
```python
# BROKEN: free-form text can break parsing
from llama_index.core.program import LLMTextCompletionProgram
from pydantic import BaseModel


class TicketSummary(BaseModel):
    severity: str
    category: str
    summary: str


prompt_template = """
Summarize this support ticket as JSON:

{ticket_text}
"""

# This often fails if the model adds markdown, explanations, or invalid JSON.
program = LLMTextCompletionProgram.from_defaults(
    output_cls=TicketSummary,
    prompt_template_str=prompt_template,
)

result = program(ticket_text="Customer cannot log in after password reset.")
```
```python
# FIXED: enforce structure with a stricter prompt and parser-friendly output
from llama_index.core.program import LLMTextCompletionProgram
from pydantic import BaseModel, Field


class TicketSummary(BaseModel):
    severity: str = Field(description="low, medium, or high")
    category: str = Field(description="billing, auth, bug, or other")
    summary: str = Field(description="one sentence summary")


# Double braces render as literal { } after template formatting,
# so only {ticket_text} is treated as a template variable.
prompt_template = """
Return ONLY valid JSON matching this schema:
{{
  "severity": "low|medium|high",
  "category": "billing|auth|bug|other",
  "summary": "one sentence"
}}

Ticket:
{ticket_text}
"""

program = LLMTextCompletionProgram.from_defaults(
    output_cls=TicketSummary,
    prompt_template_str=prompt_template,
)

result = program(ticket_text="Customer cannot log in after password reset.")
```
If you are using OpenAI-compatible models, function calling is even better than raw JSON prompting. It removes a lot of parser drift.
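If your model supports function calling, you can skip JSON prompting entirely and let the API enforce the schema. Here is a minimal sketch, assuming a recent LlamaIndex install with the `llama-index-llms-openai` package and an OpenAI API key in the environment; the model name is only an example:

```python
# Sketch only: structured extraction via the model's function-calling API.
from llama_index.core.program import FunctionCallingProgram
from llama_index.llms.openai import OpenAI
from pydantic import BaseModel, Field


class TicketSummary(BaseModel):
    severity: str = Field(description="low, medium, or high")
    category: str = Field(description="billing, auth, bug, or other")
    summary: str = Field(description="one sentence summary")


program = FunctionCallingProgram.from_defaults(
    output_cls=TicketSummary,
    prompt_template_str="Summarize this support ticket:\n{ticket_text}",
    llm=OpenAI(model="gpt-4o-mini"),  # any function-calling-capable model
)

result = program(ticket_text="Customer cannot log in after password reset.")
print(result.severity, result.category)
```

Because the schema travels through the tool-calling API rather than the prompt text, there is far less room for the model to wrap the answer in prose or markdown.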
Other Possible Causes
1) Your schema is too strict for the model output
A field typed as int will fail if the model returns "3 tickets" instead of 3.
```python
class RiskScore(BaseModel):
    score: int  # too strict if the model returns text like "3 tickets"
```
Fix it by either tightening the prompt or allowing conversion:
```python
class RiskScore(BaseModel):
    score: int

# Prompt should say: "Return an integer only."
```
2) The model adds markdown fences or commentary
This is common with GPT-style responses:
```json
{"severity":"high","category":"auth","summary":"Login failures after reset"}
```

That breaks parsers expecting raw JSON. Use a parser-friendly prompt and strip wrappers if needed:
```python
raw = response.text.strip()
raw = raw.removeprefix("```json").removesuffix("```").strip()
```
Better: avoid needing cleanup at all by using function calling.
3) Tool outputs are not deterministic enough
When an agent tool returns variable text, downstream parsing fails intermittently.
```python
def lookup_customer(customer_id: str) -> str:
    return f"Customer {customer_id}: active=True; risk=low"
```
If another component expects JSON:
```python
def lookup_customer(customer_id: str) -> dict:
    return {"customer_id": customer_id, "active": True, "risk": "low"}
```
Make tool outputs typed and stable.
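If the function is exposed to an agent, wrapping the typed version in a tool keeps that contract explicit end to end. A small sketch assuming the `FunctionTool` helper in `llama_index.core.tools`:

```python
from llama_index.core.tools import FunctionTool


def lookup_customer(customer_id: str) -> dict:
    """Return a structured customer record."""
    return {"customer_id": customer_id, "active": True, "risk": "low"}


# The tool schema is derived from the type hints and docstring, so downstream
# components receive a predictable dict instead of free-form text.
lookup_tool = FunctionTool.from_defaults(fn=lookup_customer)
```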
4) You upgraded LlamaIndex and broke parser assumptions
LlamaIndex has moved classes and defaults across versions. A chain that worked on one release can start failing after an upgrade because parsing behavior changed.
Common symptoms:
- `ImportError` for moved classes
- previously accepted free-form outputs now fail validation
- agent/tool APIs behave differently
Pin versions before debugging logic:
```
llama-index==0.10.68
pydantic==2.8.2
```
Then upgrade deliberately.
How to Debug It
- Print the raw LLM response
  - Do not inspect only the parsed object.
  - Log `response.text` or equivalent before parsing.
  - Look for markdown fences, extra prose, truncated JSON, or missing fields.
- Check the exact parser or program class
  - Identify whether you are using `LLMTextCompletionProgram`, `PydanticProgram`, `FunctionCallingProgram`, or an agent tool / query engine wrapper.
  - Different classes fail differently.
- Validate against your schema locally (see the sketch after this list)
  - Copy the raw response into a small script.
  - Run it through your Pydantic model directly.
  - If Pydantic fails first, your issue is schema mismatch, not LlamaIndex.
- Reduce concurrency
  - If it only happens under load, run one request at a time.
  - Scaling exposes nondeterministic completions.
  - If single-threaded works but parallel requests fail, your prompt is unstable or your tool output varies.
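For the local schema check, a few lines are enough. The sketch below redefines the `TicketSummary` model from the fixed example so the script stands alone, and assumes Pydantic v2; paste in the raw response exactly as you logged it:

```python
from pydantic import BaseModel, Field, ValidationError


class TicketSummary(BaseModel):
    severity: str = Field(description="low, medium, or high")
    category: str = Field(description="billing, auth, bug, or other")
    summary: str = Field(description="one sentence summary")


# Paste the raw LLM output here exactly as logged, wrappers and all.
raw = '```json\n{"severity": "high", "category": "auth", "summary": "Login failures after reset"}\n```'

cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()

try:
    print(TicketSummary.model_validate_json(cleaned))
except ValidationError as exc:
    # Pydantic failing here means the schema/prompt contract is wrong, not LlamaIndex.
    print(exc)
```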
Prevention
- Use structured outputs with explicit schemas.
  - Prefer function calling or strongly typed Pydantic models over "please return JSON" prompts.
- Keep prompts strict and short.
  - Tell the model exactly which fields to return and what values are allowed.
- Pin versions of `llama-index`, `pydantic`, and your LLM SDK.
  - Parser behavior changes across releases more often than people expect.
If you want one rule to remember: when LlamaIndex says output parsing error, assume the model returned something humans can read but parsers cannot trust. Fix the contract between prompt, schema, and tool output first.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.