How to Fix 'output parsing error when scaling' in LlamaIndex (Python)
What the error means
An 'output parsing error when scaling' usually means LlamaIndex tried to parse an LLM response into a structured object and the model returned something that did not match the expected schema. In practice, this shows up when you use an agent, query engine, or structured output parser and the response contains extra text, malformed JSON, or the wrong shape entirely.
It often appears during load or “scaling” scenarios because concurrency makes flaky prompts and inconsistent model output show up more often. The underlying problem is still parsing, not scaling.
The Most Common Cause
The #1 cause is asking an LLM to return structured data, then letting it answer in free-form text.
In LlamaIndex, this usually happens with StructuredOutputParser, PydanticProgram, FunctionCallingProgram, or an agent tool that expects JSON-like output. If the prompt is even slightly loose, you get errors like:
- `ValueError: Could not parse output`
- `OutputParserException: Failed to parse response`
- `ValidationError` from Pydantic
- `output parsing error`
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Model returns prose | Model is forced to emit valid structured output |
| No schema enforcement | Explicit Pydantic model or function-calling program |
| Prompt says “return JSON” only | Prompt includes exact field constraints |
```python
# BROKEN: free-form text can break parsing
from llama_index.core.program import LLMTextCompletionProgram
from pydantic import BaseModel


class TicketSummary(BaseModel):
    severity: str
    category: str
    summary: str


prompt_template = """
Summarize this support ticket as JSON:

{ticket_text}
"""

# This often fails if the model adds markdown, explanations, or invalid JSON.
program = LLMTextCompletionProgram.from_defaults(
    output_cls=TicketSummary,
    prompt_template_str=prompt_template,
)

result = program(ticket_text="Customer cannot log in after password reset.")
```
```python
# FIXED: enforce structure with a stricter prompt and parser-friendly output
from llama_index.core.program import LLMTextCompletionProgram
from pydantic import BaseModel, Field


class TicketSummary(BaseModel):
    severity: str = Field(description="low, medium, or high")
    category: str = Field(description="billing, auth, bug, or other")
    summary: str = Field(description="one sentence summary")


# Double braces render as literal { } after template formatting,
# so only {ticket_text} is treated as a template variable.
prompt_template = """
Return ONLY valid JSON matching this schema:
{{
  "severity": "low|medium|high",
  "category": "billing|auth|bug|other",
  "summary": "one sentence"
}}

Ticket:
{ticket_text}
"""

program = LLMTextCompletionProgram.from_defaults(
    output_cls=TicketSummary,
    prompt_template_str=prompt_template,
)

result = program(ticket_text="Customer cannot log in after password reset.")
```
If you are using OpenAI-compatible models, function calling is even better than raw JSON prompting. It removes a lot of parser drift.
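If your model supports function calling, you can skip JSON prompting entirely and let the API enforce the schema. Here is a minimal sketch, assuming a recent LlamaIndex install with the `llama-index-llms-openai` package and an OpenAI API key in the environment; the model name is only an example:

```python
# Sketch only: structured extraction via the model's function-calling API.
from llama_index.core.program import FunctionCallingProgram
from llama_index.llms.openai import OpenAI
from pydantic import BaseModel, Field


class TicketSummary(BaseModel):
    severity: str = Field(description="low, medium, or high")
    category: str = Field(description="billing, auth, bug, or other")
    summary: str = Field(description="one sentence summary")


program = FunctionCallingProgram.from_defaults(
    output_cls=TicketSummary,
    prompt_template_str="Summarize this support ticket:\n{ticket_text}",
    llm=OpenAI(model="gpt-4o-mini"),  # any function-calling-capable model
)

result = program(ticket_text="Customer cannot log in after password reset.")
print(result.severity, result.category)
```

Because the schema travels through the tool-calling API rather than the prompt text, there is far less room for the model to wrap the answer in prose or markdown.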
Other Possible Causes
1) Your schema is too strict for the model output
A field typed as int will fail if the model returns "3 tickets" instead of 3.
```python
class RiskScore(BaseModel):
    score: int  # too strict if the model returns text like "3 tickets"
```
Fix it by either tightening the prompt or allowing conversion:
```python
class RiskScore(BaseModel):
    score: int

# Prompt should say: "Return an integer only."
```
2) The model adds markdown fences or commentary
This is common with GPT-style responses:
```json
{"severity":"high","category":"auth","summary":"Login failures after reset"}
```

That breaks parsers expecting raw JSON. Use a parser-friendly prompt and strip wrappers if needed:
```python
raw = response.text.strip()
raw = raw.removeprefix("```json").removesuffix("```").strip()
```
Better: avoid needing cleanup at all by using function calling.
3) Tool outputs are not deterministic enough
When an agent tool returns variable text, downstream parsing fails intermittently.
```python
def lookup_customer(customer_id: str) -> str:
    return f"Customer {customer_id}: active=True; risk=low"
```
If another component expects JSON:
```python
def lookup_customer(customer_id: str) -> dict:
    return {"customer_id": customer_id, "active": True, "risk": "low"}
```
Make tool outputs typed and stable.
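If the function is exposed to an agent, wrapping the typed version in a tool keeps that contract explicit end to end. A small sketch assuming the `FunctionTool` helper in `llama_index.core.tools`:

```python
from llama_index.core.tools import FunctionTool


def lookup_customer(customer_id: str) -> dict:
    """Return a structured customer record."""
    return {"customer_id": customer_id, "active": True, "risk": "low"}


# The tool schema is derived from the type hints and docstring, so downstream
# components receive a predictable dict instead of free-form text.
lookup_tool = FunctionTool.from_defaults(fn=lookup_customer)
```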
4) You upgraded LlamaIndex and broke parser assumptions
LlamaIndex has moved classes and defaults across versions. A chain that worked on one release can start failing after an upgrade because parsing behavior changed.
Common symptoms:
- `ImportError` for moved classes
- previously accepted free-form outputs now fail validation
- agent/tool APIs behave differently
Pin versions before debugging logic:
```
llama-index==0.10.68
pydantic==2.8.2
```
Then upgrade deliberately.
How to Debug It
- Print the raw LLM response
  - Do not inspect only the parsed object.
  - Log `response.text` or equivalent before parsing.
  - Look for markdown fences, extra prose, truncated JSON, or missing fields.
- Check the exact parser or program class
  - Identify whether you are using `LLMTextCompletionProgram`, `PydanticProgram`, `FunctionCallingProgram`, or an agent tool / query engine wrapper.
  - Different classes fail differently.
- Validate against your schema locally (see the sketch after this list)
  - Copy the raw response into a small script.
  - Run it through your Pydantic model directly.
  - If Pydantic fails first, your issue is schema mismatch, not LlamaIndex.
- Reduce concurrency
  - If it only happens under load, run one request at a time.
  - Scaling exposes nondeterministic completions.
  - If single-threaded works but parallel requests fail, your prompt is unstable or your tool output varies.
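For the local schema check, a few lines are enough. The sketch below redefines the `TicketSummary` model from the fixed example so the script stands alone, and assumes Pydantic v2; paste in the raw response exactly as you logged it:

```python
from pydantic import BaseModel, Field, ValidationError


class TicketSummary(BaseModel):
    severity: str = Field(description="low, medium, or high")
    category: str = Field(description="billing, auth, bug, or other")
    summary: str = Field(description="one sentence summary")


# Paste the raw LLM output here exactly as logged, wrappers and all.
raw = '```json\n{"severity": "high", "category": "auth", "summary": "Login failures after reset"}\n```'

cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()

try:
    print(TicketSummary.model_validate_json(cleaned))
except ValidationError as exc:
    # Pydantic failing here means the schema/prompt contract is wrong, not LlamaIndex.
    print(exc)
```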
Prevention
- Use structured outputs with explicit schemas.
  - Prefer function calling or strongly typed Pydantic models over "please return JSON" prompts.
- Keep prompts strict and short.
  - Tell the model exactly which fields to return and what values are allowed.
- Pin versions of `llama-index`, `pydantic`, and your LLM SDK.
  - Parser behavior changes across releases more often than people expect.
If you want one rule to remember: when LlamaIndex says output parsing error, assume the model returned something humans can read but parsers cannot trust. Fix the contract between prompt, schema, and tool output first.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.