How to Fix 'JSON parsing error when scaling' in LlamaIndex (Python)

By Cyprian Aarons. Updated 2026-04-21.

When LlamaIndex throws a JSON parsing error when scaling, it usually means the framework tried to parse structured output from an LLM, tool, or node payload and got back malformed JSON. In practice, this shows up during response synthesis, structured extraction, agent tool calls, or when you scale from a single test prompt to batch processing or larger documents.

The key point: this is rarely a “JSON is broken” problem. It’s usually a prompt, model-output, or schema mismatch problem that only becomes visible once the workload gets bigger.

The Most Common Cause

The #1 cause is asking an LLM to return JSON but not constraining the output tightly enough. LlamaIndex then tries to parse something that looks like JSON but contains markdown fences, extra commentary, trailing commas, or truncated content.

Typical failure pattern:

  • ValueError: Could not parse output as JSON
  • json.decoder.JSONDecodeError: Expecting value
  • ResponseValidationError when using structured outputs

Broken vs fixed pattern

Broken code:

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
Settings.llm = llm

prompt = """
Return JSON for this customer:
Name: Alice
Balance: 1200
"""
resp = llm.complete(prompt)
data = resp.text  # later parsed with json.loads(...)
```

Fixed code:

```python
import json

from pydantic import BaseModel

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI


class Customer(BaseModel):
    name: str
    balance: int


llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.llm = llm

prompt = """
Return ONLY valid JSON matching this schema:
{"name": string, "balance": number}

Customer:
Name: Alice
Balance: 1200
"""

resp = llm.complete(prompt)
data = json.loads(resp.text)
customer = Customer.model_validate(data)
```


If you’re using LlamaIndex’s structured prediction APIs, prefer schema-first output instead of free-form text. For example, with LLMTextCompletionProgram:

```python
from pydantic import BaseModel
from llama_index.core.program import LLMTextCompletionProgram

class Customer(BaseModel):
    name: str
    balance: int

program = LLMTextCompletionProgram.from_defaults(
    output_cls=Customer,
    prompt_template_str="Extract customer data from: {text}",
)

result = program(text="Name: Alice. Balance: 1200.")
```

That removes most parsing drift because the model is guided toward a typed object instead of raw text.
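
As a quick follow-up (assuming the program example above), the result is already a validated Customer instance, so downstream code never calls json.loads:

```python
print(result.name, result.balance)  # Alice 1200, typed attribute access on the validated object
```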

Other Possible Causes

1) The model is returning markdown fences or extra prose

This is common when the prompt says “return JSON” but doesn’t forbid explanation.

```python
# Problematic output:
# ```json
# {"name": "Alice", "balance": 1200}
# ```
```

Fix by forcing strict output:

prompt = """
Return ONLY raw JSON.
No markdown.
No explanation.
No code fences.
"""

If you still see fenced output, strip it before parsing:

```python
text = resp.text.strip()
text = text.removeprefix("```json").removesuffix("```").strip()
```
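
If fences or stray prose keep slipping through, a more defensive sketch (plain Python, not a LlamaIndex API) is to slice out the outermost braces before parsing:

```python
import json

def extract_json(text: str) -> dict:
    """Best-effort parse of the substring between the first '{' and the last '}'."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError(f"No JSON object found in output: {text!r}")
    return json.loads(text[start:end + 1])
```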

2) Temperature is too high for structured extraction

Higher temperature increases formatting variance. That’s fine for creative writing and bad for parsers.

```python
llm = OpenAI(model="gpt-4o-mini", temperature=0)
```

If you’re doing extraction, classification, or routing, keep temperature at 0 or very close to it.

3) Context window truncation during scaling

When you scale from one document to many chunks, the model may truncate its response or lose part of the schema. That often produces errors like:

  • json.decoder.JSONDecodeError: Unterminated string starting at
  • Expecting ',' delimiter
  • Partial object output missing closing braces

Mitigation:

```python
from llama_index.core import PromptTemplate

template = PromptTemplate(
    "Extract fields from the text below.\n"
    "Return compact JSON only.\n\n"
    "{text}"
)
```

Also reduce chunk size and avoid stuffing too much source text into one call.
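
As a rough sketch (the right chunk size depends on your model and documents), LlamaIndex's SentenceSplitter lets you cap how much source text goes into each call:

```python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

# Smaller chunks keep each extraction call well inside the context window,
# so the model is less likely to truncate its JSON response.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
documents = [Document(text="...long source text...")]
nodes = splitter.get_nodes_from_documents(documents)
```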

4) Tool calling / function schema mismatch

If you’re using agents and tools, the tool signature must match what the model emits. A mismatch between expected arguments and actual keys can surface as parsing failures.

```python
def create_claim(claim_id: str, amount: float):
    return {"claim_id": claim_id, "amount": amount}
```

Bad tool descriptions often confuse the model into emitting wrong keys:

```python
from llama_index.core.tools import FunctionTool

# Bad: vague description leads to wrong args in tool call payloads
bad_tool = FunctionTool.from_defaults(fn=create_claim, description="Create a claim somehow")
```

Use explicit parameter names and descriptions so the tool-call payload matches exactly.
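
A minimal sketch of a more explicit definition (the name and description strings here are illustrative, not required values):

```python
from llama_index.core.tools import FunctionTool

def create_claim(claim_id: str, amount: float) -> dict:
    """Create a claim for a given claim_id (reference string) and amount (value in USD)."""
    return {"claim_id": claim_id, "amount": amount}

# Explicit name and argument descriptions keep the model's tool-call payload
# aligned with the function signature.
claim_tool = FunctionTool.from_defaults(
    fn=create_claim,
    name="create_claim",
    description="Create a claim. Arguments: claim_id (str), amount (float, USD).",
)
```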

How to Debug It

  1. Print the raw model output before parsing

    • Don’t inspect only the final exception.
    • Log resp.text or the agent/tool payload exactly as returned.
    print(repr(resp.text))
    
  2. Validate against a schema outside LlamaIndex

    • Use json.loads() first.
    • Then validate with Pydantic.
    • This tells you whether the issue is malformed JSON or a schema mismatch (see the sketch after this list).
    import json
    data = json.loads(resp.text)
    
  3. Check whether the failure happens only on larger inputs

    • If small prompts work and large batches fail, suspect truncation.
    • Reduce chunk size and inspect one failing chunk directly.
  4. Turn off variability

    • Set temperature=0.
    • Remove chat history.
    • Disable retries temporarily so you can see the first bad payload instead of a masked failure.
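
A minimal debugging sketch combining these steps (it reuses the Customer model from above; max_retries=0 is an assumption about the OpenAI LLM wrapper and can be dropped if your LLM class does not expose it):

```python
import json

from pydantic import ValidationError
from llama_index.llms.openai import OpenAI

# temperature=0 and max_retries=0 surface the first bad payload instead of
# letting retries hide it.
llm = OpenAI(model="gpt-4o-mini", temperature=0, max_retries=0)

resp = llm.complete("Return ONLY valid JSON: ...")  # your extraction prompt
print(repr(resp.text))  # inspect the raw payload before parsing anything

try:
    data = json.loads(resp.text)               # is the JSON well formed at all?
    customer = Customer.model_validate(data)   # does it match the expected schema?
except json.JSONDecodeError as exc:
    print("Malformed JSON:", exc)
except ValidationError as exc:
    print("Valid JSON, wrong shape:", exc)
```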

Prevention

  • Use schema-first generation for anything that must be parsed later.
    • Prefer PydanticProgram, structured prediction, or explicit response schemas over free-form completion text.
  • Keep extraction prompts strict.
    • Say “ONLY valid JSON” and forbid markdown fences, explanations, and trailing text.
  • Log raw outputs in staging.
    • Store failed completions so you can inspect the exact malformed payloads when scaling breaks, as in the sketch below.
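
A minimal logging sketch (the file name and helper are placeholders, not a LlamaIndex API):

```python
import json
import logging

logging.basicConfig(filename="failed_completions.log", level=logging.WARNING)
logger = logging.getLogger("extraction")

def parse_or_log(raw_text: str):
    """Parse a completion, logging the exact raw payload whenever parsing fails."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        logger.warning("Unparseable completion: %r", raw_text)
        return None
```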

If this error appears only after moving from a single request to batch jobs or multi-document ingestion, assume it’s an output-shape problem first. In LlamaIndex workflows, that’s usually where JSON parsing failures start.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

