# How to Fix 'JSON parsing error in production' in LlamaIndex (Python)
## What this error usually means
A "JSON parsing error in production" in LlamaIndex almost always means one of your components expected structured JSON but the model returned something else: extra prose, malformed JSON, truncated output, or a tool-call payload that didn't match the schema.
You’ll typically see it when using StructuredOutputParser, PydanticProgramExtractor, function calling, or any agent workflow that depends on strict JSON from the LLM.
## The Most Common Cause
The #1 cause is asking the model for JSON without forcing a strict structured-output path. In practice, people prompt for "return JSON" and then parse the raw text with `json.loads()`, or they use a plain `llm.predict()` call where the model adds markdown fences, explanations, or trailing commas.
Here’s the broken pattern and the fixed pattern:

**Broken:**

```python
from llama_index.llms.openai import OpenAI
import json

llm = OpenAI(model="gpt-4o-mini")

prompt = """Extract customer details as JSON: name, policy_number, claim_amount"""
response = llm.complete(prompt)
data = json.loads(response.text)  # JSONDecodeError in production
```

**Fixed:**

```python
from pydantic import BaseModel
from llama_index.core.program import LLMTextCompletionProgram
from llama_index.llms.openai import OpenAI

class CustomerClaim(BaseModel):
    name: str
    policy_number: str
    claim_amount: float

llm = OpenAI(model="gpt-4o-mini")
program = LLMTextCompletionProgram.from_defaults(
    output_cls=CustomerClaim,
    prompt_template_str=(
        "Extract customer details from this text:\n"
        "{input}\n"
    ),
    llm=llm,
)

result = program(input="Jane Doe, policy number P-1234, claim amount 1200.50")
print(result.model_dump())
```
The broken version fails because `response.text` is not guaranteed to be pure JSON. In production you’ll often see errors like:
- `json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)`
- `ValueError: Invalid JSON format`
- `ValidationError` from Pydantic after partial parsing
If you need structured output, use LlamaIndex’s schema-backed APIs instead of hand-parsing raw text.
## Other Possible Causes
### 1) The model returned markdown fences or extra text
This is common when your prompt says “return only JSON,” but the model still wraps it in code fences.
```python
raw = '```json\n{"name": "Jane Doe", "policy_number": "P-1234", "claim_amount": 1200.5}\n```'
```
Fix by stripping fences before parsing, or better, use structured output classes so you never parse raw text manually.
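If you do have to parse raw text, one defensive approach is to strip any fences before calling `json.loads()`. A minimal sketch (the helper name and regex are my own, not a LlamaIndex API):

```python
import json
import re

def strip_code_fences(text: str) -> str:
    """Pull the payload out of a fenced ```json ... ``` block, if one is present."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    return match.group(1) if match else text.strip()

raw = '```json\n{"name": "Jane Doe", "claim_amount": 1200.5}\n```'
data = json.loads(strip_code_fences(raw))
```

Unfenced output passes through unchanged, so the helper is safe to apply to every response.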
### 2) Truncated responses from token limits
If the response gets cut off mid-object, parsing fails immediately.
```python
from llama_index.core import Settings

Settings.llm.max_tokens = 64  # too low for your schema-rich output
```
Typical symptoms:

- `JSONDecodeError: Unterminated string starting at ...`
- Output ends halfway through an object
Increase token budget or reduce schema size. If you’re using agents with multi-step tool calls, watch cumulative context growth.
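One way to tell truncation apart from other malformed output is to check whether the text ever reaches a closing brace. A rough sketch (the helper and its heuristic are illustrative, not part of LlamaIndex):

```python
import json

def parse_llm_json(text: str) -> dict:
    """Parse model output, flagging likely truncation separately."""
    stripped = text.strip()
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        # A response cut off by max_tokens usually never reaches the
        # closing brace, so the text ends mid-object.
        if not stripped.endswith(("}", "]")):
            raise ValueError("output looks truncated; increase max_tokens")
        raise

# A response cut off mid-string, as happens when max_tokens is too low:
truncated = '{"name": "Jane Doe", "policy_number": "P-12'
```

Logging the two cases separately makes it obvious whether to fix the prompt or the token budget.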
### 3) Tool/function schema mismatch
When using tool calling with FunctionTool, OpenAIAgent, or similar classes, the function signature must match what the runtime expects.
```python
from llama_index.core.tools import FunctionTool

def create_claim(name: str, policy_number: str):
    return {"ok": True}

tool = FunctionTool.from_defaults(fn=create_claim)
```
If your function returns non-serializable objects like dataclasses, custom classes without `.dict()`/`.model_dump()`, or bytes, downstream JSON serialization can fail.
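One pattern that avoids this is converting to plain JSON-safe types before returning from the tool function. A sketch using a hypothetical `Claim` dataclass:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Claim:
    name: str
    policy_number: str

def create_claim(name: str, policy_number: str) -> dict:
    # Return a plain dict, not the dataclass instance itself,
    # so downstream JSON serialization cannot fail on the type.
    return asdict(Claim(name=name, policy_number=policy_number))

payload = create_claim("Jane Doe", "P-1234")
serialized = json.dumps(payload)  # succeeds: only dict/str values
```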
### 4) Provider-specific response formatting issues
Some providers are stricter than others. A config that works with one model may fail with another because of different tool-call formatting or unsupported structured-output behavior.
```python
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-latest")
```
If you switch providers and start seeing parsing failures, check whether that provider supports the exact structured-output mode you’re using in LlamaIndex.
## How to Debug It
- **Print the raw model output before parsing**
  - Log `response.text` or the agent/tool payload.
  - Look for markdown fences, commentary, missing braces, or truncation.
- **Check which LlamaIndex class is failing**
  - If it’s `LLMTextCompletionProgram`, inspect the prompt and schema.
  - If it’s `PydanticProgramExtractor`, verify the source text contains enough signal.
  - If it’s an agent path like `OpenAIAgent`, inspect tool call arguments and returned values.
- **Validate against a minimal schema**
  - Replace your full Pydantic model with two fields.
  - If that works, your original schema is too large or ambiguous.
- **Run the same input outside production**
  - Replay one failing request locally with identical prompt and model settings.
  - Compare temperature, max tokens, provider version, and system prompt.
A practical debug loop looks like this:
```python
print("RAW OUTPUT:")
print(response.text)

try:
    parsed = MySchema.model_validate_json(response.text)
except Exception as e:
    print(type(e).__name__, str(e))
```
That tells you whether this is a prompt issue, a truncation issue, or a serialization issue.
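The same loop can be folded into a small classifier that names the failure mode for your logs. A rough stdlib-only sketch (the labels and heuristics are my own, not a LlamaIndex API):

```python
import json

def classify_parse_failure(text: str) -> str:
    """Return a rough label for why model output failed to parse."""
    stripped = text.strip()
    if stripped.startswith("```"):
        return "prompt issue: output wrapped in markdown fences"
    try:
        json.loads(stripped)
        return "ok: valid JSON, check schema validation instead"
    except json.JSONDecodeError:
        if not stripped.endswith(("}", "]")):
            return "truncation: output ends mid-object"
        return "malformed JSON: check prompt and temperature"
```

Attaching this label to each failed parse makes production logs far easier to triage than a bare `JSONDecodeError`.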
## Prevention
- Use schema-backed generation paths like `LLMTextCompletionProgram` or Pydantic-based extractors instead of manual `json.loads()` on free-form completions.
- Keep prompts explicit: say which fields to return, forbid extra prose, and set temperature low for extraction tasks.
- Add a contract test for every production parser:
  - feed it real sample inputs
  - assert valid JSON/schema compliance
  - fail CI if the model output drifts
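A minimal contract test might look like this (pytest-style; the `CustomerClaim` model and the sample strings are illustrative stand-ins for your real schema and logged outputs):

```python
from pydantic import BaseModel, ValidationError

class CustomerClaim(BaseModel):
    name: str
    policy_number: str
    claim_amount: float

# Sample outputs captured from production logs (illustrative values)
SAMPLES = [
    '{"name": "Jane Doe", "policy_number": "P-1234", "claim_amount": 1200.5}',
    '{"name": "John Roe", "policy_number": "P-5678", "claim_amount": 80}',
]

def test_samples_match_schema():
    for raw in SAMPLES:
        try:
            CustomerClaim.model_validate_json(raw)
        except ValidationError as exc:
            raise AssertionError(f"schema drift in sample: {exc}") from exc
```

Re-running this in CI against freshly captured samples catches drift before it reaches production parsers.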
If you’re building bank or insurance workflows, treat LLM output like an untrusted integration boundary. Parse defensively, validate strictly, and log raw outputs whenever a parse step fails.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.