How to Fix 'JSON parsing error when scaling' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see “JSON parsing error when scaling” in LlamaIndex TypeScript, it usually means an LLM output that was supposed to be structured JSON came back malformed during a batch or map/reduce-style operation. In practice, this shows up when ResponseSynthesizer, structured output parsers, or tool/agent workflows scale across multiple chunks and one bad completion breaks the whole pipeline.

The root issue is almost always the same: you asked for JSON, but the model returned something that is not valid JSON. That can be a trailing comma, markdown fences, extra commentary, truncated output, or a schema mismatch.

The Most Common Cause

The #1 cause is using an LLM prompt that encourages free-form text while a downstream LlamaIndex component expects strict JSON. This happens a lot with StructuredOutputParser, PydanticOutputParser, QueryEngine, or agent steps that call JSON.parse() internally.

Here’s the broken pattern versus the fixed pattern.

  • Broken: prompt says “return JSON” but doesn’t enforce it → Fixed: use explicit schema instructions and parse guards
  • Broken: LLM returns fenced markdown or prose → Fixed: force raw JSON only
  • Broken: no validation before parsing → Fixed: validate and retry on parse failure
// BROKEN: output is expected to be JSON, but the prompt is loose
import { OpenAI } from "llamaindex";

const llm = new OpenAI({ model: "gpt-4o-mini" });

const prompt = `
Summarize the policy into JSON with keys: riskLevel, summary.
`;

const response = await llm.complete(prompt);

// This often fails with:
// SyntaxError: Unexpected token '`', "```json ..." is not valid JSON
const data = JSON.parse(response.text);
console.log(data);

// FIXED: force strict JSON and validate before parsing
import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = `
Return ONLY valid JSON.
No markdown fences.
No extra text.

Schema:
{
  "riskLevel": "low" | "medium" | "high",
  "summary": string
}
`;

const response = await llm.complete(prompt);

const raw = response.text.trim();
if (!raw.startsWith("{")) {
  throw new Error(`Expected raw JSON, got: ${raw.slice(0, 80)}`);
}

const data = JSON.parse(raw);
console.log(data);

If you are using a parser class like PydanticOutputParser or StructuredOutputParser, make sure the format instructions are actually injected into the final prompt. A lot of teams build the parser but forget to pass its instructions into the query template.
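One way to guard against that mistake is to route every prompt through a small helper, so the format instructions can never be dropped. A minimal sketch; `withFormatInstructions` is a hypothetical name, not a LlamaIndex API:

```typescript
// Hypothetical helper (not a LlamaIndex API): inject the format instructions
// and strict-JSON rules into every final prompt so they cannot be forgotten.
function withFormatInstructions(question: string, formatInstructions: string): string {
  return [
    question.trim(),
    "",
    "Return ONLY valid JSON. No markdown fences. No extra text.",
    formatInstructions.trim(),
  ].join("\n");
}

// Usage: the schema text below is illustrative, not generated by a parser class.
const prompt = withFormatInstructions(
  "Summarize the policy.",
  'Schema: { "riskLevel": "low" | "medium" | "high", "summary": string }'
);
```

If every call site builds its prompt through this one function, a forgotten schema becomes a compile-time problem (a missing argument) instead of a runtime parse failure.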

Other Possible Causes

1. Chunking creates partial or truncated JSON

This shows up during scaling because one chunk response gets cut off before closing braces. You’ll often see:

  • SyntaxError: Unexpected end of JSON input
  • JSON parsing error
  • Failed to parse model output
// Example: too much content requested in one shot
const response = await llm.complete(`
Extract all entities and relationships from this long document as JSON.
`);

// Truncated output can break parse here
JSON.parse(response.text);

Fix it by reducing per-call scope or forcing smaller outputs.

// Better: smaller extraction scope
const response = await llm.complete(`
Extract only the top 5 entities as strict JSON.
Return no more than 200 tokens.
`);
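Before retrying with a smaller scope, it helps to detect truncation cheaply. The heuristic below flags output whose braces, brackets, or strings never close, which is the usual signature of a cut-off completion. The function name is hypothetical, not a library API:

```typescript
// Heuristic sketch: JSON whose braces/brackets or strings never close is
// likely a truncated completion rather than a prompt problem.
function looksTruncated(raw: string): boolean {
  let depth = 0;
  let inString = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") depth--;
  }
  return depth > 0 || inString;
}
```

When this returns true, retrying with a narrower extraction scope is more productive than retrying the same oversized request.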

2. Markdown fences around JSON

Models often wrap outputs in ```json fences even when you ask for plain JSON. That breaks direct parsing unless you strip them first.

// Strip an opening ```json (or bare ```) fence and the closing fence, then trim
const raw = response.text
  .trim()
  .replace(/^```(?:json)?\s*/i, "")
  .replace(/\s*```$/, "")
  .trim();
const data = JSON.parse(raw);

If your pipeline uses ResponseSynthesizer, inspect whether your custom prompt is encouraging fenced code blocks.
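A more defensive option than regex stripping is to scan for the first balanced `{...}` block, which also survives commentary before or after the JSON. A dependency-free sketch; `extractJsonObject` is my name for it, not a LlamaIndex helper:

```typescript
// Sketch: pull the first balanced {...} block out of mixed model output,
// ignoring fences and surrounding prose. Returns null if none is found.
function extractJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // ran out of text before the object closed
}
```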

3. Schema mismatch in structured output

If your code expects { score: number } but the model returns { score: "85%" }, parsing may fail depending on your validation layer.

type Result = {
  score: number;
};

const parsed = JSON.parse(response.text) as Result;
// The `as Result` cast is checked only at compile time: a string score
// slips through here and fails later at runtime.

Use runtime validation before trusting the object:

import { z } from "zod";

const ResultSchema = z.object({
  score: z.number(),
});

const parsed = ResultSchema.parse(JSON.parse(response.text));
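If you'd rather not pull in a dependency, the same separation of "not valid JSON" from "valid JSON, wrong shape" can be done with a plain type guard. A sketch; `parseResult` and `isResult` are hypothetical names:

```typescript
// Dependency-free sketch: distinguish a syntax failure from a shape failure.
type Result = { score: number };

function isResult(value: unknown): value is Result {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).score === "number"
  );
}

function parseResult(raw: string):
  | { kind: "ok"; value: Result }
  | { kind: "invalid-json" }
  | { kind: "invalid-shape" } {
  let json: unknown;
  try {
    json = JSON.parse(raw);
  } catch {
    return { kind: "invalid-json" }; // syntax problem: prompt or truncation
  }
  return isResult(json)
    ? { kind: "ok", value: json }
    : { kind: "invalid-shape" }; // shape problem: schema mismatch
}
```

The two failure kinds point at different fixes: `invalid-json` means tighten the prompt or reduce scope, while `invalid-shape` means the schema instructions are wrong or ignored.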

4. Parallel calls returning inconsistent formats

When scaling across many documents, one worker may get clean JSON and another may get explanatory text. This is common in map/reduce pipelines where each map step is independently prompted.

// One bad worker response can fail the batch
const results = await Promise.all(chunks.map((chunk) => processChunk(chunk)));

Make each worker use identical format instructions and lower temperature:

const llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });
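You can also retry individual workers on parse failure instead of letting one bad completion sink the batch. A sketch of that wrapper; `withJsonRetry` is a hypothetical name, and `run` stands in for your per-chunk LLM call:

```typescript
// Sketch: retry a single worker's call when its output fails to parse,
// instead of failing the whole Promise.all batch.
async function withJsonRetry<T>(
  run: () => Promise<string>,
  maxAttempts = 2
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = (await run()).trim();
    try {
      return JSON.parse(raw) as T;
    } catch (err) {
      lastError = err; // bad completion: try again with a fresh call
    }
  }
  throw lastError;
}
```

Wrapping each map step this way keeps one flaky completion local to its worker; combine it with `Promise.allSettled` if you want the batch to report partial results rather than reject outright.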

How to Debug It

  1. Log the raw model output before parsing

    • Don’t inspect only the exception.
    • Print response.text exactly as returned.
    • Look for fences, commentary, truncation, or weird quotes.
  2. Check which class is failing

    • If it fails inside StructuredOutputParser, your prompt/schema path is wrong.
    • If it fails inside a custom JSON.parse, your own extraction code is too strict.
    • If it fails in ResponseSynthesizer or an agent step, inspect the intermediate prompt template.
  3. Reproduce with one chunk

    • Run the same logic against a single short input.
    • If single-chunk works but batch mode fails, you likely have truncation or inconsistency under scale.
    • Reduce concurrency and compare outputs.
  4. Validate against a schema

    • Use Zod or similar before downstream consumption.
    • This tells you whether the issue is invalid syntax or just wrong shape.
    • Separate “not valid JSON” from “valid JSON but invalid structure.”
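Step 1 can be wrapped into a tiny helper so every parse site logs the raw text before touching it. A sketch; `debugParse` is a hypothetical name, not a LlamaIndex API:

```typescript
// Sketch of debug step 1: log the raw output exactly as returned, then fail
// with an error that shows what actually came back instead of a bare
// "Unexpected token" message.
function debugParse(raw: string): unknown {
  // JSON.stringify makes fences, newlines, and odd quotes visible in the log
  console.log("RAW MODEL OUTPUT >>>", JSON.stringify(raw));
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new Error(
      `Not valid JSON (${(err as Error).message}). Output began with: ${raw.slice(0, 80)}`
    );
  }
}
```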

Prevention

  • Set temperature: 0 for any workflow that expects machine-readable output.
  • Always inject explicit format instructions when using parsers like StructuredOutputParser or custom schema prompts.
  • Validate every structured response with Zod before passing it deeper into your pipeline.

If you’re building agents or document pipelines in LlamaIndex TypeScript, treat model output as untrusted input. The fix is usually not “better parsing” — it’s tighter prompting, stricter schemas, and fewer assumptions about what comes back from the model.


By Cyprian Aarons, AI Consultant at Topiax.