# How to Fix "output parsing error when scaling" in LlamaIndex (TypeScript)
## What this error means
An "output parsing error when scaling" usually means LlamaIndex asked an LLM for structured output, then failed to parse the response into the schema it expected. In TypeScript, this shows up most often when you use structured extraction, query engines with response schemas, or agents that rely on JSON-like output.

The common trigger is scaling from a single happy-path prompt to batch processing, larger documents, or more varied inputs. Once the model starts returning extra text, invalid JSON, or missing fields, `OutputParserError` gets thrown.
## The Most Common Cause
The #1 cause is a mismatch between the schema you asked for and the text the model actually returned.
In LlamaIndex TypeScript, this usually happens when you use an OutputParser, StructuredLLM, or an extraction pipeline and expect strict JSON. The model adds commentary, wraps JSON in markdown fences, or omits required fields.
### Broken pattern vs. fixed pattern
| Broken | Fixed |
|---|---|
| Assumes the model will always return clean JSON | Forces a strict schema and validates before parsing |
| Lets prompts drift during scale-up | Uses explicit instructions and deterministic settings |
**Broken:**

```ts
import { OpenAI } from "llamaindex";
import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });

const prompt = `
Extract a person from this text and return JSON:
John Doe is 34 years old.
`;

// Note: current llamaindex versions take an options object here;
// older releases accepted a raw string.
const response = await llm.complete({ prompt });

// ❌ Broken: assumes raw text is valid JSON
const parsed = PersonSchema.parse(JSON.parse(response.text));
```
**Fixed:**

```ts
import { OpenAI } from "llamaindex";
import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });

const prompt = `
Return ONLY valid JSON matching this schema:
{
  "name": string,
  "age": number
}
Text:
John Doe is 34 years old.
`;

const response = await llm.complete({ prompt });

// ✅ Better: strip fences and validate before using
const cleaned = response.text
  .replace(/```json/g, "")
  .replace(/```/g, "")
  .trim();
const parsed = PersonSchema.parse(JSON.parse(cleaned));
```
If you are using a higher-level LlamaIndex parser, the same rule applies: make the schema strict and make the prompt strict. If either side is loose, scaling exposes it.
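One pattern that holds up under scale is to centralize cleanup and validation in a single helper so every call site gets the same tolerance. Below is a minimal sketch, not a LlamaIndex API: `extractJson` is a hypothetical name, and the brace-scanning heuristic is one plausible way to tolerate prose around the payload.

```ts
import { z } from "zod";

// Hypothetical helper: strip markdown fences, isolate the first JSON
// object, and validate against a Zod schema before anything downstream
// touches the result.
function extractJson<T>(raw: string, schema: z.ZodType<T>): T {
  const cleaned = raw.replace(/```json/g, "").replace(/```/g, "").trim();

  // Tolerate prose around the payload by grabbing the outermost braces.
  const start = cleaned.indexOf("{");
  const end = cleaned.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error(`No JSON object found in LLM output: ${cleaned.slice(0, 80)}`);
  }

  return schema.parse(JSON.parse(cleaned.slice(start, end + 1)));
}

// Usage with the schema from above:
// const person = extractJson(response.text, PersonSchema);
```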
## Other Possible Causes
### 1) Temperature is too high
Higher temperature increases formatting drift. That means more malformed JSON and more parser failures.
```ts
const llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0.8, // risky for structured output
});
```
Use deterministic settings for extraction:
```ts
const llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});
```
### 2) Your prompt allows natural language around the payload
This is a classic failure mode with ResponseSynthesizer-style flows. The model gives you an explanation first, then the JSON.
```ts
const prompt = `
Summarize the claim and include JSON at the end.
`;
```
Make it explicit:
```ts
const prompt = `
Return only valid JSON.
No prose.
No markdown.
No code fences.
`;
```
### 3) Your schema is too strict for real data
If your parser expects a number but the source contains "unknown" or "N/A", parsing fails during scale-out.
```ts
const ClaimSchema = z.object({
  claimId: z.string(),
  amount: z.number(), // breaks when source has "$12k" or "unknown"
});
```
Relax where needed:
```ts
const ClaimSchema = z.object({
  claimId: z.string(),
  amount: z.union([z.number(), z.string()]),
});
```
Then normalize after parsing.
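For example, a post-parse normalizer can coerce the loose `amount` into a number where one exists. `normalizeAmount` is a hypothetical helper, and the `"$12k"` rule is just one plausible convention:

```ts
// Hypothetical normalizer: turn loose string amounts into numbers,
// returning null when the source genuinely has no numeric value.
function normalizeAmount(amount: number | string): number | null {
  if (typeof amount === "number") return amount;

  const trimmed = amount.trim().toLowerCase();
  if (trimmed === "unknown" || trimmed === "n/a") return null;

  // Handle shorthand like "$12k" -> 12000.
  const match = trimmed.match(/^\$?([\d.]+)\s*(k)?$/);
  if (!match) return null;

  const value = parseFloat(match[1]);
  return match[2] ? value * 1000 : value;
}
```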
### 4) Chunking changes the output shape
When scaling over many documents, chunk boundaries can remove context. The LLM then invents fields or returns partial objects.
```ts
// Too aggressive chunking can split one record across chunks
splitter.chunkSize = 256;
splitter.chunkOverlap = 0;
```
Increase overlap for extraction tasks:
```ts
splitter.chunkSize = 1024;
splitter.chunkOverlap = 150;
```
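Before a batch run, it is worth splitting one real document and eyeballing the seams. The sketch below assumes llamaindex's `SentenceSplitter` with a `splitText` method; check the option and method names against your installed version.

```ts
import { SentenceSplitter } from "llamaindex";

const splitter = new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 150 });

const longDocumentText = "..."; // one real document from your corpus

// A record split across chunks shows up at the boundaries first.
const chunks = splitter.splitText(longDocumentText);
chunks.forEach((chunk, i) => {
  console.log(`--- chunk ${i} (${chunk.length} chars) ---`);
  console.log(chunk.slice(0, 120));
});
```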
## How to Debug It
- **Log the raw LLM output.**
  - Don't inspect only parsed objects.
  - Print `response.text` before `JSON.parse()` or parser validation runs.
- **Reproduce with one input.**
  - Run the exact failing document through a single-call path.
  - If it works once but fails in batch, chunking or concurrency is probably involved.
- **Check whether fences or prose are present.**
  - Look for patterns like ```` ```json ```` fences or lead-ins such as "Here is the result…".
  - If present, your parser needs cleanup logic or your prompt needs tightening.
- **Validate against your schema outside LlamaIndex.**
  - Use Zod's `.parse()` directly on captured output.
  - If Zod fails first, LlamaIndex is not the root cause; your contract is.

A minimal harness that combines these checks is sketched below.
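It reuses the `llm`, `prompt`, and `PersonSchema` from the earlier examples; `safeParse` is Zod's non-throwing variant.

```ts
// Capture the raw text once so every check sees the same output.
const response = await llm.complete({ prompt });
const raw = response.text;

// Log the raw output before any parsing touches it.
console.log("RAW LLM OUTPUT >>>", JSON.stringify(raw));

// Check for fences or prose around the payload.
if (raw.includes("```") || /here is/i.test(raw)) {
  console.warn("Output contains fences or prose; tighten the prompt.");
}

// Validate with Zod directly, outside LlamaIndex.
try {
  const result = PersonSchema.safeParse(JSON.parse(raw));
  if (!result.success) {
    console.error("Schema mismatch:", result.error.issues);
  }
} catch (err) {
  console.error("Not even valid JSON:", err);
}
```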
## Prevention
- Use `temperature: 0` for any structured extraction pipeline.
- Treat prompts as contracts:
  - "Return only valid JSON"
  - No markdown
  - No extra text
- Add a cleanup step before parsing:

```ts
function stripCodeFences(text: string) {
  return text.replace(/```json/g, "").replace(/```/g, "").trim();
}
```

- Prefer tolerant schemas at ingestion time, then normalize downstream.
- Test with messy real-world inputs before shipping batch jobs or background workers.
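If strict prompts and cleanup still leave occasional failures at scale, a bounded retry that feeds the parse error back to the model is a common last line of defense. This is a sketch, not a LlamaIndex feature: `parseWithRetry` is a hypothetical wrapper around the `extractJson` helper from earlier.

```ts
// Hypothetical retry wrapper: give the model a bounded number of
// chances, telling it why the previous attempt failed to parse.
async function parseWithRetry<T>(
  basePrompt: string,
  schema: z.ZodType<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const prompt = lastError
      ? `${basePrompt}\n\nYour previous output failed with: ${lastError}\nReturn ONLY valid JSON.`
      : basePrompt;

    const response = await llm.complete({ prompt });
    try {
      return extractJson(response.text, schema);
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`Still unparseable after ${maxAttempts} attempts: ${lastError}`);
}
```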
If you're seeing `OutputParserError` only after scaling up in TypeScript, assume it's a contract problem first. In practice, that means poor formatting tolerance, over-strict schemas, or prompts that leave too much room for interpretation.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.