# How to Fix "output parsing error when scaling" in LlamaIndex (TypeScript)
## What this error means
An "output parsing error when scaling" usually means LlamaIndex asked an LLM for structured output, then failed to parse the response into the schema it expected. In TypeScript, this shows up most often when you use structured extraction, query engines with response schemas, or agents that rely on JSON-like output.

The common trigger is scaling from a single happy-path prompt to batch processing, larger documents, or more varied inputs. Once the model starts returning extra text, invalid JSON, or missing fields, `OutputParserError` gets thrown.
## The Most Common Cause
The #1 cause is a mismatch between the schema you asked for and the text the model actually returned.
In LlamaIndex TypeScript, this usually happens when you use an OutputParser, StructuredLLM, or an extraction pipeline and expect strict JSON. The model adds commentary, wraps JSON in markdown fences, or omits required fields.
### Broken pattern vs. fixed pattern
| Broken | Fixed |
|---|---|
| Assumes the model will always return clean JSON | Forces a strict schema and validates before parsing |
| Lets prompts drift during scale-up | Uses explicit instructions and deterministic settings |
**Broken:**

```ts
import { OpenAI } from "llamaindex";
import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });

const prompt = `
Extract a person from this text and return JSON:
John Doe is 34 years old.
`;

// Note: current llamaindex versions take an options object here;
// older releases accepted a raw string.
const response = await llm.complete({ prompt });

// ❌ Broken: assumes raw text is valid JSON
const parsed = PersonSchema.parse(JSON.parse(response.text));
```
**Fixed:**

```ts
import { OpenAI } from "llamaindex";
import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const llm = new OpenAI({ model: "gpt-4o-mini", temperature: 0 });

const prompt = `
Return ONLY valid JSON matching this schema:
{
  "name": string,
  "age": number
}
Text:
John Doe is 34 years old.
`;

const response = await llm.complete({ prompt });

// ✅ Better: strip fences and validate before using
const cleaned = response.text
  .replace(/```json/g, "")
  .replace(/```/g, "")
  .trim();
const parsed = PersonSchema.parse(JSON.parse(cleaned));
```
If you are using a higher-level LlamaIndex parser, the same rule applies: make the schema strict and make the prompt strict. If either side is loose, scaling exposes it.
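One pattern that holds up under scale is to centralize cleanup and validation in a single helper so every call site gets the same tolerance. Below is a minimal sketch, not a LlamaIndex API: `extractJson` is a hypothetical name, and the brace-scanning heuristic is one plausible way to tolerate prose around the payload.

```ts
import { z } from "zod";

// Hypothetical helper: strip markdown fences, isolate the first JSON
// object, and validate against a Zod schema before anything downstream
// touches the result.
function extractJson<T>(raw: string, schema: z.ZodType<T>): T {
  const cleaned = raw.replace(/```json/g, "").replace(/```/g, "").trim();

  // Tolerate prose around the payload by grabbing the outermost braces.
  const start = cleaned.indexOf("{");
  const end = cleaned.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error(`No JSON object found in LLM output: ${cleaned.slice(0, 80)}`);
  }

  return schema.parse(JSON.parse(cleaned.slice(start, end + 1)));
}

// Usage with the schema from above:
// const person = extractJson(response.text, PersonSchema);
```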
## Other Possible Causes
### 1) Temperature is too high
Higher temperature increases formatting drift. That means more malformed JSON and more parser failures.
```ts
const llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0.8, // risky for structured output
});
```
Use deterministic settings for extraction:
```ts
const llm = new OpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});
```
### 2) Your prompt allows natural language around the payload
This is a classic failure mode with ResponseSynthesizer-style flows. The model gives you an explanation first, then the JSON.
```ts
const prompt = `
Summarize the claim and include JSON at the end.
`;
```
Make it explicit:
```ts
const prompt = `
Return only valid JSON.
No prose.
No markdown.
No code fences.
`;
```
### 3) Your schema is too strict for real data
If your parser expects a number but the source contains "unknown" or "N/A", parsing fails during scale-out.
```ts
const ClaimSchema = z.object({
  claimId: z.string(),
  amount: z.number(), // breaks when source has "$12k" or "unknown"
});
```
Relax where needed:
```ts
const ClaimSchema = z.object({
  claimId: z.string(),
  amount: z.union([z.number(), z.string()]),
});
```
Then normalize after parsing.
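For example, a post-parse normalizer can coerce the loose `amount` into a number where one exists. `normalizeAmount` is a hypothetical helper, and the `"$12k"` rule is just one plausible convention:

```ts
// Hypothetical normalizer: turn loose string amounts into numbers,
// returning null when the source genuinely has no numeric value.
function normalizeAmount(amount: number | string): number | null {
  if (typeof amount === "number") return amount;

  const trimmed = amount.trim().toLowerCase();
  if (trimmed === "unknown" || trimmed === "n/a") return null;

  // Handle shorthand like "$12k" -> 12000.
  const match = trimmed.match(/^\$?([\d.]+)\s*(k)?$/);
  if (!match) return null;

  const value = parseFloat(match[1]);
  return match[2] ? value * 1000 : value;
}
```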
### 4) Chunking changes the output shape
When scaling over many documents, chunk boundaries can remove context. The LLM then invents fields or returns partial objects.
```ts
// Too aggressive chunking can split one record across chunks
splitter.chunkSize = 256;
splitter.chunkOverlap = 0;
```
Increase overlap for extraction tasks:
```ts
splitter.chunkSize = 1024;
splitter.chunkOverlap = 150;
```
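Before a batch run, it is worth splitting one real document and eyeballing the seams. The sketch below assumes llamaindex's `SentenceSplitter` with a `splitText` method; check the option and method names against your installed version.

```ts
import { SentenceSplitter } from "llamaindex";

const splitter = new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 150 });

const longDocumentText = "..."; // one real document from your corpus

// A record split across chunks shows up at the boundaries first.
const chunks = splitter.splitText(longDocumentText);
chunks.forEach((chunk, i) => {
  console.log(`--- chunk ${i} (${chunk.length} chars) ---`);
  console.log(chunk.slice(0, 120));
});
```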
## How to Debug It
- **Log the raw LLM output.**
  - Don't inspect only parsed objects.
  - Print `response.text` before `JSON.parse()` or parser validation runs.
- **Reproduce with one input.**
  - Run the exact failing document through a single-call path.
  - If it works once but fails in batch, chunking or concurrency is probably involved.
- **Check whether fences or prose are present.**
  - Look for patterns like ```` ```json ```` fences or lead-ins such as "Here is the result…".
  - If present, your parser needs cleanup logic or your prompt needs tightening.
- **Validate against your schema outside LlamaIndex.**
  - Use Zod's `.parse()` directly on captured output.
  - If Zod fails first, LlamaIndex is not the root cause; your contract is.

A minimal harness that combines these checks is sketched below.
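It reuses the `llm`, `prompt`, and `PersonSchema` from the earlier examples; `safeParse` is Zod's non-throwing variant.

```ts
// Capture the raw text once so every check sees the same output.
const response = await llm.complete({ prompt });
const raw = response.text;

// Log the raw output before any parsing touches it.
console.log("RAW LLM OUTPUT >>>", JSON.stringify(raw));

// Check for fences or prose around the payload.
if (raw.includes("```") || /here is/i.test(raw)) {
  console.warn("Output contains fences or prose; tighten the prompt.");
}

// Validate with Zod directly, outside LlamaIndex.
try {
  const result = PersonSchema.safeParse(JSON.parse(raw));
  if (!result.success) {
    console.error("Schema mismatch:", result.error.issues);
  }
} catch (err) {
  console.error("Not even valid JSON:", err);
}
```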
## Prevention
- Use `temperature: 0` for any structured extraction pipeline.
- Treat prompts as contracts:
  - "Return only valid JSON"
  - No markdown
  - No extra text
- Add a cleanup step before parsing:

```ts
function stripCodeFences(text: string) {
  return text.replace(/```json/g, "").replace(/```/g, "").trim();
}
```

- Prefer tolerant schemas at ingestion time, then normalize downstream.
- Test with messy real-world inputs before shipping batch jobs or background workers.
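If strict prompts and cleanup still leave occasional failures at scale, a bounded retry that feeds the parse error back to the model is a common last line of defense. This is a sketch, not a LlamaIndex feature: `parseWithRetry` is a hypothetical wrapper around the `extractJson` helper from earlier.

```ts
// Hypothetical retry wrapper: give the model a bounded number of
// chances, telling it why the previous attempt failed to parse.
async function parseWithRetry<T>(
  basePrompt: string,
  schema: z.ZodType<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const prompt = lastError
      ? `${basePrompt}\n\nYour previous output failed with: ${lastError}\nReturn ONLY valid JSON.`
      : basePrompt;

    const response = await llm.complete({ prompt });
    try {
      return extractJson(response.text, schema);
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`Still unparseable after ${maxAttempts} attempts: ${lastError}`);
}
```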
If you're seeing `OutputParserError` only after scaling up in TypeScript, assume it's a contract problem first. In practice, that means poor formatting tolerance, over-strict schemas, or prompts that leave too much room for interpretation.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.