How to Fix 'output parsing error in production' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

If you’re seeing output parsing error in production in LlamaIndex TypeScript, it usually means the model returned text that did not match the structured format your parser expected. In practice, this shows up when you use structured outputs, query engines with response schemas, or agent/tool flows that expect strict JSON and the LLM drifts off-format.

The failure is rarely “LlamaIndex is broken.” It’s almost always a mismatch between what your code asked for and what the model actually emitted.

The Most Common Cause

The #1 cause is asking the model for structured output but not constraining it tightly enough, then parsing the raw text as if it were guaranteed JSON.

In TypeScript LlamaIndex apps, this often happens with StructuredOutputParser, ReActAgent, or a custom outputParser. The model returns extra commentary, markdown fences, trailing commas, or a partial object, and you get errors like:

  • Error: Failed to parse output
  • OutputParserError: Could not parse LLM output
  • SyntaxError: Unexpected token when JSON.parse runs under the hood

Here’s the broken pattern versus the fixed one.

Broken pattern → Fixed pattern

  • Ask for JSON in a prompt, then blindly parse raw text → use a structured output schema and enforce it at the LlamaIndex layer
  • Let the model “respond in JSON” without examples → provide a strict format instruction and validate before parsing
  • Parse whatever comes back from response.message.content → extract only the structured payload or use a parser designed for it

// BROKEN: prompt says "return JSON", but nothing enforces it.
import { OpenAI } from "@llamaindex/openai";

const llm = new OpenAI({ model: "gpt-4o-mini" });

const prompt = `
Return JSON with keys:
- decision
- reason

User request: ${userInput}
`;

const response = await llm.complete(prompt);

// This blows up when the model adds markdown or extra text.
const parsed = JSON.parse(response.text);
console.log(parsed.decision);

// FIXED: use a strict schema + validation before consuming output.
import { z } from "zod";
import { OpenAI } from "@llamaindex/openai";

const DecisionSchema = z.object({
  decision: z.enum(["approve", "reject"]),
  reason: z.string(),
});

const llm = new OpenAI({ model: "gpt-4o-mini" });

const prompt = `
You are a classifier.
Return ONLY valid JSON matching:
{"decision":"approve|reject","reason":"string"}

User request: ${userInput}
`;

const response = await llm.complete(prompt);

const json = JSON.parse(response.text);
const parsed = DecisionSchema.parse(json);

console.log(parsed.decision);

The important part is not just “parse JSON.” It’s “constrain the generation and validate before trusting it.” If you’re using a LlamaIndex parser class directly, make sure the parser matches the exact shape of the response you asked for.
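If you stay with the prompt-based approach, it also helps to defensively extract the JSON payload before parsing. The helper below is a minimal sketch, not a LlamaIndex API: it strips markdown fences and slices out the outermost object so stray commentary does not break JSON.parse.

```typescript
// Hypothetical helper: pull a JSON object out of raw LLM text.
// Strips markdown fences and surrounding commentary before parsing.
function extractJson(raw: string): string {
  // If the model wrapped the payload in a fenced block, take the fence body.
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const candidate = fenced ? fenced[1] : raw;

  // Slice from the first "{" to the last "}" to drop leading/trailing prose.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return candidate.slice(start, end + 1);
}
```

You would call `JSON.parse(extractJson(response.text))` and then run the Zod schema over the result. Note that this only trims noise around an intact object; it cannot repair a truncated one.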

Other Possible Causes

1) Your prompt allows chain-of-thought style text

If your instructions say “explain your reasoning” and also “return JSON,” many models will mix both.

// Problematic prompt
const prompt = `
Explain your reasoning step by step.
Then return JSON with keys decision and reason.
`;

Fix it by removing any instruction that encourages free-form prose.

const prompt = `
Return ONLY valid JSON:
{"decision":"approve|reject","reason":"string"}
No markdown. No explanation outside JSON.
`;

2) The model hit a context or truncation issue

A truncated response often looks like valid JSON until the last brace disappears. That leads to classic parser failures.

const llm = new OpenAI({
  model: "gpt-4o-mini",
  maxTokens: 80, // too low for long structured responses
});

Raise output tokens and shorten upstream context.

const llm = new OpenAI({
  model: "gpt-4o-mini",
  maxTokens: 300,
});
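To confirm that a failure is truncation rather than plain formatting drift, check whether the unparseable output is cut off mid-object. This is a rough heuristic in a hypothetical helper, not part of LlamaIndex:

```typescript
// Hypothetical heuristic: a response that fails JSON.parse AND does not
// end with a closing brace was probably cut off by the token limit.
function looksTruncated(raw: string): boolean {
  try {
    JSON.parse(raw);
    return false; // parsed fine, so it was not truncated
  } catch {
    // Unparseable but ending in "}" is more likely a formatting problem.
    return !raw.trimEnd().endsWith("}");
  }
}
```

If your LLM client exposes a finish reason (for example a "length" stop signal), prefer that over string heuristics.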

3) You are parsing tool output instead of final assistant output

In agent flows, tool calls can return intermediate messages that are not meant for your parser. If you attach an output parser to the wrong stage, it will fail on tool chatter.

// Wrong: parsing intermediate agent/tool messages
const result = await agent.chat(userInput);
const parsed = JSON.parse(result.message.content);

Instead, parse only final structured responses or use an agent configuration that returns typed final output.

// Better: ensure you're reading final assistant content only
const result = await agent.chat(userInput);
console.log(result.response); // inspect actual final field in your version

4) Your schema and prompt drifted apart

This happens when someone changes the Zod schema but forgets to update the format instruction.

// Schema expects:
z.object({
  riskLevel: z.enum(["low", "medium", "high"]),
});

// Prompt still says:
`Return {"risk": "..."}`

Keep one source of truth. Generate format instructions from schema where possible.
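A lightweight way to keep them in sync, sketched here without extra libraries and with illustrative names, is to derive both the format instruction and a runtime key check from one shared definition:

```typescript
// One shared definition drives both the prompt text and a runtime check,
// so the schema and the format instruction cannot drift apart.
const decisionShape = { decision: "approve|reject", reason: "string" };

const formatInstruction =
  "Return ONLY valid JSON matching:\n" + JSON.stringify(decisionShape);

// Runtime check: the parsed output must have exactly the expected keys.
function hasExpectedKeys(parsed: Record<string, unknown>): boolean {
  const expected = Object.keys(decisionShape).sort().join(",");
  return Object.keys(parsed).sort().join(",") === expected;
}
```

If you are on Zod, a package such as zod-to-json-schema can generate the instruction directly from the schema itself, which is the same single-source-of-truth idea.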

How to Debug It

  1. Log the raw model output

    • Don’t inspect only parsed objects.
    • Print response.text or equivalent raw content before parsing.
    • Look for markdown fences, commentary, trailing commas, or truncation.
  2. Compare raw output against expected schema

    • If you use Zod, run schema.safeParse(JSON.parse(raw)).
    • If parsing fails before validation, it’s a formatting problem.
    • If validation fails after parsing, it’s a schema mismatch.
  3. Reduce temperature and simplify prompts

    • Set temperature to 0 or close to it.
    • Remove any “be helpful,” “explain,” or “step-by-step” language.
    • Keep only one instruction: return strict structured data.
  4. Test with a known-good input

    • Use a tiny input that should produce an easy response.
    • If it still fails, your parser/prompt setup is wrong.
    • If it only fails on long inputs, you’re probably hitting truncation or context pressure.
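Steps 1 and 2 can be folded into a small triage function that classifies each failure, which is handy for production logging. This is a sketch; `validate` stands in for your schema check:

```typescript
// Classify a raw completion: formatting problem (unparseable),
// schema mismatch (parses but fails validation), or ok.
type Triage = "formatting" | "schema" | "ok";

function triageOutput(
  raw: string,
  validate: (json: unknown) => boolean,
): Triage {
  let json: unknown;
  try {
    json = JSON.parse(raw);
  } catch {
    return "formatting"; // failed before validation: formatting problem
  }
  return validate(json) ? "ok" : "schema"; // parsed but invalid: schema mismatch
}
```

With Zod, `validate` would be `(j) => MySchema.safeParse(j).success`.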

Prevention

  • Use schema-first design with Zod or equivalent validation before consuming any LLM output.
  • Keep prompts strict:
    • no extra prose
    • no markdown fences
    • no reasoning unless you explicitly want unstructured text
  • Add production logging for:
    • raw completion text
    • parsed payload
    • validation errors
    • token usage and truncation signals

If this error appears intermittently in production but not locally, treat it as an input-shape problem first. In most LlamaIndex TypeScript systems, that means one of three things: loose prompting, schema drift, or truncation under real traffic.



By Cyprian Aarons, AI Consultant at Topiax.
