How to Fix 'output parsing error when scaling' in AutoGen (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: output-parsing-error-when-scaling, autogen, python

What this error means

output parsing error when scaling in AutoGen usually means one agent returned text that the framework expected to parse as structured output, but the response did not match the schema or format required by the downstream agent. You’ll typically see this when using AssistantAgent, UserProxyAgent, or a custom reply function with tool calls, JSON output, or nested agents.

In practice, this shows up during multi-agent orchestration, especially when you add stricter prompts, function calling, or a GroupChatManager that expects consistent message shapes.

The Most Common Cause

The #1 cause is a mismatch between what your prompt asks for and what AutoGen is trying to parse.

If you tell an agent to return JSON, then later feed that output into code expecting valid JSON, one extra sentence or malformed quote is enough to trigger parsing failures. The error often surfaces as something like:

  • ValueError: Failed to parse model output
  • output parsing error when scaling
  • JSONDecodeError: Expecting value
  • autogen.exception.InvalidChatFormat

Broken vs fixed pattern

| Broken | Fixed |
| --- | --- |
| Free-form assistant output | Strictly constrained structured output |
| Downstream parser assumes JSON | Output validated before parsing |
| Prompt says “return JSON” but no enforcement | Prompt + parser + fallback handling |
# BROKEN
import json
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)

user_proxy = UserProxyAgent(name="user")

msg = user_proxy.initiate_chat(
    assistant,
    message="Return customer risk data as JSON only."
)

# This breaks if the model adds markdown fences or extra text.
data = json.loads(msg.chat_history[-1]["content"])
print(data["risk_score"])
# FIXED
import json
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)

user_proxy = UserProxyAgent(name="user")

msg = user_proxy.initiate_chat(
    assistant,
    message=(
        "Return ONLY valid JSON with keys: risk_score (int), reason (string). "
        "No markdown, no commentary."
    ),
)

raw = msg.chat_history[-1]["content"].strip()

try:
    data = json.loads(raw)
except json.JSONDecodeError:
    raise ValueError(f"Assistant returned invalid JSON: {raw}")

print(data["risk_score"])

If you’re using AutoGen’s structured outputs or tools, make the contract explicit. Don’t rely on “please return JSON” and hope the model obeys under load.
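One way to make that contract concrete is a small parse-and-validate helper at the boundary. This is a minimal sketch: the field names (`risk_score`, `reason`) follow the example prompt above, and `validate_risk_payload` is a hypothetical helper name, not an AutoGen API.

```python
import json

# Expected contract from the prompt above; adjust fields to your own schema.
REQUIRED_FIELDS = {"risk_score": int, "reason": str}

def validate_risk_payload(raw: str) -> dict:
    """Parse assistant output and verify it matches the declared contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"Missing required field: {field!r}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"Field {field!r} must be {expected_type.__name__}")
    return data

payload = validate_risk_payload('{"risk_score": 5, "reason": "low claim volume"}')
print(payload["risk_score"])  # 5
```

Run this on every assistant response before anything downstream touches it; a bad response then fails loudly at the boundary instead of deep inside the next agent.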

Other Possible Causes

1) Mixed message types in group chat

If one agent emits a message shape that another agent can’t consume, the manager may fail while trying to scale messages across participants.

# Problematic: custom content shape sneaks into chat history
groupchat.messages.append({
    "role": "assistant",
    "content": {"risk_score": 7}  # not a string in many flows
})

Fix it by keeping content consistently serializable:

groupchat.messages.append({
    "role": "assistant",
    "content": json.dumps({"risk_score": 7})
})
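If dictionaries can enter the history from several places, a small normalizer at the append boundary keeps every message string-typed. A minimal sketch; `normalize_content` is an illustrative helper name, not part of AutoGen:

```python
import json

def normalize_content(content):
    """Return chat content as a string, serializing dicts/lists on the way in."""
    if isinstance(content, str):
        return content
    # Anything structured is serialized so downstream agents see plain text.
    return json.dumps(content)

message = {"role": "assistant", "content": normalize_content({"risk_score": 7})}
print(message["content"])
```

Route every `groupchat.messages.append(...)` through a helper like this and the mixed-shape problem cannot recur.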

2) Tool/function output not matching declared schema

When using function calling, the tool result must match what your prompt and downstream code expect. A common failure is returning a plain string where your parser expects an object.

def get_policy_status(policy_id: str):
    return "active"  # too vague if downstream expects structured fields

Use a stable schema:

def get_policy_status(policy_id: str):
    return {
        "policy_id": policy_id,
        "status": "active",
        "effective_date": "2026-01-01"
    }
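To catch schema drift at the tool boundary rather than inside the next agent, you can wrap tool results in a check like the sketch below. `check_tool_result` is a hypothetical helper, and the required keys are assumptions based on the example above:

```python
def check_tool_result(result, required_keys):
    """Fail fast when a tool returns the wrong shape or misses declared keys."""
    if not isinstance(result, dict):
        raise TypeError(f"Tool returned {type(result).__name__}, expected dict")
    missing = set(required_keys) - result.keys()
    if missing:
        raise ValueError(f"Tool result missing keys: {sorted(missing)}")
    return result

def get_policy_status(policy_id: str):
    return {"policy_id": policy_id, "status": "active", "effective_date": "2026-01-01"}

result = check_tool_result(get_policy_status("P-100"), {"policy_id", "status"})
print(result["status"])  # active
```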

3) Prompt injection from previous turns

A previous assistant turn may include markdown fences, explanation text, or a half-finished object. When the next step tries to parse it, you get a scaling/parsing failure.

Here is the result:
```json
{"score": 9}
```

That looks harmless to humans, but it breaks strict parsers that expect raw JSON only.
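If you cannot fully prevent fenced replies, a tolerant extractor at the parsing boundary is a pragmatic fallback. A sketch under that assumption; `extract_json` is an assumed helper name, and the fence string is built indirectly to keep the example readable:

```python
import json
import re

FENCE = "`" * 3  # the literal triple-backtick markdown fence

def extract_json(text: str) -> dict:
    """Pull the first fenced JSON object out of a reply, else parse the raw text."""
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    candidate = match.group(1) if match else text.strip()
    return json.loads(candidate)

reply = "Here is the result:\n" + FENCE + 'json\n{"score": 9}\n' + FENCE
print(extract_json(reply))  # {'score': 9}
```

Treat this as a fallback, not a license to loosen the prompt: the strict contract still comes first.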

4) Model configuration mismatch

If one agent uses a model that supports tool calls and another doesn’t, you can get inconsistent behavior during orchestration.

llm_config = {
    "config_list": [
        {"model": "gpt-4o-mini", "api_key": "..."},
        {"model": "text-davinci-003", "api_key": "..."}  # incompatible with your flow
    ]
}

Keep models aligned across agents in the same workflow unless you’ve tested mixed capability behavior.
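One lightweight guard is to assert, before the workflow starts, that every agent's `config_list` resolves to the same model. `assert_consistent_models` is an illustrative helper, not an AutoGen API:

```python
def assert_consistent_models(llm_configs):
    """Raise if agents in one workflow are configured with different models."""
    models = {
        entry["model"]
        for cfg in llm_configs
        for entry in cfg["config_list"]
    }
    if len(models) > 1:
        raise ValueError(f"Mixed models in one workflow: {sorted(models)}")
    return models.pop()

model = assert_consistent_models([
    {"config_list": [{"model": "gpt-4o-mini", "api_key": "..."}]},
    {"config_list": [{"model": "gpt-4o-mini", "api_key": "..."}]},
])
print(model)  # gpt-4o-mini
```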

How to Debug It

  1. Print the exact raw assistant output

    • Don’t inspect the parsed object first.
    • Log chat_history[-1]["content"] before any json.loads() or schema validation.
  2. Check whether the failure happens before or after tool execution

    • If it fails immediately after an assistant response, it’s usually formatting.
    • If it fails after a tool call, inspect the tool return value and serialization.
  3. Reduce to two agents

    • Strip out GroupChatManager, nested chats, and extra tools.
    • Reproduce with one AssistantAgent and one UserProxyAgent.
  4. Validate against the exact expected schema

    • If you expect {"risk_score": 5, "reason": "..."}, then reject anything else.
    • Add strict checks before passing data into the next agent.
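Step 1 of the list above can be wired in as a thin wrapper around every parsing call: log the raw text first, then parse. A minimal sketch; `parse_with_logging` and the logger name are assumptions:

```python
import json
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("autogen-debug")

def parse_with_logging(raw: str) -> dict:
    """Log the exact raw assistant output before any parsing is attempted."""
    log.debug("raw assistant output: %r", raw)
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Fail fast with the offending text attached instead of a bare parse error.
        raise ValueError(f"Unparseable assistant output: {raw!r}") from exc

print(parse_with_logging('{"risk_score": 5, "reason": "ok"}'))
```

When a parse fails, the debug log now shows exactly what the model emitted, which usually makes the formatting problem obvious.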

Prevention

  • Make output contracts explicit in prompts and enforce them in code.

    • If you need JSON, validate JSON.
    • If you need fields, validate fields.
  • Keep agent message content consistent.

    • Use strings for chat content unless your flow explicitly supports richer objects.
    • Serialize dictionaries with json.dumps() before storing them in message history.
  • Add guardrails around every parsing boundary.

    • Parse once.
    • Validate once.
    • Fail fast with a useful error message instead of letting AutoGen scale bad content through the workflow.

If you’re seeing output parsing error when scaling, assume the problem is not “AutoGen being flaky.” In most cases it’s a contract mismatch between agents, tools, and parsers. Fix that contract first.



By Cyprian Aarons, AI Consultant at Topiax.
