How to Fix 'tool calling failure when scaling' in LangGraph (Python)

By Cyprian Aarons. Updated 2026-04-21.

When LangGraph throws a tool calling failure when scaling error, it usually means your agent worked in a small test run, then broke once you added parallelism, more nodes, or longer conversations. In practice, this often shows up as tool calls not being routed back into the graph state correctly, or messages getting mutated in a way that only fails under load.

The key point: this is rarely a “LangGraph is broken” issue. It’s usually a state shape, message handling, or concurrency bug that scaling exposed.

The Most Common Cause

The #1 cause is incorrect handling of ToolMessage / AIMessage.tool_calls across graph steps, especially when people manually append messages or return partial state from nodes.

A common broken pattern is to call the model, detect tool calls, but fail to preserve the full message history and tool response contract that LangGraph expects.

Broken pattern → Fixed pattern

  • Manually appending raw dicts or overwriting messages → Return proper BaseMessage objects and use add_messages
  • Dropping the assistant message that contains tool_calls → Keep the assistant message in state until the tool node responds
  • Returning only the latest message instead of full state updates → Return incremental updates through LangGraph reducers

Broken code

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

class State(TypedDict):
    messages: list

def agent_node(state: State):
    # BROKEN: overwrites messages and may lose tool call context
    response = llm.invoke(state["messages"])
    return {"messages": [response.content]}  # wrong type, wrong shape

def tool_node(state: State):
    # BROKEN: assumes tool call exists without preserving assistant msg
    return {"messages": [{"role": "tool", "content": "done"}]}

graph = StateGraph(State)
graph.add_node("agent", agent_node)
graph.add_node("tool", tool_node)
graph.set_entry_point("agent")
graph.add_edge("agent", END)  # BROKEN: the tool node is never wired into the graph

app = graph.compile()

This kind of code often works in trivial tests and then fails with errors like:

  • langgraph.errors.InvalidUpdateError: Expected dict, got str
  • langchain_core.messages.base.BaseMessage expected
  • Tool call not found in AIMessage
  • tool calling failure when scaling

Fixed code

from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.messages import ToolMessage

# Bind your tool definitions so the model can actually emit tool_calls,
# e.g. llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
llm = ChatOpenAI(model="gpt-4o-mini")

class State(TypedDict):
    messages: Annotated[list, add_messages]

def agent_node(state: State):
    response = llm.invoke(state["messages"])
    # Keep the AIMessage intact so tool_calls remain available
    return {"messages": [response]}

def tool_node(state: State):
    last_ai_msg = state["messages"][-1]
    tool_call = last_ai_msg.tool_calls[0]

    result = f"Executed {tool_call['name']} with args {tool_call['args']}"
    return {
        "messages": [
            ToolMessage(
                content=result,
                tool_call_id=tool_call["id"],
            )
        ]
    }

graph = StateGraph(State)
graph.add_node("agent", agent_node)
graph.add_node("tool", tool_node)

graph.set_entry_point("agent")

def route(state: State):
    # Route to the tool node only when the last AI message requested a tool
    last = state["messages"][-1]
    return "tool" if getattr(last, "tool_calls", None) else END

graph.add_conditional_edges("agent", route, {"tool": "tool", END: END})
graph.add_edge("tool", "agent")  # loop back so the agent sees tool results

app = graph.compile()

The important changes:

  • Use Annotated[list, add_messages] so LangGraph merges messages correctly.
  • Return actual message objects like AIMessage and ToolMessage.
  • Preserve the assistant message that contains tool_calls.
  • Match each ToolMessage.tool_call_id to the original call ID.
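Conceptually, add_messages appends new messages and replaces existing ones that share an id. Here is a simplified pure-Python sketch of that merge idea, using plain dicts — not LangGraph's actual implementation, just the behavior that makes concurrent updates safe:

```python
def merge_messages(existing: list, updates: list) -> list:
    """Append updates; replace any existing message with the same id.
    Simplified stand-in for an add_messages-style reducer."""
    merged = list(existing)
    index_by_id = {m["id"]: i for i, m in enumerate(merged) if "id" in m}
    for msg in updates:
        msg_id = msg.get("id")
        if msg_id is not None and msg_id in index_by_id:
            merged[index_by_id[msg_id]] = msg  # same id: replace in place
        else:
            merged.append(msg)  # new id: append
    return merged

history = [{"id": "1", "role": "user", "content": "hi"}]
update = [{"id": "2", "role": "assistant", "content": "hello"}]
print(merge_messages(history, update))  # both messages, in order
```

Because the reducer merges rather than overwrites, two branches updating the same state key no longer clobber each other.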

Other Possible Causes

1) Tool schema mismatch

If your tool signature doesn’t match what the model emits, LangChain can’t parse arguments cleanly.

from langchain_core.tools import tool

# Broken
@tool
def lookup_policy(policy_id: int):  # models often send string IDs
    """Look up a policy by ID."""
    ...

# Better
@tool
def lookup_policy(policy_id: str):
    """Look up a policy by ID."""
    ...

If you’re using Pydantic schemas:

from pydantic import BaseModel

class LookupPolicyInput(BaseModel):
    policy_id: str

@tool(args_schema=LookupPolicyInput)
def lookup_policy(policy_id: str):
    """Look up a policy by ID."""
    ...
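The int-typed schema bites because models frequently emit IDs as strings, often non-numeric ones like "POL-1234". A minimal plain-Python illustration of the coercion failure a strict args schema would hit (the sample value is hypothetical):

```python
raw_args = {"policy_id": "POL-1234"}  # hypothetical model output

def coerce_to_int(value):
    """Mimics the strict integer coercion a typed args schema performs."""
    return int(value)

try:
    coerce_to_int(raw_args["policy_id"])
    coerced = True
except ValueError:
    coerced = False  # "POL-1234" is not an integer

print("coerced:", coerced)  # coerced: False
```

A string-typed parameter sidesteps this entirely and lets the tool validate the format itself.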

2) Non-deterministic shared state under concurrency

Scaling often means multiple runs or branches hit shared mutable objects.

# Broken: shared global list
shared_messages = []

def node(state):
    shared_messages.extend(state["messages"])

Use per-run state only:

def node(state):
    local_messages = list(state["messages"])

If you’re storing checkpoints, make sure your checkpointer is thread-safe and keyed by unique thread/session IDs.
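For the unique-ID part, a small helper can give every run its own thread_id. This is a sketch — the helper name new_run_config is hypothetical, but the config shape matches the configurable thread_id convention LangGraph checkpointers key on:

```python
import uuid
from typing import Optional

def new_run_config(session_id: Optional[str] = None) -> dict:
    """Build a per-run config with a unique thread_id so concurrent
    runs never collide on checkpoint state. Hypothetical helper."""
    return {"configurable": {"thread_id": session_id or str(uuid.uuid4())}}

# Each anonymous run gets its own checkpoint namespace:
config_a = new_run_config()
config_b = new_run_config()

# Known sessions reuse their ID so conversation history persists:
config_user = new_run_config("user-42")
```

Pass the config to app.invoke(...) so two simultaneous requests can never read or write each other's checkpoints.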

3) Returning invalid node outputs

LangGraph nodes must return a dict matching the graph state contract. Returning strings or other ad-hoc shapes causes failures that look unrelated at first. (The one exception is a conditional-edge routing function passed to add_conditional_edges — that function does return the next node's name as a string. Don't confuse the two.)

# Broken: a graph *node* returning a bare string
def set_next(state):
    return "tools"

# Fixed: nodes return partial state updates as dicts
def set_next(state):
    return {"next": "tools"}

You’ll often see errors like:

  • langgraph.errors.InvalidUpdateError
  • Expected dict at path ...
  • Invalid concurrent update
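A lightweight guard can surface these bad returns at the offending node instead of as a confusing error deep inside the graph. The checked decorator below is a hypothetical helper, not a LangGraph API:

```python
def checked(node_fn):
    """Hypothetical guard: wrap a node and fail fast on non-dict output."""
    def wrapper(state):
        out = node_fn(state)
        if not isinstance(out, dict):
            raise TypeError(
                f"Node {node_fn.__name__} returned {type(out).__name__}, "
                "expected a dict of state updates"
            )
        return out
    return wrapper

@checked
def bad_node(state):
    return "tools"  # would slip through unnoticed without the guard
```

Wrapping every node this way during development turns a vague downstream failure into a TypeError naming the exact node.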

4) Wrong edge routing after a tool call

If your conditional edge doesn’t route back to the agent after tools run, the graph can stall or recurse incorrectly.

# Broken routing idea:
graph.add_conditional_edges("agent", should_use_tool, {"tools": "tools"})
graph.add_edge("tools", END)  # ends too early

# Better:
graph.add_conditional_edges("agent", should_use_tool, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")

That loop back matters. The assistant needs to see the tool result before it generates the final answer.

How to Debug It

  1. Inspect the last AI message

    • Print state["messages"][-1].
    • Confirm it is an AIMessage, not a string or dict.
    • Check whether .tool_calls exists and has valid IDs.
  2. Verify your reducer

    • If you’re managing messages manually, switch to:
      messages: Annotated[list, add_messages]
      
    • Without this, concurrent updates can overwrite each other.
  3. Log every node output

    • Each node should return a dict.
    • Log keys and types:
      print(type(output), output.keys())
      
    • Look for accidental returns like "done" or [message].
  4. Run with one thread first

    • Disable parallel branches and test a single session.
    • If it works serially but fails under load, suspect shared state or checkpoint collisions.
    • Make sure every run has a unique session/thread ID.
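Steps 1 and 3 can be folded into a tiny debug helper that summarizes what the last message actually is. The function name dump_last_message is hypothetical; it relies only on the fact that proper AI messages expose a tool_calls attribute while bare strings and dicts do not:

```python
def dump_last_message(state: dict) -> str:
    """Debug helper (hypothetical): report the last message's type
    and how many tool calls it carries."""
    last = state["messages"][-1]
    kind = type(last).__name__
    tool_calls = getattr(last, "tool_calls", None) or []
    return f"{kind} with {len(tool_calls)} tool call(s)"

# Works on wrong types too, so bad state jumps out immediately:
print(dump_last_message({"messages": ["oops, a bare string"]}))
# str with 0 tool call(s)
```

If this prints str or dict instead of AIMessage, you have found the node that violates the message contract.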

Prevention

  • Use LangGraph’s message reducer from day one:

    • Annotated[list, add_messages]
    • Don’t hand-roll message merging unless you really need to.
  • Keep tools strict:

    • Use typed args schemas.
    • Prefer strings for external IDs unless you control formatting end-to-end.
  • Treat graph nodes as pure functions:

    • Input state in.
    • Valid partial state out.
    • No mutation of globals, no hidden side effects.
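The purity rule is easy to enforce in tests: run each node on a deep copy of its input and fail if the original changed. The helper assert_no_mutation below is a hypothetical sketch, not part of LangGraph:

```python
import copy

def assert_no_mutation(node_fn, state: dict) -> dict:
    """Test helper (hypothetical): run a node and verify it did not
    mutate its input state in place."""
    snapshot = copy.deepcopy(state)
    out = node_fn(state)
    if state != snapshot:
        raise AssertionError(f"{node_fn.__name__} mutated its input state")
    return out

def good_node(state):
    # Builds a new list instead of appending in place
    return {"messages": state["messages"] + ["new"]}

assert_no_mutation(good_node, {"messages": ["hi"]})
```

Running this check over every node in CI catches the in-place mutations that only misbehave once runs execute concurrently.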

If you’re seeing this error only after scaling up workers or traffic, start with message integrity first. In LangGraph, most “scaling” failures are really state-contract failures that concurrency made visible.



By Cyprian Aarons, AI Consultant at Topiax.
