# How to Fix 'tool calling failure in production' in LlamaIndex (Python)
Tool calling failures in production usually mean your LLM returned a response that LlamaIndex could not parse into a valid tool call. In practice, this shows up when the model ignores the tool schema, returns malformed JSON, or you wired up an agent with the wrong model/settings combination.
The error often appears after deployment because local tests use small prompts and clean inputs, while production traffic pushes the model into edge cases. The common symptom is an exception like `ValueError: Could not parse tool call` or `ToolCallParseError` raised from `llama_index.core.agent`.
## The Most Common Cause
The #1 cause is using a model that does not reliably support structured tool calling, or using it without enabling the right agent/tool configuration.
In LlamaIndex, this usually happens when you expect FunctionAgent, OpenAIAgent, or another tool-using agent to receive valid function-call style output, but the underlying LLM only returns plain text.
### Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Agent expects a tool call, but the model is not configured for it | Use a tool-capable model and explicit tool settings |
| Prompt asks for JSON, but agent parsing expects function calls | Let LlamaIndex manage tool invocation through the agent API |
```python
# BROKEN: model may answer in plain text instead of emitting a valid tool call
from llama_index.core.agent import FunctionCallingAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")  # often unreliable for strict tool calling in prod

agent = FunctionCallingAgent.from_tools(
    tools=[weather_tool],  # weather_tool: a FunctionTool defined elsewhere
    llm=llm,
)

response = agent.chat("What's the weather in London?")
print(response)
```
```python
# FIXED: use a tool-capable model and keep the agent/tool contract intact
from llama_index.core.agent import FunctionCallingAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    temperature=0,
)

agent = FunctionCallingAgent.from_tools(
    tools=[weather_tool],  # weather_tool: a FunctionTool defined elsewhere
    llm=llm,
)

response = agent.chat("What's the weather in London?")
print(response)
```
If you are using OpenAI-compatible providers, make sure they actually support tool/function calling in the same format LlamaIndex expects. A lot of “works locally, fails in prod” cases are just provider mismatch.
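You can fail fast on provider mismatch at startup: LlamaIndex LLMs expose an `is_function_calling_model` flag on their metadata. A minimal guard sketch (the `require_tool_calling` helper name is mine, not a LlamaIndex API):

```python
def require_tool_calling(llm) -> None:
    """Raise at startup if the configured LLM does not advertise tool calling.

    `llm.metadata` is LlamaIndex's LLMMetadata; `require_tool_calling` is a
    hypothetical helper, not part of the library.
    """
    meta = llm.metadata
    if not getattr(meta, "is_function_calling_model", False):
        raise RuntimeError(
            f"Model {getattr(meta, 'model_name', 'unknown')!r} does not "
            "advertise function calling; agent tool calls will likely fail."
        )

# Usage with a real LlamaIndex LLM (needs llama-index installed):
# require_tool_calling(OpenAI(model="gpt-4o-mini"))
```

Running this once when the process boots turns a vague production parse error into an immediate, explicit configuration error.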
## Other Possible Causes
### 1) Your tool schema is invalid or too loose
If your function signature uses unsupported types or vague descriptions, the model may emit arguments that fail validation.
```python
# BAD: ambiguous schema, vague docstring
def lookup_policy(policy_id: str):
    """Get policy details."""

# BETTER: explicit argument schema and return type
from pydantic import BaseModel, Field

class PolicyArgs(BaseModel):
    policy_id: str = Field(..., description="Exact policy identifier like POL-12345")

def lookup_policy(policy_id: str) -> str:
    """Look up a policy by its exact identifier, e.g. POL-12345."""
    return f"Policy {policy_id}"
```
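To see why the strict schema pays off, validate model-produced arguments before executing the tool. A sketch (the schema is repeated so the snippet stands alone; `safe_lookup` is an illustrative name, not a library function):

```python
from pydantic import BaseModel, Field, ValidationError

class PolicyArgs(BaseModel):
    policy_id: str = Field(..., description="Exact policy identifier like POL-12345")

def safe_lookup(raw_args: dict) -> str:
    # Validate the model-produced arguments before running the tool,
    # so a bad tool call fails loudly instead of hitting your backend.
    try:
        args = PolicyArgs(**raw_args)
    except ValidationError as exc:
        return f"Invalid tool arguments: {exc.errors()[0]['type']}"
    return f"Policy {args.policy_id}"

print(safe_lookup({"policy_id": "POL-12345"}))  # valid call
print(safe_lookup({"id": "POL-12345"}))         # wrong key -> caught by the schema
```

The same `PolicyArgs` model can be attached to a LlamaIndex tool via `FunctionTool.from_defaults(fn=..., fn_schema=PolicyArgs)` so the schema also drives what the model sees.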
### 2) You are wrapping raw prompt text around an agent call
If you manually instruct the model to “call a function” in plain English, but still rely on LlamaIndex parsing internals, you can get:
- `Could not parse tool call`
- `Expected function call arguments`
- `ToolCallParseError`
```python
# BAD: manual prompting fights the agent framework
response = llm.complete(
    "Use lookup_policy(policy_id='POL-123') and then answer."
)
```
Use the actual agent API instead:
```python
# GOOD: let the agent select and invoke tools
response = agent.chat("Get me details for policy POL-123")
```
### 3) Streaming is enabled but your downstream parser assumes final output
Some providers emit partial chunks that look like broken JSON until the full message arrives. If your code inspects intermediate tokens as if they were complete tool calls, parsing fails.
```python
# Example config to inspect
settings = {
    "streaming": True,
}
```
If your integration layer parses streamed chunks, test with streaming off first by calling the non-streaming path:

```python
# Non-streaming call: the full message (and any tool call) arrives at once
response = llm.chat(messages)  # instead of llm.stream_chat(messages)
```
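To make the failure mode concrete, here is the same problem in miniature with plain `json` and made-up chunks: an intermediate chunk never parses, only the accumulated message does.

```python
import json

# Illustrative chunks: a tool-call arguments payload split mid-stream.
chunks = ['{"city": "Lon', 'don", "unit', '": "celsius"}']

# Parsing an intermediate chunk fails: it is not yet valid JSON.
try:
    json.loads(chunks[0])
except json.JSONDecodeError:
    print("partial chunk is not valid JSON")

# Accumulate the full message first, then parse once.
full = "".join(chunks)
print(json.loads(full))  # {'city': 'London', 'unit': 'celsius'}
```

If your code inspects tokens as they stream, buffer them until the provider signals the message is complete before attempting to parse tool arguments.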
### 4) Context window truncation is cutting off the tool arguments
Large chat history can truncate the assistant’s function-call payload. In production this happens when long system prompts or conversation memory push the request over token limits.
Watch for errors like:
- malformed JSON arguments
- missing closing braces
- empty `tool_calls` content
Fix by trimming history, and make sure `max_tokens` leaves enough room for the full tool-call payload:

```python
llm = OpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=512,  # raise this if tool-call arguments are getting cut off
)
```
Also trim memory before passing it into the agent.
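Trimming can be sketched in plain Python using a rough 4-characters-per-token heuristic. This is an illustrative helper, not a LlamaIndex API; in practice you would use a tokenizer-backed buffer such as `ChatMemoryBuffer.from_defaults(token_limit=...)`.

```python
# Sketch: drop the oldest messages so history stays under a token budget.
# The 4-chars-per-token ratio is a crude assumption; real token counts vary.
def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    kept: list[dict] = []
    budget = max_tokens
    # Walk newest-first so the most recent turns survive.
    for msg in reversed(messages):
        cost = max(1, len(msg["content"]) // 4)
        if cost > budget:
            break
        budget -= cost
        kept.append(msg)
    return list(reversed(kept))

history = [
    {"role": "user", "content": "old question " * 50},
    {"role": "assistant", "content": "old answer " * 50},
    {"role": "user", "content": "Get me details for policy POL-123"},
]
trimmed = trim_history(history, max_tokens=40)
print(len(trimmed))  # only the most recent turn fits the budget
```

The key design choice is trimming newest-first: the latest user turn carries the tool-call intent, so it must survive even if older context is dropped.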
## How to Debug It

- **Check whether it is a parse problem or an actual provider failure.** Look at the exact exception. If you see `ToolCallParseError`, `ValueError: Could not parse tool call`, or `Failed to extract function call`, the model returned something LlamaIndex could not interpret.
- **Log raw model output before LlamaIndex processes it.** Compare what came back from the provider with what your code expected. If you see plain text like "Sure, I can help with that" instead of structured arguments, your issue is model/tool support.
- **Disable streaming and reduce prompt size.** Run one request with streaming off, a minimal system prompt, and no conversation history. If it starts working, you likely have truncation or chunk-parsing issues.
- **Swap to a known-good model and minimal tool set.** Test with one simple function and one strong tool-calling model. If that works, add tools back one by one until it breaks.
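Once you are logging raw provider output, a small classifier can bucket each failure into the categories above. A sketch assuming OpenAI-style message shape (`content` text plus an optional `tool_calls` list); `diagnose` is an illustrative name, and you should adapt the keys to your provider:

```python
import json

def diagnose(message: dict) -> str:
    """Bucket a raw assistant message into one of the failure categories."""
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        return "no tool call: model answered in plain text (model/tool support issue)"
    args = tool_calls[0].get("function", {}).get("arguments", "")
    try:
        json.loads(args)
    except (json.JSONDecodeError, TypeError):
        return "malformed arguments: likely truncation or chunk parsing"
    return "tool call looks structurally valid"

print(diagnose({"content": "Sure, I can help with that", "tool_calls": []}))
print(diagnose({"tool_calls": [{"function": {"arguments": '{"city": "Lon'}}]}))
print(diagnose({"tool_calls": [{"function": {"arguments": '{"city": "London"}'}}]}))
```

Wiring this into your logging layer tells you immediately whether to look at model choice (first bucket) or at streaming/truncation (second bucket).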
## Prevention

- Use models that explicitly support structured tool calling, and keep `temperature=0` for agents that must be deterministic.
- Keep tools small and typed. Prefer clear parameter names and Pydantic schemas over loose dicts.
- Add an integration test that asserts your agent can successfully produce and execute a real tool call before deploying.
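One way to build that integration test is to wrap a tool so it records every invocation, then assert the agent actually called it. A sketch: the recording pattern is plain Python; the agent wiring is commented out because it needs a live LLM and API key.

```python
def make_recording_tool():
    """Return a tool function plus a list that records each call's argument."""
    calls: list[str] = []

    def get_weather(city: str) -> str:
        """Return a canned weather string for a city."""
        calls.append(city)
        return f"Sunny in {city}"

    return get_weather, calls

def test_agent_executes_tool():
    get_weather, calls = make_recording_tool()
    # With LlamaIndex (requires a live LLM):
    # from llama_index.core.tools import FunctionTool
    # tool = FunctionTool.from_defaults(fn=get_weather)
    # agent = ...  # build your agent with [tool]
    # agent.chat("What's the weather in London?")
    get_weather("London")  # stand-in for the agent-driven call in this sketch
    assert calls == ["London"], "agent never executed the tool"
```

Because the assertion checks the side effect rather than the response text, it fails precisely when the model answers in plain prose instead of calling the tool.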
If you are still seeing tool calling failure in production, start by reducing everything to one agent, one tool, one request. That isolates whether the problem is your model choice, schema design, or runtime configuration.
## Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.