LlamaIndex Tutorial (Python): building custom tools for intermediate developers
This tutorial shows you how to build custom tools in LlamaIndex with Python, wire them into an agent, and test them against real inputs. You need this when the built-in tools are too generic and you want deterministic behavior around APIs, internal services, or business rules.
What You'll Need
- Python 3.10+
- `llama-index`
- `llama-index-llms-openai`
- An OpenAI API key set as `OPENAI_API_KEY`
- Basic familiarity with LlamaIndex agents and tool calling
- A terminal and a clean virtual environment
Install the packages:
```bash
pip install llama-index llama-index-llms-openai
```
Step-by-Step
- Start by defining a real tool function. In production, this is usually a wrapper around a service call, lookup table, or policy check. Keep it deterministic and narrow in scope.
```python
from typing import Literal

def calculate_risk_band(age: int, claims_last_12_months: int) -> Literal["low", "medium", "high"]:
    if age < 25:
        base = "high"
    elif age < 45:
        base = "medium"
    else:
        base = "low"
    if claims_last_12_months >= 3:
        return "high"
    if claims_last_12_months == 2 and base != "high":
        return "medium"
    return base

print(calculate_risk_band(31, 1))  # medium
```
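Because the band logic is plain Python, you can pin down its boundary behavior with a quick table-driven check before any agent is involved. The cases below are illustrative, not from a real rating table:

```python
# Same pure function as in the step above, repeated so this snippet runs standalone.
def calculate_risk_band(age: int, claims_last_12_months: int) -> str:
    if age < 25:
        base = "high"
    elif age < 45:
        base = "medium"
    else:
        base = "low"
    if claims_last_12_months >= 3:
        return "high"
    if claims_last_12_months == 2 and base != "high":
        return "medium"
    return base

# Boundary cases: each tuple is (age, claims, expected band).
cases = [
    (24, 0, "high"),    # just under the age-25 cutoff
    (25, 0, "medium"),  # exactly 25 falls into the middle band
    (45, 0, "low"),     # exactly 45 falls into the low band
    (60, 3, "high"),    # 3+ claims overrides a low base
    (60, 2, "medium"),  # 2 claims bumps a low base up to medium
]
for age, claims, expected in cases:
    assert calculate_risk_band(age, claims) == expected, (age, claims)
print("all boundary cases pass")
```

Checks like this double as documentation of the cutoffs, which the tool description alone cannot guarantee.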
- Wrap that function in a LlamaIndex tool using `FunctionTool`. This gives the agent a typed interface and a description it can use when deciding whether to call the tool.
```python
from llama_index.core.tools import FunctionTool

risk_tool = FunctionTool.from_defaults(
    fn=calculate_risk_band,
    name="calculate_risk_band",
    description="Calculate an insurance risk band from age and recent claim count.",
)

print(risk_tool.metadata.name)
print(risk_tool.metadata.description)
```
- Add a second tool so the agent has something meaningful to choose between. In real systems, you usually want multiple focused tools instead of one giant utility function.
```python
def summarize_policy_status(active: bool, premium_overdue_days: int) -> str:
    if not active:
        return "policy_inactive"
    if premium_overdue_days > 30:
        return "policy_at_risk"
    if premium_overdue_days > 0:
        return "payment_pending"
    return "policy_in_good_standing"

status_tool = FunctionTool.from_defaults(
    fn=summarize_policy_status,
    name="summarize_policy_status",
    description="Summarize policy status from activity state and overdue days.",
)

print(status_tool.call(active=True, premium_overdue_days=15))
```
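The status function has an ordering subtlety worth pinning down: `active=False` wins over any overdue count, and day 30 is still "pending" while day 31 crosses into "at risk". A quick standalone check (function repeated so it runs on its own):

```python
# Same status function as above, repeated so this snippet runs standalone.
def summarize_policy_status(active: bool, premium_overdue_days: int) -> str:
    if not active:
        return "policy_inactive"
    if premium_overdue_days > 30:
        return "policy_at_risk"
    if premium_overdue_days > 0:
        return "payment_pending"
    return "policy_in_good_standing"

# Inactivity takes precedence over overdue days.
assert summarize_policy_status(False, 90) == "policy_inactive"
# Day 30 is still pending; day 31 crosses into at-risk.
assert summarize_policy_status(True, 30) == "payment_pending"
assert summarize_policy_status(True, 31) == "policy_at_risk"
assert summarize_policy_status(True, 0) == "policy_in_good_standing"
print("status boundaries pass")
```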
- Create an agent that can use both tools. The key point is that the model should decide when to call your tools instead of hallucinating an answer.
```python
import asyncio
import os

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])

agent = AgentWorkflow.from_tools_or_functions(
    [risk_tool, status_tool],
    llm=llm,
    system_prompt=(
        "You are an insurance operations assistant. "
        "Use tools for any calculation or policy-status decision."
    ),
)

# AgentWorkflow.run() is async; drive it with asyncio.run from a plain script.
response = asyncio.run(
    agent.run("A 23-year-old customer has 2 claims in the last 12 months. What is the risk band?")
)
print(response)
```
- Make the output more useful by combining tool results in one request. This is where custom tools start paying off: you get consistent business logic plus natural-language orchestration on top.
```python
result = asyncio.run(
    agent.run(
        "For a 41-year-old customer with 0 claims last year and a policy overdue by 45 days, "
        "tell me the risk band and policy status."
    )
)
print(result)
```
- If you need stricter control, keep your tools pure and validate inputs before exposing them to the agent. That makes failures obvious and keeps production behavior predictable.
```python
def calculate_safe_risk_band(age: int, claims_last_12_months: int) -> str:
    if age <= 0 or claims_last_12_months < 0:
        raise ValueError("Invalid input values")
    return calculate_risk_band(age, claims_last_12_months)

safe_tool = FunctionTool.from_defaults(
    fn=calculate_safe_risk_band,
    name="calculate_safe_risk_band",
    description="Validate inputs and calculate an insurance risk band.",
)

print(safe_tool.call(age=34, claims_last_12_months=1))
```
Testing It
Run the script with a few different prompts and check whether the agent actually calls the right tool for each request. You want to see stable outputs for the same inputs, especially when you repeat edge cases like negative values, inactive policies, or multiple claims.
A good test is to compare direct function output with agent output for the same input pair. If they diverge, your prompt is too loose or your tool description is too vague.
Also inspect failures intentionally by passing invalid values into `calculate_safe_risk_band`. In production, that's how you catch bad assumptions before they reach external systems.
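A minimal sketch of that failure probe, with both functions repeated so it runs standalone:

```python
# Functions repeated from the tutorial so this probe runs on its own.
def calculate_risk_band(age: int, claims_last_12_months: int) -> str:
    if age < 25:
        base = "high"
    elif age < 45:
        base = "medium"
    else:
        base = "low"
    if claims_last_12_months >= 3:
        return "high"
    if claims_last_12_months == 2 and base != "high":
        return "medium"
    return base

def calculate_safe_risk_band(age: int, claims_last_12_months: int) -> str:
    if age <= 0 or claims_last_12_months < 0:
        raise ValueError("Invalid input values")
    return calculate_risk_band(age, claims_last_12_months)

# Each pair should be rejected before any business logic runs.
for bad_age, bad_claims in [(0, 1), (-5, 0), (30, -1)]:
    try:
        calculate_safe_risk_band(bad_age, bad_claims)
    except ValueError as exc:
        print(f"rejected ({bad_age}, {bad_claims}): {exc}")
    else:
        raise AssertionError(f"({bad_age}, {bad_claims}) was not rejected")
```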
Next Steps
- Add async tools with `async def` for API-backed workflows.
- Build tools around database queries using parameterized SQL.
- Add structured outputs with Pydantic models so downstream code can trust the result shape.
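As a sketch of the async direction: the tool function itself is just an `async def`, and `asyncio.sleep` below stands in for a real awaited HTTP call. `fetch_policy_record` and its return fields are hypothetical. You would then hand it to `FunctionTool.from_defaults(async_fn=fetch_policy_record, ...)` much like the sync tools above; check the parameter name against your installed llama-index version.

```python
import asyncio

async def fetch_policy_record(policy_id: str) -> dict:
    """Hypothetical async lookup; the sleep simulates network latency."""
    await asyncio.sleep(0.01)  # stand-in for an awaited HTTP request
    return {"policy_id": policy_id, "active": True, "premium_overdue_days": 0}

record = asyncio.run(fetch_policy_record("POL-1234"))
print(record["policy_id"])
```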
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit