LlamaIndex Tutorial (Python): handling async tools for advanced developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows how to wire async tools into a LlamaIndex agent so you can call I/O-bound functions without blocking the event loop. You need this when your tools hit databases, HTTP APIs, or internal services and you want the agent to stay responsive under concurrent load.

What You'll Need

  • Python 3.10+
  • llama-index
  • llama-index-llms-openai
  • openai API key set as OPENAI_API_KEY
  • A terminal with pip
  • Basic familiarity with LlamaIndex agents and tools
  • An async entrypoint such as asyncio.run(...) to drive the examples

Step-by-Step

  1. Install the packages and set up your environment.
    Keep this minimal: one LLM package, one core package, and your API key in the environment.
pip install llama-index llama-index-llms-openai openai
export OPENAI_API_KEY="your-key-here"
  2. Create an async tool that simulates an external call.
    The important part is that the tool function is declared with async def, because LlamaIndex can await it instead of blocking.
import asyncio

from llama_index.core.tools import FunctionTool

async def fetch_account_balance(account_id: str) -> str:
    await asyncio.sleep(1)
    return f"Account {account_id} balance is $12,450.33"

balance_tool = FunctionTool.from_defaults(
    async_fn=fetch_account_balance,
    name="fetch_account_balance",
    description="Fetches the current balance for a bank account.",
)
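Before wiring the tool into an agent, it can be worth exercising the underlying coroutine on its own. This stdlib-only sketch (no LlamaIndex imports) confirms the function behaves as an awaitable:

```python
import asyncio

async def fetch_account_balance(account_id: str) -> str:
    # Same simulated external call as above: sleep stands in for I/O latency.
    await asyncio.sleep(1)
    return f"Account {account_id} balance is $12,450.33"

# Drive the coroutine directly, without an agent in the loop.
result = asyncio.run(fetch_account_balance("123456"))
print(result)  # Account 123456 balance is $12,450.33
```

If this prints the expected string, any later failures are in the agent wiring, not the tool itself.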
  3. Build an agent that can call the async tool.
    Use an OpenAI-backed chat model and pass the tool into a ReAct-style agent. This is the cleanest path when you want tool calling plus natural language reasoning.
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
agent = ReActAgent.from_tools(
    tools=[balance_tool],
    llm=llm,
    verbose=True,
)
  4. Call the agent from an async entrypoint.
    Use achat() so the whole request path stays asynchronous end-to-end; calling the synchronous chat() here blocks the event loop and forfeits most of the benefit.
import asyncio

async def main() -> None:
    response = await agent.achat(
        "What is the balance for account 123456?"
    )
    print(response)

if __name__ == "__main__":
    asyncio.run(main())
  5. Add a second async tool and run both through the same agent.
    This is where async starts paying off: multiple I/O-bound tools can be orchestrated without turning your app into a thread pool mess.
import asyncio
from llama_index.core.tools import FunctionTool

async def fetch_recent_transactions(account_id: str) -> str:
    await asyncio.sleep(1)
    return (
        f"Recent transactions for {account_id}: "
        "POS -54.20, ACH +2,000.00, ATM -100.00"
    )

transactions_tool = FunctionTool.from_defaults(
    async_fn=fetch_recent_transactions,
    name="fetch_recent_transactions",
    description="Fetches recent transactions for a bank account.",
)

agent = ReActAgent.from_tools(
    tools=[balance_tool, transactions_tool],
    llm=llm,
    verbose=True,
)
  6. Wrap multiple agent calls with asyncio.gather when you need concurrency.
    This pattern matters in production workflows where you fan out to several customer requests or enrich multiple records at once. Note that a single agent instance shares its chat memory across calls, so interleaved requests can bleed into each other's history; for fully independent requests, consider creating one agent per request.
import asyncio

async def run_many() -> None:
    prompts = [
        "Get the balance for account 111111.",
        "Get recent transactions for account 222222.",
        "Get both balance and recent transactions for account 333333.",
    ]
    results = await asyncio.gather(*(agent.achat(p) for p in prompts))
    for i, result in enumerate(results, start=1):
        print(f"\nResult {i}:\n{result}")

if __name__ == "__main__":
    asyncio.run(run_many())

Testing It

Run the script and watch for two things: first, the tool should be invoked without throwing event-loop errors; second, the response should include data returned by your async function rather than a generic fallback. If you enabled verbose=True, you should see the agent deciding to call your tool before returning an answer.

To test concurrency, use the run_many() example and confirm that total runtime is closer to one tool latency than three sequential latencies. If each tool sleeps for one second and all requests finish in roughly one to two seconds total, your async path is working correctly.
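You can sanity-check that timing claim without an API key. The sketch below reproduces the fan-out with plain asyncio, where fake_tool is a stand-in I added for agent.achat:

```python
import asyncio
import time

async def fake_tool(account_id: str) -> str:
    # Stand-in for an I/O-bound agent call; sleeps for one second.
    await asyncio.sleep(1)
    return f"done:{account_id}"

async def timed_fanout() -> float:
    # Launch three one-second calls concurrently and measure wall time.
    start = time.perf_counter()
    await asyncio.gather(*(fake_tool(a) for a in ("111111", "222222", "333333")))
    return time.perf_counter() - start

elapsed = asyncio.run(timed_fanout())
print(f"3 concurrent 1s calls finished in {elapsed:.2f}s")
```

If the three calls ran sequentially you would see roughly three seconds; with gather the total should stay close to one.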

If something fails, check these first:

  • OPENAI_API_KEY is set in the shell running Python
  • You are calling await agent.achat(...), not agent.chat(...)
  • Your tool uses async def, not a regular function
  • The installed package versions are compatible with your Python version
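The third check above is easy to automate: the standard library's inspect.iscoroutinefunction reports whether a tool function was actually declared with async def. A stdlib-only sketch, where fetch_sync is a deliberately wrong example added for contrast:

```python
import asyncio
import inspect

async def fetch_account_balance(account_id: str) -> str:
    await asyncio.sleep(0)
    return f"Account {account_id} balance is $12,450.33"

def fetch_sync(account_id: str) -> str:
    # A regular function: the agent cannot await this one.
    return "not awaitable"

print(inspect.iscoroutinefunction(fetch_account_balance))  # True
print(inspect.iscoroutinefunction(fetch_sync))             # False
```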

Next Steps

  • Add real HTTP tools using httpx.AsyncClient instead of simulated sleeps.
  • Combine async tools with structured outputs so downstream services get typed responses.
  • Move from single-agent orchestration to workflow patterns when you need retries, branching, or parallel tool execution.


By Cyprian Aarons, AI Consultant at Topiax.
