LlamaIndex Tutorial (Python): running agents in parallel for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to run multiple LlamaIndex agents in parallel from Python, collect their outputs, and combine the results into a single answer. You need this when one agent is not enough: for example, when you want separate agents to research different parts of a problem, compare perspectives, or split work across tools without blocking on a single sequential chain.

What You'll Need

  • Python 3.10+
  • llama-index
  • An OpenAI API key set as OPENAI_API_KEY
  • A project with internet access for model calls
  • Basic familiarity with:
    • AgentWorkflow
    • tools in LlamaIndex
    • async Python with asyncio

Install the package:

pip install llama-index

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start by defining a couple of simple tools. In parallel agent setups, tool boundaries matter more than usual because each agent should own a specific slice of the work.
import asyncio
from llama_index.core.tools import FunctionTool

def get_market_summary(region: str) -> str:
    return f"{region}: stable demand, moderate pricing pressure, low churn."

def get_risk_summary(segment: str) -> str:
    return f"{segment}: elevated compliance risk, medium operational risk."

market_tool = FunctionTool.from_defaults(fn=get_market_summary)
risk_tool = FunctionTool.from_defaults(fn=get_risk_summary)
  2. Next, create two specialized agents. Keep each agent narrow so the parallelism is meaningful; one agent should not be able to do everything.
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")

market_agent = AgentWorkflow.from_tools_or_functions(
    [market_tool],
    llm=llm,
    system_prompt="You are a market analysis agent. Use only the provided tool.",
)

risk_agent = AgentWorkflow.from_tools_or_functions(
    [risk_tool],
    llm=llm,
    system_prompt="You are a risk analysis agent. Use only the provided tool.",
)
  3. Run both agents concurrently with asyncio.gather. This is the core pattern: each agent gets its own task, and Python waits for both to finish before you merge their outputs.
async def run_agents_parallel():
    market_task = market_agent.run("Analyze EMEA market conditions.")
    risk_task = risk_agent.run("Assess SME portfolio risk.")

    market_result, risk_result = await asyncio.gather(
        market_task,
        risk_task,
    )

    return market_result, risk_result

results = asyncio.run(run_agents_parallel())
print(results[0].response.content)
print(results[1].response.content)
  4. Add a coordinator step that synthesizes both outputs into one response. In production, this is usually another LLM call or a deterministic merge layer, depending on how strict your workflow needs to be.
from llama_index.core.llms import ChatMessage

async def synthesize_results(market_text: str, risk_text: str) -> str:
    messages = [
        ChatMessage(
            role="system",
            content="Combine the two reports into one concise executive summary.",
        ),
        ChatMessage(
            role="user",
            content=f"Market report:\n{market_text}\n\nRisk report:\n{risk_text}",
        ),
    ]
    response = await llm.achat(messages)
    return response.message.content

async def run_end_to_end():
    market_result, risk_result = await run_agents_parallel()
    summary = await synthesize_results(
        market_result.response.content,
        risk_result.response.content,
    )
    return summary

print(asyncio.run(run_end_to_end()))
  5. If you need more than two agents, scale the same pattern with a list of tasks. This is where parallel agents become useful for multi-region research, policy checks, or document triage.
async def run_many_agents():
    tasks = [
        market_agent.run("Analyze EMEA market conditions."),
        risk_agent.run("Assess SME portfolio risk."),
        market_agent.run("Analyze APAC market conditions."),
    ]
    results = await asyncio.gather(*tasks)
    return [r.response.content for r in results]

for item in asyncio.run(run_many_agents()):
    print(item)
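One failure mode to plan for: with a plain asyncio.gather, a single failing agent raises and you lose the results of its siblings. Passing return_exceptions=True (a standard asyncio.gather flag) returns exceptions as values instead, so you can inspect each slot. A minimal sketch, using a stand-in coroutine in place of real agent calls (flaky_agent is hypothetical, not a LlamaIndex API):

```python
import asyncio

async def flaky_agent(name: str, fail: bool) -> str:
    # Stand-in for an agent.run(...) call; replace with real agents in practice.
    await asyncio.sleep(0.01)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"

async def run_with_isolation():
    # return_exceptions=True: exceptions come back as values
    # instead of cancelling the sibling tasks.
    return await asyncio.gather(
        flaky_agent("market", fail=False),
        flaky_agent("risk", fail=True),
        return_exceptions=True,
    )

results = asyncio.run(run_with_isolation())
# results[0] is "market ok"; results[1] is a RuntimeError instance
```

After the gather resolves, check each slot with isinstance(result, Exception) before merging, and decide per task whether to retry, drop, or surface the error.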

Testing It

Run the script and confirm that both agent outputs are returned without waiting for one to finish before starting the other. If you add timestamps around each task, you should see total runtime closer to the slowest individual call than the sum of all calls.
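The timing check above can be sketched without any LlamaIndex dependency: slow_call below is a stand-in for an agent call, and the assertion is that gather's wall time tracks the slowest task, not the sum.

```python
import asyncio
import time

async def slow_call(delay: float) -> float:
    # Stand-in for an agent call that takes `delay` seconds.
    await asyncio.sleep(delay)
    return delay

async def timed_gather() -> float:
    start = time.perf_counter()
    # Two calls of 0.4s and 0.1s: run in parallel, total ≈ 0.4s, not 0.5s.
    await asyncio.gather(slow_call(0.4), slow_call(0.1))
    return time.perf_counter() - start

elapsed = asyncio.run(timed_gather())
# elapsed is close to 0.4 (the slowest call), well under 0.5 (the sum)
```

Wrap your real agent calls the same way to confirm the agents are actually overlapping rather than running sequentially.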

Also verify that each agent stays within its own scope. If the market agent starts answering risk questions or vice versa, tighten the system prompt and reduce tool overlap.

For production-style testing, pair one slow tool with one fast tool. Both tasks start immediately, and the fast agent finishes early, even though gather() only returns once the slow one does.

Next Steps

  • Add structured outputs with Pydantic models so each agent returns machine-readable fields.
  • Introduce retries and timeouts per task for better failure isolation.
  • Replace the final synthesis step with a router agent that decides whether to merge, escalate, or ask follow-up questions.
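For the retries-and-timeouts item, one workable shape is a small per-task wrapper built on asyncio.wait_for. This is a sketch, not a LlamaIndex API: with_retry and flaky are names I am introducing here, and make_task must be a zero-arg callable so each attempt gets a fresh awaitable.

```python
import asyncio

async def with_retry(make_task, retries: int = 2, timeout: float = 5.0):
    # make_task is a zero-arg callable returning a fresh awaitable per attempt,
    # e.g. lambda: market_agent.run("Analyze EMEA market conditions.")
    last_exc = None
    for _ in range(retries + 1):
        try:
            return await asyncio.wait_for(make_task(), timeout)
        except Exception as exc:  # includes asyncio.TimeoutError
            last_exc = exc
    raise last_exc

# Demo with a stand-in task that fails once, then succeeds.
attempts = {"count": 0}

async def flaky():
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

result = asyncio.run(with_retry(flaky, retries=2, timeout=1.0))
# result is "ok" after one retry
```

Wrapping each entry in your tasks list this way keeps one slow or failing agent from taking down the whole gather.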

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
