LlamaIndex Tutorial (Python): running agents in parallel for intermediate developers

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to run multiple LlamaIndex agents in parallel from Python, then combine their outputs into one result. You need this when one agent alone is too slow, or when you want separate agents to handle different parts of a task at the same time.

What You'll Need

  • Python 3.10+
  • llama-index
  • An OpenAI API key
  • A shell that supports environment variables
  • Basic familiarity with LlamaIndex ReActAgent and QueryEngineTool

Install the package:

pip install llama-index

Set your API key:

export OPENAI_API_KEY="your-key-here"
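If the key is missing, the first agent call fails with an opaque authentication error, so it helps to fail fast at startup. Here is a small optional check (the check_api_key helper is my own, not part of LlamaIndex):

```python
import os

def check_api_key(var: str = "OPENAI_API_KEY") -> bool:
    """Return True if the environment variable is present and non-empty."""
    return bool(os.environ.get(var))

if __name__ == "__main__":
    if not check_api_key():
        raise SystemExit("OPENAI_API_KEY is not set; export it before running the examples.")
```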

Step-by-Step

  1. First, set up a simple index and turn it into a query tool. I’m using a small in-memory document set so the example is runnable without extra infrastructure.
from llama_index.core import VectorStoreIndex, Document
from llama_index.core.tools import QueryEngineTool, ToolMetadata

docs = [
    Document(text="LlamaIndex is a framework for building LLM applications over data."),
    Document(text="Parallel execution is useful when multiple independent subtasks can run at once."),
]

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

research_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="research_tool",
        description="Answers questions about LlamaIndex and parallel execution."
    ),
)
  2. Next, create two agents that can work independently. In production, these would usually be specialized agents with different tools or prompts, but here they share the same tool so the pattern stays easy to follow.
import asyncio
from llama_index.core.agent import ReActAgent

agent_1 = ReActAgent.from_tools(
    [research_tool],
    verbose=False,
    system_prompt="You are Agent 1. Focus on LlamaIndex concepts."
)

agent_2 = ReActAgent.from_tools(
    [research_tool],
    verbose=False,
    system_prompt="You are Agent 2. Focus on concurrency and orchestration."
)
  3. Now wrap each agent call in an async function. The key idea is that each agent runs as an independent coroutine, which lets asyncio.gather() execute them concurrently.
async def run_agent(agent, prompt: str) -> str:
    response = await agent.aquery(prompt)
    return str(response)

async def run_parallel():
    prompts = [
        "Explain what LlamaIndex is in one paragraph.",
        "Explain why running agents in parallel can reduce latency.",
    ]

    results = await asyncio.gather(
        run_agent(agent_1, prompts[0]),
        run_agent(agent_2, prompts[1]),
    )

    return results
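The same fan-out generalizes to any number of agents by building the task list from (agent, prompt) pairs. A minimal, dependency-free sketch of the pattern, using a stub class in place of ReActAgent so it runs without an API key (StubAgent and fan_out are illustrative names, not LlamaIndex APIs):

```python
import asyncio

class StubAgent:
    """Stands in for a ReActAgent; aquery sleeps to simulate model latency."""

    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay

    async def aquery(self, prompt: str) -> str:
        await asyncio.sleep(self.delay)  # pretend this is a remote model call
        return f"{self.name}: {prompt}"

async def fan_out(pairs):
    """Run every (agent, prompt) pair concurrently and collect the answers in order."""
    tasks = [agent.aquery(prompt) for agent, prompt in pairs]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    agents = [StubAgent(f"agent_{i}", 0.1) for i in range(3)]
    print(asyncio.run(fan_out([(a, "ping") for a in agents])))
```

Because gather preserves argument order, results line up with the input pairs even though the coroutines finish in any order.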
  4. Add a simple orchestrator that merges both outputs into one final answer. This is where you turn parallel work into something useful for the caller.
async def main():
    results = await run_parallel()

    combined = {
        "agent_1": results[0],
        "agent_2": results[1],
    }

    print("=== Agent 1 ===")
    print(combined["agent_1"])
    print("\n=== Agent 2 ===")
    print(combined["agent_2"])

if __name__ == "__main__":
    asyncio.run(main())
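The orchestrator above prints to stdout; in a service you would usually return one combined structure instead. A small sketch of that merge step (merge_results is a hypothetical helper, not a LlamaIndex API):

```python
def merge_results(labels, results):
    """Pair each agent label with its output and build one combined report string."""
    sections = {label: text for label, text in zip(labels, results)}
    report = "\n\n".join(f"=== {label} ===\n{text}" for label, text in sections.items())
    return sections, report

if __name__ == "__main__":
    sections, report = merge_results(
        ["agent_1", "agent_2"],
        ["LlamaIndex summary...", "Concurrency summary..."],
    )
    print(report)
```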
  5. If you want a cleaner production pattern, add timeout handling around each task. That way one slow agent does not block the entire request path.
async def run_agent_with_timeout(agent, prompt: str, timeout_seconds: int = 20) -> str:
    try:
        response = await asyncio.wait_for(agent.aquery(prompt), timeout=timeout_seconds)
        return str(response)
    except asyncio.TimeoutError:
        return f"Timed out after {timeout_seconds} seconds."

async def run_parallel_safe():
    return await asyncio.gather(
        run_agent_with_timeout(agent_1, "Summarize LlamaIndex."),
        run_agent_with_timeout(agent_2, "Summarize parallel execution patterns."),
    )
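Timeouts cover slow agents; for agents that raise, asyncio.gather(..., return_exceptions=True) keeps one failure from taking down its siblings. A runnable sketch with stand-in coroutines instead of real LlamaIndex calls (flaky_agent and run_all are illustrative names):

```python
import asyncio

async def flaky_agent(prompt: str, fail: bool) -> str:
    """Stand-in for an agent call that may raise."""
    await asyncio.sleep(0.05)
    if fail:
        raise RuntimeError("model call failed")
    return f"answer: {prompt}"

async def run_all(prompts_and_flags):
    results = await asyncio.gather(
        *(flaky_agent(p, f) for p, f in prompts_and_flags),
        return_exceptions=True,  # exceptions come back as values, not raised
    )
    # Replace failures with a fallback string so the caller always gets strings.
    return [r if isinstance(r, str) else f"error: {r}" for r in results]

if __name__ == "__main__":
    print(asyncio.run(run_all([("a", False), ("b", True)])))
```

Combining this with the timeout wrapper gives you a request path where each agent either answers, times out, or reports its own error, and the other agents are unaffected.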

Testing It

Run the script and confirm that both agent responses print out under separate headings. If your API key is set correctly, you should see two distinct answers returned from the same process without waiting for one to finish before starting the other.

To verify parallelism more directly, add timestamps before and after each aquery() call and compare them to sequential execution. In real workloads, especially when each agent calls tools or remote models, this pattern cuts wall-clock time significantly.
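One way to see the difference without spending API calls is to time the same stub work sequentially and concurrently; the 0.2-second sleep below stands in for a model round trip (fake_aquery and timed are illustrative names):

```python
import asyncio
import time

async def fake_aquery(prompt: str) -> str:
    await asyncio.sleep(0.2)  # stand-in for a remote model call
    return f"done: {prompt}"

async def sequential(prompts):
    # Each call awaits the previous one: total time is the sum of the delays.
    return [await fake_aquery(p) for p in prompts]

async def concurrent(prompts):
    # All calls run at once: total time is roughly the slowest delay.
    return await asyncio.gather(*(fake_aquery(p) for p in prompts))

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    prompts = ["a", "b", "c"]
    _, t_seq = timed(sequential(prompts))
    _, t_par = timed(concurrent(prompts))
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_par:.2f}s")
```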

If you get authentication errors, check that OPENAI_API_KEY is exported in the same terminal session where you launch Python. If you get event loop errors in notebooks, use await main() directly instead of asyncio.run(main()).

Next Steps

  • Add different tools per agent so each one owns a specific domain or data source.
  • Introduce a final “judge” step that merges or ranks agent outputs.
  • Explore AsyncBaseTool patterns for more complex concurrent tool execution inside an agent flow.

By Cyprian Aarons, AI Consultant at Topiax.
