LlamaIndex Tutorial (Python): running agents in parallel for beginners
This tutorial shows you how to run multiple LlamaIndex agents in parallel from Python and collect their results with asyncio. You need this when a single sequential agent would bottleneck the workflow but the tasks are independent, like pulling data from different sources or asking specialized agents to answer different parts of a request.
What You'll Need
- Python 3.10+
- The `llama-index` and `openai` packages
- An OpenAI API key
- Basic familiarity with LlamaIndex `AgentRunner`/chat agents
- A terminal and a virtual environment
Install the packages:
```shell
pip install llama-index openai
```
Set your API key:
```shell
export OPENAI_API_KEY="your-key-here"
```
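Optional: a quick sanity check, in plain Python with no LlamaIndex involved, that the key is actually visible to your script before any agent call fails with an authentication error. The helper name `check_openai_key` is mine, not part of any library:

```python
import os


def check_openai_key() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY"))
```

Call this at startup and exit early with a clear message if it returns `False`; a missing key is much easier to diagnose here than inside a stack trace from an agent call.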
Step-by-Step
- Start by creating two agents that do different jobs. For beginners, keep them simple: one agent summarizes text, the other extracts action items.

```python
import asyncio

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")

summarizer = ReActAgent.from_tools(
    [],
    llm=llm,
    verbose=False,
)

extractor = ReActAgent.from_tools(
    [],
    llm=llm,
    verbose=False,
)
```
- Give each agent its own prompt so they behave differently. The point of parallel execution is not just speed; it is isolating independent tasks so each agent can work without waiting on the other.
```python
SUMMARY_PROMPT = """
Summarize the following meeting notes in 3 bullet points.
Keep it concise and factual.
"""

ACTION_PROMPT = """
Extract action items from the following meeting notes.
Return a numbered list of owners and tasks.
"""

MEETING_NOTES = """
Team discussed Q2 onboarding issues.
Alice will update the signup flow by Friday.
Bob will review analytics tracking.
The support team reported fewer password reset tickets.
"""
```
- Run both agents concurrently with `asyncio.gather`. This is the main pattern: create async tasks for each independent agent call, then await them together.
```python
async def run_parallel_agents():
    # Create both coroutines first, then await them together with gather.
    summary_task = summarizer.achat(SUMMARY_PROMPT + "\n\n" + MEETING_NOTES)
    actions_task = extractor.achat(ACTION_PROMPT + "\n\n" + MEETING_NOTES)
    summary_result, actions_result = await asyncio.gather(
        summary_task,
        actions_task,
    )
    return summary_result, actions_result


if __name__ == "__main__":
    summary_result, actions_result = asyncio.run(run_parallel_agents())
    print("=== SUMMARY ===")
    print(summary_result.response)
    print("\n=== ACTION ITEMS ===")
    print(actions_result.response)
```

Note the method is `achat`, the async counterpart of the agent's `chat` method; it returns an `AgentChatResponse` whose text lives on `.response`.
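One detail worth knowing about `asyncio.gather`: it returns results in the order you passed the awaitables, not the order they finish, so you can always unpack them positionally. A stdlib-only sketch with `asyncio.sleep` standing in for the agent calls (the names `fake_agent` and `gather_order_demo` are mine, for illustration only):

```python
import asyncio


async def fake_agent(name: str, delay: float) -> str:
    # Simulate an agent call that takes `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def gather_order_demo() -> list:
    # "slow" is listed first and finishes last, but gather still returns
    # ["slow", "fast"] because result order follows argument order.
    return await asyncio.gather(
        fake_agent("slow", 0.2),
        fake_agent("fast", 0.01),
    )
```

This is why the tutorial can safely unpack `summary_result, actions_result` without worrying about which request completed first.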
- If you want cleaner production code, wrap each agent call in a small helper function. That makes it easier to add timeouts, retries, or logging later without changing your orchestration logic.
```python
async def run_agent(agent, prompt: str):
    result = await agent.achat(prompt)
    return result.response


async def run_parallel_agents_clean():
    summary_prompt = SUMMARY_PROMPT + "\n\n" + MEETING_NOTES
    action_prompt = ACTION_PROMPT + "\n\n" + MEETING_NOTES
    summary_text, action_text = await asyncio.gather(
        run_agent(summarizer, summary_prompt),
        run_agent(extractor, action_prompt),
    )
    return {
        "summary": summary_text,
        "actions": action_text,
    }
```
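With a helper like this in place, a per-call timeout is a one-line change via `asyncio.wait_for`, which cancels the awaited call when it exceeds its budget. A stdlib sketch with `asyncio.sleep` standing in for the agent call (the wrapper name `with_timeout`, the `demo` coroutine, and the specific budgets are my own illustrative choices):

```python
import asyncio


async def with_timeout(coro, timeout_s: float):
    """Await coro, raising TimeoutError if it runs longer than timeout_s."""
    return await asyncio.wait_for(coro, timeout=timeout_s)


async def demo():
    # The fast task finishes well inside its budget.
    fast = await with_timeout(asyncio.sleep(0.01, result="done"), timeout_s=1.0)
    # The slow task is cancelled when it blows its budget.
    try:
        await with_timeout(asyncio.sleep(5, result="never"), timeout_s=0.05)
        slow = "completed"
    except asyncio.TimeoutError:
        slow = "timed out"
    return fast, slow
```

In the tutorial's setting you would pass the agent coroutine instead of `asyncio.sleep`, e.g. `with_timeout(run_agent(summarizer, prompt), timeout_s=30.0)`, with a budget tuned to your model's typical latency.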
- Add basic error handling so one agent can fail without killing the whole workflow. In real systems, partial results are often better than no results.
```python
async def safe_run_agent(agent, prompt: str):
    try:
        result = await agent.achat(prompt)
        return {"ok": True, "response": result.response}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}


async def run_with_partial_failure():
    results = await asyncio.gather(
        safe_run_agent(summarizer, SUMMARY_PROMPT + "\n\n" + MEETING_NOTES),
        safe_run_agent(extractor, ACTION_PROMPT + "\n\n" + MEETING_NOTES),
    )
    return results
```
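An alternative to wrapping each call yourself is `asyncio.gather(..., return_exceptions=True)`, which delivers raised exceptions back as values in the results list instead of propagating the first one and cancelling its siblings. A stdlib sketch with stand-in coroutines (the names `ok_task`, `failing_task`, and `gather_with_partials` are illustrative, not part of any API):

```python
import asyncio


async def ok_task() -> str:
    return "summary text"


async def failing_task() -> str:
    raise RuntimeError("rate limit hit")


async def gather_with_partials():
    # return_exceptions=True turns exceptions into list entries
    # rather than aborting the whole gather.
    results = await asyncio.gather(
        ok_task(), failing_task(), return_exceptions=True
    )
    return [
        r if not isinstance(r, Exception) else f"failed: {r}"
        for r in results
    ]
```

The explicit `safe_run_agent` wrapper gives you nicer structured results; `return_exceptions=True` is the quicker option when you just need the gather not to die.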
Testing It
Run the script from your terminal and confirm that both outputs appear in one execution. If everything is wired correctly, you should see a short summary and a numbered list of action items returned independently.
To verify parallelism, add a temporary print() before each achat() and compare timestamps, or use longer prompts so you can see both requests in flight at once. If one call fails, test the safe_run_agent() version and confirm that you still get the successful result back.
For a stronger check, swap one prompt for a slower task and make sure total runtime is closer to the slower request than the sum of both requests.
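That runtime check can be rehearsed without spending any API calls: simulate one fast and one slow agent with `asyncio.sleep` and confirm that total wall-clock time tracks the slower task, not the sum. The helper names `fake_call` and `timed_gather` below are mine, for illustration:

```python
import asyncio
import time


async def fake_call(delay: float) -> float:
    # Stand-in for an agent request that takes `delay` seconds.
    await asyncio.sleep(delay)
    return delay


async def timed_gather() -> float:
    start = time.perf_counter()
    # Run a 0.1s and a 0.3s "request" concurrently.
    await asyncio.gather(fake_call(0.1), fake_call(0.3))
    return time.perf_counter() - start
```

Expect the elapsed time to come out near 0.3 seconds (the slower task) rather than 0.4 (the sum); if it lands near the sum, the calls are running sequentially and something in the orchestration is blocking.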
Next Steps
- Add tool-enabled agents and run different tools in parallel for retrieval-heavy workflows
- Learn `asyncio.wait()` and timeout handling for better control over long-running agent calls
- Move from simple prompts to structured outputs with Pydantic models for safer downstream processing
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.