AutoGen Tutorial (Python): running agents in parallel for intermediate developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows you how to run multiple AutoGen agents in parallel from Python, collect their outputs, and merge the results into one final answer. You need this when one agent is too slow, or when you want independent reasoning paths for tasks like research, policy checks, or comparing multiple responses before deciding.

What You'll Need

  • Python 3.10+
  • pyautogen installed
  • An OpenAI-compatible API key
  • A model name that works with your provider
  • Basic familiarity with AssistantAgent and UserProxyAgent
  • A terminal and a place to run Python scripts

Step-by-Step

  1. Start by installing AutoGen and setting your API key. Keep the model configuration simple so the example stays portable across OpenAI-compatible providers.
pip install pyautogen
export OPENAI_API_KEY="your_api_key_here"
  2. Define a small helper that creates a fresh assistant agent per task. The important detail here is isolation: each agent gets its own conversation, so parallel work does not bleed state across runs.
import os
from autogen import AssistantAgent

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

def make_agent(name: str) -> AssistantAgent:
    return AssistantAgent(
        name=name,
        llm_config=llm_config,
        system_message="You are a concise assistant that answers directly.",
    )
  3. Run the agents in parallel with ThreadPoolExecutor. AutoGen agent calls are blocking, so threads are the simplest way to get concurrent execution without rewriting your workflow around async plumbing.
from concurrent.futures import ThreadPoolExecutor, as_completed

tasks = [
    ("agent_a", "List three risks in launching a new insurance product."),
    ("agent_b", "List three controls for reducing fraud in claims processing."),
    ("agent_c", "List three compliance checks for customer onboarding."),
]

def run_task(name: str, prompt: str) -> str:
    agent = make_agent(name)
    result = agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    return f"{name}: {result}"

results = []
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(run_task, name, prompt) for name, prompt in tasks]
    for future in as_completed(futures):
        results.append(future.result())

for item in results:
    print(item)
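One caveat: as_completed yields futures in completion order, so the results list can be shuffled between runs. If you need outputs in the same order as the task list, executor.map preserves input order. Here is a minimal sketch of that variant, using a stub worker in place of run_task so the pattern can be tried without an API key (run_task_stub is a hypothetical stand-in, not an AutoGen call):

```python
from concurrent.futures import ThreadPoolExecutor

# Stub standing in for run_task so the ordering behavior is testable offline.
def run_task_stub(name: str, prompt: str) -> str:
    return f"{name}: {prompt.upper()}"

tasks = [
    ("agent_a", "risks"),
    ("agent_b", "controls"),
    ("agent_c", "compliance"),
]

with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map returns results in input order, unlike as_completed.
    ordered = list(executor.map(lambda t: run_task_stub(*t), tasks))

print(ordered)
```

Ordered results matter most when the downstream consumer (such as the aggregator in the next step) assumes a fixed branch-to-position mapping.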
  4. If you want a final synthesis step, hand the collected outputs to one more agent. This gives you the common production pattern: parallel specialists first, then a single consolidator that produces the answer you actually return to the caller.
aggregator = AssistantAgent(
    name="aggregator",
    llm_config=llm_config,
    system_message="You combine multiple answers into one structured summary.",
)

summary_input = "\n\n".join(results)
final_prompt = (
    "Combine these parallel findings into a single response with headings "
    "for risks, controls, and compliance checks:\n\n"
    f"{summary_input}"
)

final_answer = aggregator.generate_reply(messages=[{"role": "user", "content": final_prompt}])
print("\nFINAL ANSWER:\n")
print(final_answer)
  5. If you need tighter control over failures, wrap each task in try/except and return an error payload instead of crashing the whole batch. Submit safe_run_task to the executor in place of run_task. In production systems, one bad agent call should not take down all concurrent work.
def safe_run_task(name: str, prompt: str) -> str:
    try:
        agent = make_agent(name)
        result = agent.generate_reply(messages=[{"role": "user", "content": prompt}])
        return f"{name}: {result}"
    except Exception as e:
        return f"{name}: ERROR - {type(e).__name__}: {e}"
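A try/except catches failures inside the worker, but it does not bound how long a hung call can block a branch. One way to add a per-task ceiling is future.result(timeout=...), which raises if a branch takes too long. A sketch of the idea, where slow_task and its delays are illustrative stand-ins for agent calls, not AutoGen APIs:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Stub worker simulating one fast and one hung agent call.
def slow_task(name: str, delay: float) -> str:
    time.sleep(delay)
    return f"{name}: done"

tasks = [("fast", 0.01), ("slow", 2.0)]
results = []

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = {executor.submit(slow_task, n, d): n for n, d in tasks}
    for future, name in futures.items():
        try:
            # Bound how long we wait for each branch instead of blocking forever.
            results.append(future.result(timeout=0.5))
        except FutureTimeout:
            results.append(f"{name}: ERROR - timed out")

print(results)
```

Note that the timed-out thread still runs to completion in the background; the timeout only stops you from waiting on it.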

Testing It

Run the script once and confirm that all three agents return independently generated answers. Then check that the final aggregator output reflects all three inputs instead of only one branch.

If you see timeouts or rate-limit errors, reduce max_workers to 2 or lower your request frequency. Also verify that your API key is available in the environment where Python is running, not just in your shell profile.

A good sanity check is to change one prompt at a time and confirm only that branch changes. That tells you your parallel execution is isolated correctly and you are not accidentally sharing state between agents.

Next Steps

  • Add retries with exponential backoff around each worker function
  • Replace threads with asyncio if your provider and AutoGen setup support async calls cleanly
  • Add structured outputs so each agent returns JSON instead of free text
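The first bullet can be sketched as a small decorator around the worker function. This is a minimal illustration under assumed defaults (the attempt count, delays, and flaky_task stub are all hypothetical, not part of AutoGen):

```python
import time
from functools import wraps

def with_retries(max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a function with exponential backoff on any exception."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Sleep base_delay * 2^attempt between attempts.
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

# Flaky stub standing in for an agent call that fails twice, then succeeds.
calls = {"n": 0}

@with_retries(max_attempts=3)
def flaky_task() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(flaky_task())
```

In the tutorial's setup you would apply the decorator to safe_run_task (or run_task) before submitting it to the executor, so each branch retries independently.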

By Cyprian Aarons, AI Consultant at Topiax.