AutoGen Tutorial (Python): running agents in parallel for advanced developers
This tutorial shows you how to run multiple AutoGen agents in parallel with Python, collect their outputs, and merge the results into a single decision flow. You need this when one agent is not enough: for example, when you want independent analysis from research, compliance, and implementation agents before your orchestrator picks a final answer.
What You'll Need
- Python 3.10+
- `pyautogen` installed
- An OpenAI-compatible API key
- A model name that works with your provider, such as `gpt-4o-mini`
- Basic familiarity with `AssistantAgent` and `UserProxyAgent`
- A terminal where you can set environment variables
Install the package:
```bash
pip install pyautogen
```
Set your API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
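Optionally, a quick check from Python confirms the key is visible before you build any agents, so you fail fast instead of hitting a confusing auth error mid-run:

```python
import os

# Fail fast if the key is missing from this shell's environment.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```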
Step-by-Step
- Start by creating a shared LLM configuration and a small helper that builds agents with consistent settings. In parallel workflows, consistency matters more than clever prompts because you want comparable outputs across agents.
```python
import os

from autogen import AssistantAgent

llm_config = {
    "model": "gpt-4o-mini",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.2,
}

def make_agent(name: str, system_message: str) -> AssistantAgent:
    return AssistantAgent(
        name=name,
        llm_config=llm_config,
        system_message=system_message,
    )
```
- Define several specialized agents and give each one a narrow job. Parallelism works best when each agent has an isolated responsibility instead of all of them trying to do the same thing.
```python
research_agent = make_agent(
    "research_agent",
    "You are a research analyst. Return concise factual findings only.",
)
risk_agent = make_agent(
    "risk_agent",
    "You are a risk analyst for banking software. Identify operational and compliance risks.",
)
implementation_agent = make_agent(
    "implementation_agent",
    "You are a senior Python engineer. Suggest an implementation approach and edge cases.",
)
```
- Use `ThreadPoolExecutor` to run the agents at the same time. AutoGen calls are network-bound, so threads are the simplest production-friendly way to parallelize them in Python.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

task = (
    "Design an agent workflow for reviewing loan applications. "
    "Focus on speed, auditability, and failure handling."
)

agents = [research_agent, risk_agent, implementation_agent]

def run_agent(agent):
    return agent.generate_reply(messages=[{"role": "user", "content": task}])

results = {}
with ThreadPoolExecutor(max_workers=len(agents)) as executor:
    future_map = {executor.submit(run_agent, agent): agent.name for agent in agents}
    for future in as_completed(future_map):
        name = future_map[future]
        results[name] = future.result()

for name, output in results.items():
    print(f"\n=== {name} ===\n{output}")
```
- Add an orchestrator step that merges the parallel outputs into one final answer. This is the pattern you want in real systems: parallel specialists first, then one coordinator that resolves conflicts and produces the final response.
```python
orchestrator = make_agent(
    "orchestrator",
    "You combine specialist outputs into one practical recommendation for production use.",
)

summary_prompt = f"""
Task:
{task}

Research output:
{results['research_agent']}

Risk output:
{results['risk_agent']}

Implementation output:
{results['implementation_agent']}

Write a final recommendation with sections:
- Recommended approach
- Main risks
- Implementation notes
"""

final_answer = orchestrator.generate_reply(messages=[{"role": "user", "content": summary_prompt}])
print("\n=== FINAL ANSWER ===\n")
print(final_answer)
```
- If you need stronger control over failures, wrap each agent call with timeout and exception handling. In production banking workflows, one slow specialist should not block the whole pipeline.
```python
from concurrent.futures import TimeoutError

def safe_run(agent, prompt: str):
    try:
        return agent.generate_reply(messages=[{"role": "user", "content": prompt}])
    except Exception as exc:
        return f"ERROR from {agent.name}: {exc}"

outputs = {}
with ThreadPoolExecutor(max_workers=3) as executor:
    future_map = {executor.submit(safe_run, agent, task): agent.name for agent in agents}
    for future, name in future_map.items():
        try:
            # Each result() call waits up to 60 seconds from the moment it is made.
            outputs[name] = future.result(timeout=60)
        except TimeoutError:
            # The worker thread keeps running; we just stop waiting for its result.
            outputs[name] = "ERROR: timed out"

print(outputs)
```
Testing It
Run the script once with a simple task and confirm that all three specialist agents return different perspectives. Then verify that the orchestrator produces a combined response using those outputs rather than generating a fresh answer from scratch.
If one agent is slow or fails, check that the others still complete and that their results are preserved in `results`. For deeper validation, log timestamps before and after each future completes so you can confirm the calls actually overlap.
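Here is a minimal sketch of that timing check, reusing the `agents` list and `task` from the earlier steps; the timestamps are taken relative to a shared baseline so they are comparable across threads:

```python
import time
from concurrent.futures import ThreadPoolExecutor

t0 = time.perf_counter()

def timed_run(agent):
    # Record start/end relative to a shared baseline so the windows are comparable.
    start = time.perf_counter() - t0
    reply = agent.generate_reply(messages=[{"role": "user", "content": task}])
    end = time.perf_counter() - t0
    return agent.name, start, end, reply

with ThreadPoolExecutor(max_workers=len(agents)) as executor:
    timings = list(executor.map(timed_run, agents))

for name, start, end, _ in timings:
    print(f"{name}: started at +{start:.2f}s, finished at +{end:.2f}s")
```

If the calls truly overlap, the start times should cluster near zero instead of each agent starting only after the previous one finished.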
A good test is to change the task to something domain-specific like fraud review or claims triage and see whether each agent stays within its assigned role. If they start blending responsibilities too much, tighten the system messages.
Next Steps
- Add structured outputs with Pydantic so each parallel agent returns JSON instead of free text; a sketch of one approach follows this list
- Introduce cancellation logic so stale branches stop when one high-confidence branch wins
- Move from threads to async execution if your AutoGen stack and model provider support it
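For the first item, here is one possible sketch using Pydantic v2. The `SpecialistFinding` schema and the JSON instruction appended to the task are illustrative assumptions, not part of AutoGen's API, and the sketch assumes `generate_reply` returns a plain string in your configuration:

```python
import json

from pydantic import BaseModel, ValidationError

# Hypothetical schema for illustration; adapt the fields to your workflow.
class SpecialistFinding(BaseModel):
    summary: str
    concerns: list[str]
    confidence: float

structured_task = (
    task
    + "\nReply with JSON only, using keys: summary (string), "
    "concerns (list of strings), confidence (number between 0 and 1)."
)

raw = research_agent.generate_reply(
    messages=[{"role": "user", "content": structured_task}]
)

try:
    finding = SpecialistFinding.model_validate(json.loads(raw))
    print(finding.summary, finding.confidence)
except (json.JSONDecodeError, ValidationError) as exc:
    # Models do not always honor format instructions; treat this as a retry signal.
    print(f"Unparseable specialist output: {exc}")
```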
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit