# How to Fix 'chain execution stuck in production' in AutoGen (Python)
## What this error means
When an AutoGen chain gets stuck in production, it usually means the agent conversation never reaches a termination condition. In Python, that often shows up as an endless loop of AssistantAgent / UserProxyAgent turns, a task that never returns, or a worker process sitting idle while the chat keeps generating messages.
In practice, this happens when your termination logic is wrong, tool calls never complete, or you’ve built an agent loop with no hard stop.
## The Most Common Cause
The #1 cause is a missing or incorrect termination condition in your AutoGen chat loop.
A common broken pattern is letting AssistantAgent keep replying forever because is_termination_msg never matches the actual final message. Another variant is using max_consecutive_auto_reply=None and assuming the model will “just stop”.
### Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| No real termination signal | Explicit termination on "TERMINATE" |
| Infinite auto-reply loop | Bounded auto-reply count |
| Relies on model behavior | Relies on code-level control |
```python
# BROKEN
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=None,  # bad: no bound
)

user_proxy.initiate_chat(
    assistant,
    message="Write a reconciliation report.",
)
```
```python
# FIXED
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]},
)

def is_termination_msg(msg):
    # content can be None (e.g. on tool-call messages), so guard before .strip()
    return (msg.get("content") or "").strip().endswith("TERMINATE")

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=is_termination_msg,
)

user_proxy.initiate_chat(
    assistant,
    message="Write a reconciliation report. End with TERMINATE.",
)
```
If you are using GroupChat, the same issue appears as a manager that never selects a terminating speaker. You’ll see the chain keep cycling through agents until your process times out.
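In real GroupChat code the fix is to pass a `max_round` bound to the constructor. The same code-level bound is worth understanding on its own; here is a minimal sketch of a round-capped agent loop, where `run_group_chat`, the round-robin speaker selection, and the toy agents are illustrative assumptions, not AutoGen APIs:

```python
def run_group_chat(agents, task, max_round=12):
    """Drive an agent loop with a hard round cap, mirroring GroupChat's
    max_round. Each agent here is a plain callable that takes the message
    history and returns a reply string (an assumption for this sketch)."""
    history = [task]
    for round_num in range(max_round):
        speaker = agents[round_num % len(agents)]  # naive round-robin selection
        reply = speaker(history)
        history.append(reply)
        if reply.strip().endswith("TERMINATE"):
            return history  # explicit termination signal fired
    # hard stop: never rely on the model alone to end the chat
    return history

# toy agents: one stalls forever, one eventually terminates
def stubborn(history):
    return "still thinking..."

def finisher(history):
    return "Done. TERMINATE" if len(history) > 3 else "working..."

result = run_group_chat([stubborn, finisher], "Write the report.")
```

Even if `finisher` never produced its terminal reply, the loop would still return after `max_round` turns instead of cycling until the process times out.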
## Other Possible Causes
### 1) Tool execution hangs
If you registered tools with register_function() or AssistantAgent function calling, one slow or blocked tool can freeze the whole chain.
```python
import requests

def fetch_policy_data(policy_id: str):
    # bad: no timeout around the network call
    return requests.get(f"https://internal-api/policies/{policy_id}").json()
```
Fix it by adding a timeout and failing fast on errors:
```python
import requests

def fetch_policy_data(policy_id: str):
    resp = requests.get(
        f"https://internal-api/policies/{policy_id}",
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```
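`requests` has a native `timeout` parameter, but some tools (database drivers, vendor SDKs) do not. One hedged fallback, sketched here with a hypothetical `with_timeout` helper, is to run the tool in a worker thread and cap the wait:

```python
import concurrent.futures

def with_timeout(fn, *args, timeout=10.0, **kwargs):
    """Run fn in a worker thread and fail fast after `timeout` seconds.
    Caveat: Python cannot kill the thread, so a runaway fn keeps running
    in the background; this only unblocks the chat loop."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"tool call {fn.__name__} exceeded {timeout}s")
    finally:
        # don't block waiting for the runaway call to finish
        pool.shutdown(wait=False)
```

You would then register a wrapped callable (for example, one that calls `with_timeout(fetch_policy_data, policy_id)`) instead of the raw function, so a hung dependency surfaces as a tool error the assistant can report rather than a frozen chain.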
### 2) The model keeps calling tools in a loop
This happens when the assistant receives tool output but never gets enough context to stop. You’ll see repeated tool calls in the logs and messages like:
- repeated `function_call` entries
- repeated `tool_calls` entries
- repeated assistant replies with no final answer
Use a strict instruction in your system prompt:
system_message = """
You may call tools if needed.
After you have the result, provide a final answer and end with TERMINATE.
Do not call tools again once you have sufficient information.
"""
### 3) Misconfigured `human_input_mode`
If you set human_input_mode="ALWAYS" in production without an interactive console, the agent can appear stuck waiting for input.
```python
# problematic in headless services
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="ALWAYS",
)
```
Use "NEVER" for automation:
```python
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
)
```
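If the same codebase runs both locally and in a headless service, one hedged pattern is to derive the mode from the environment instead of hard-coding it (`AUTOGEN_INTERACTIVE` is a made-up variable name for this sketch):

```python
import os
import sys

def pick_input_mode():
    """Return "ALWAYS" only when a human can actually answer the prompt;
    default to "NEVER" so headless workers never block on stdin."""
    if os.environ.get("AUTOGEN_INTERACTIVE") == "1" and sys.stdin.isatty():
        return "ALWAYS"
    return "NEVER"
```

Then pass `human_input_mode=pick_input_mode()` so a deploy to a non-interactive worker can never sit waiting for console input.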
### 4) Context window overflow or runaway history
Long-running chats can bloat message history until the model starts behaving unpredictably. In AutoGen, that often looks like repetitive output or agents ignoring termination instructions.
Trim history or restart the conversation between tasks:
```python
# example pattern: reset between jobs
assistant.reset()
user_proxy.reset()
```
If you are using a custom memory layer, make sure old tool outputs are not being replayed into every new turn.
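If you do keep one long-lived conversation, a trimming helper along these lines can cap the replayed history. This is a sketch assuming OpenAI-style message dicts, not an AutoGen API:

```python
def trim_history(messages, keep_last=20):
    """Keep a leading system message (if present) plus the most recent
    turns, so old tool outputs stop being replayed into every new turn."""
    system = [m for m in messages[:1] if m.get("role") == "system"]
    tail = messages[len(system):][-keep_last:]
    return system + tail
```

Apply it to whatever message list your memory layer feeds the model before each new task, and tune `keep_last` to stay well inside the model's context window.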
## How to Debug It
- Check whether termination ever fires:
  - Log every message passed into `is_termination_msg`.
  - Confirm the final assistant message actually matches your rule.
  - If you expect `"TERMINATE"` but the model says `"Done."`, your loop will never stop.
- Turn on verbose AutoGen logging:
  - Inspect speaker selection, tool calls, and reply generation.
  - Look for repeated patterns like:
    - assistant → tool → assistant → tool
    - no final assistant response after tool output
- Isolate tools from chat logic:
  - Temporarily disable all registered functions.
  - If the chain completes without tools, the hang is in your tool layer.
  - If it still hangs, focus on termination and speaker selection.
- Add hard limits:
  - Set `max_consecutive_auto_reply`.
  - Add request timeouts to every external call.
  - Put a wall-clock timeout around the whole job in your worker process.
Example:
```python
import signal

def handler(signum, frame):
    raise TimeoutError("chat timed out")

# note: signal.alarm works only on Unix, and only in the main thread
signal.signal(signal.SIGALRM, handler)
signal.alarm(120)
try:
    user_proxy.initiate_chat(assistant, message="Run the workflow.")
finally:
    signal.alarm(0)
```
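Because `signal.alarm` is Unix-only and main-thread-only, a hedged cross-platform alternative (`run_with_deadline` is a name made up for this sketch) runs the job in a daemon thread and bounds the wait. Python threads cannot be killed, so keep the per-tool timeouts and reply caps above as backstops:

```python
import threading

def run_with_deadline(target, seconds=120):
    """Run a chat job in a worker thread and bound the wait. Works on
    Windows and outside the main thread, unlike signal.alarm."""
    done = threading.Event()
    error = []

    def runner():
        try:
            target()
        except Exception as exc:  # surface worker failures to the caller
            error.append(exc)
        finally:
            done.set()

    threading.Thread(target=runner, daemon=True).start()
    if not done.wait(seconds):
        raise TimeoutError(f"job exceeded {seconds}s")
    if error:
        raise error[0]
```

Usage would look like `run_with_deadline(lambda: user_proxy.initiate_chat(assistant, message="Run the workflow."), seconds=120)` in your worker process.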
## Prevention
- Always define an explicit termination token like `TERMINATE` and enforce it in code.
- Put timeouts on every external dependency: HTTP calls, database queries, file I/O, and tool execution.
- Cap conversation length with `max_consecutive_auto_reply` and reset agents between jobs.
- In production, treat every AutoGen chain like a distributed workflow: bounded retries, bounded runtime, bounded memory.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.