How to Fix 'chain execution stuck in production' in CrewAI (TypeScript)
When CrewAI says chain execution stuck in production, it usually means the agent pipeline never reached a terminal state. In practice, that shows up as a run that hangs after a tool call, loops forever on retries, or waits on a promise that never resolves.
In TypeScript projects, this is usually not a “CrewAI bug” in the abstract. It’s almost always a bad tool implementation, an unresolved async boundary, or a task graph that can’t complete.
The Most Common Cause
The #1 cause is an async tool that never resolves or returns something CrewAI can’t serialize cleanly.
This happens a lot when developers wrap external APIs, DB calls, or queue workers and forget to return the promise result, or they leave an HTTP request open.
| Broken | Fixed |
|---|---|
| ```ts | |
| import { Tool } from "@crewai/typescript"; |
export const lookupPolicyTool = new Tool({
name: "lookup_policy",
description: "Fetch policy details",
execute: async (policyId: string) => {
fetch(https://api.example.com/policies/${policyId}); // no await
// no return
},
});
|ts
import { Tool } from "@crewai/typescript";
export const lookupPolicyTool = new Tool({
name: "lookup_policy",
description: "Fetch policy details",
execute: async (policyId: string) => {
const res = await fetch(https://api.example.com/policies/${policyId});
if (!res.ok) {
throw new Error(`lookup_policy failed with ${res.status}`);
}
return await res.json();
}, });
The broken version leaves CrewAI waiting for a result that never comes back in a usable form. In production logs, this often appears alongside retries like:
- `Tool execution timed out`
- `chain execution stuck in production`
- `TaskRunner awaiting result from tool lookup_policy`
If you’re using a custom `Agent`, `Task`, or `Crew` wrapper in TypeScript, make sure every tool path does three things:
- `await` external work
- `return` a plain JSON-serializable object/string
- throw on failure instead of swallowing errors
## Other Possible Causes
### 1. Circular task dependencies
A task graph that points back to itself will never finish.
```ts
const taskA = new Task({
description: "Summarize claim",
agent: claimsAgent,
context: [taskB],
});
const taskB = new Task({
description: "Validate claim",
agent: validatorAgent,
context: [taskA],
});
Fix it by making the dependency graph one-directional:
const taskA = new Task({
description: "Validate claim",
agent: validatorAgent,
});
const taskB = new Task({
description: "Summarize claim",
agent: claimsAgent,
context: [taskA],
});
2. Missing termination conditions on agents
If your agent is configured to keep reasoning until it “figures it out,” it may loop indefinitely.
const agent = new Agent({
name: "ClaimsAgent",
goal: "Resolve claim issue",
maxIterations: undefined,
});
Set hard limits:
const agent = new Agent({
name: "ClaimsAgent",
goal: "Resolve claim issue",
maxIterations: 5,
});
In production, always cap iteration count and tool retries.
3. A tool returns non-serializable data
CrewAI pipelines usually expect plain data. Returning class instances, streams, sockets, or circular objects can cause downstream hangs.
execute: async () => {
return {
stream: fs.createReadStream("/tmp/report.pdf"),
};
}
Return text or JSON instead:
execute: async () => {
const report = await fs.promises.readFile("/tmp/report.pdf", "utf8");
return { report };
}
4. Network calls without timeouts
An external dependency that never responds will look like CrewAI is stuck.
await fetch("https://slow-service.internal/api");
Use an explicit timeout:
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10_000);
try {
const res = await fetch("https://slow-service.internal/api", {
signal: controller.signal,
});
} finally {
clearTimeout(timeout);
}
How to Debug It
- •
Isolate the last successful step
- •Check which
TaskorToolran before the hang. - •If logs stop at
Tool.execute, the problem is inside the tool. - •If logs stop before any tool call, inspect task wiring and agent config.
- •Check which
- •
Add hard logging around every async boundary
- •Log before and after each external call.
- •Log returned payload sizes and types.
- •Example:
console.log("[lookup_policy] start", policyId); const result = await lookup(policyId); console.log("[lookup_policy] done", typeof result);
- •
Force timeouts on tools and HTTP clients
- •If removing the timeout makes the issue disappear, you found your blocker.
- •Set separate limits for:
- •LLM response time
- •tool execution time
- •overall crew run time
- •
Temporarily replace custom tools with stubs
- •Return fixed JSON from each tool.
- •If the chain completes with stubs, your orchestration is fine and the bug is in one of your integrations.
Prevention
- •
Keep every tool output:
- •small
- •JSON-serializable
- •deterministic when possible
- •
Put strict limits on production runs:
- •
maxIterations - •request timeouts
- •retry caps
- •
- •
Add integration tests for:
- •slow APIs
- •null responses
- •malformed payloads
- •rejected promises
If you’re seeing chain execution stuck in production in CrewAI TypeScript, start with the tools first. In real systems, that’s where most hangs come from.
Keep learning
- •The complete AI Agents Roadmap — my full 8-step breakdown
- •Free: The AI Agent Starter Kit — PDF checklist + starter code
- •Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit