How to Fix 'chain execution stuck when scaling' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see "chain execution stuck when scaling" in CrewAI TypeScript, it usually means your workflow is waiting on a task that never resolves, or the agent graph is creating more work than your runtime can drain. This shows up most often when you add parallelism, nested crews, or async tools that don’t return cleanly.

In practice, the bug is rarely “CrewAI is broken.” It’s usually a bad async boundary, an unbounded loop, or a task dependency that makes the chain wait forever.

The Most Common Cause

The #1 cause is returning a promise that never settles from a tool, task callback, or custom step. In TypeScript, this often happens when you forget to await an internal async call, swallow errors, or keep a stream/socket open while CrewAI expects a finite result.

Here’s the broken pattern versus the fixed pattern.

Broken                               | Fixed
-------------------------------------|--------------------------------------------
Tool returns unresolved promise      | Tool awaits and returns a plain result
Errors swallowed inside catch        | Errors rethrown so the chain can fail fast
Long-lived connection never closed   | Connection closed before returning

// ❌ Broken: hangs under scale because the promise never resolves cleanly
import { Agent } from "@crewai/core";
import { z } from "zod";

const fetchCustomerTool = {
  name: "fetch_customer",
  description: "Fetch customer data",
  schema: z.object({ customerId: z.string() }),
  execute: async ({ customerId }: { customerId: string }) => {
    try {
      const res = fetch(`https://api.internal/customers/${customerId}`);
      // Missing await -> res is a Promise<Response>, so .json() throws at runtime
      const json = await (res as any).json();
      return json;
    } catch (err) {
      console.error("tool failed", err);
      // Swallowed error -> chain keeps waiting in some setups
    }
  },
};

const agent = new Agent({
  role: "Support analyst",
  goal: "Summarize customer account status",
  tools: [fetchCustomerTool],
});

// ✅ Fixed: resolves deterministically and fails fast
import { Agent } from "@crewai/core";
import { z } from "zod";

const fetchCustomerTool = {
  name: "fetch_customer",
  description: "Fetch customer data",
  schema: z.object({ customerId: z.string() }),
  execute: async ({ customerId }: { customerId: string }) => {
    const res = await fetch(`https://api.internal/customers/${customerId}`);

    if (!res.ok) {
      throw new Error(`fetch_customer failed with HTTP ${res.status}`);
    }

    const json = await res.json();
    return json;
  },
};

const agent = new Agent({
  role: "Support analyst",
  goal: "Summarize customer account status",
  tools: [fetchCustomerTool],
});

If you’re seeing logs like "TaskRunner waiting for completion" or "Execution timed out", or a chain that never advances past one step, inspect every tool and callback first. In CrewAI TypeScript, one unresolved async function can block the entire execution path.
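Since one hung tool can stall the whole chain, it can pay to wrap every tool's execute in a watchdog so a hang surfaces as a loud error instead of a silent wait. The sketch below assumes nothing about CrewAI's internals — withWatchdog, the Execute type, and the 30-second default are all illustrative names, not framework APIs:

```typescript
// Minimal watchdog sketch (not a CrewAI API): wraps any async tool
// `execute` so an unresolved promise becomes an error after `ms`.
type Execute<I, O> = (input: I) => Promise<O>;

function withWatchdog<I, O>(
  name: string,
  execute: Execute<I, O>,
  ms = 30_000
): Execute<I, O> {
  return async (input: I) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const watchdog = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`${name} exceeded ${ms}ms -- likely an unresolved promise`)),
        ms
      );
    });
    try {
      // Whichever settles first wins; a hang becomes a visible failure.
      return await Promise.race([execute(input), watchdog]);
    } finally {
      clearTimeout(timer);
    }
  };
}
```

Wrapping `fetchCustomerTool.execute` with `withWatchdog("fetch_customer", ...)` turns a pinned worker into a stack trace that names the offending tool.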

Other Possible Causes

1) Circular task dependencies

A task depends on output from another task that eventually depends on the first one. That creates a deadlock-like loop where the scheduler has no valid next step.

// ❌ Task A waits on B, B waits on A
const tasks = [
  { id: "A", dependsOn: ["B"], description: "Draft policy summary" },
  { id: "B", dependsOn: ["A"], description: "Review policy summary" },
];

Fix it by making dependencies strictly directional.

// ✅ Linear dependency chain
const tasks = [
  { id: "A", dependsOn: [], description: "Draft policy summary" },
  { id: "B", dependsOn: ["A"], description: "Review policy summary" },
];
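Before handing a task list to the scheduler, you can sanity-check it for cycles. This is a minimal depth-first-search sketch; the Task shape mirrors the snippets above, and assertAcyclic is a hypothetical helper, not a CrewAI API:

```typescript
interface Task {
  id: string;
  dependsOn: string[];
  description: string;
}

// Depth-first search over dependsOn edges; throws on the first cycle found.
function assertAcyclic(tasks: Task[]): void {
  const deps = new Map(tasks.map((t) => [t.id, t.dependsOn]));
  const state = new Map<string, "visiting" | "done">();

  const visit = (id: string, path: string[]): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      throw new Error(`Circular dependency: ${[...path, id].join(" -> ")}`);
    }
    state.set(id, "visiting");
    for (const dep of deps.get(id) ?? []) visit(dep, [...path, id]);
    state.set(id, "done");
  };

  for (const t of tasks) visit(t.id, []);
}
```

Running this once at startup converts a silent deadlock into an immediate, named error.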

2) Unbounded parallel fan-out

Scaling often exposes code that spawns too many concurrent tasks. If you fire off hundreds of tool calls without a concurrency limit, Node will bottleneck and CrewAI may appear stuck.

// ❌ No concurrency control
await Promise.all(customers.map((c) => processClaim(c)));

Use a limiter.

// ✅ Controlled concurrency
import pLimit from "p-limit";

const limit = pLimit(5);
await Promise.all(customers.map((c) => limit(() => processClaim(c))));
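p-limit is a small npm dependency; if you'd rather avoid adding one, the same cap can be enforced with a hand-rolled worker pool. mapWithLimit below is a sketch, not a library API:

```typescript
// Zero-dependency alternative: run jobs through a fixed-size worker pool.
// At most `limit` calls to `fn` are in flight at any moment.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker pulls the next index; single-threaded JS makes next++ safe.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```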

3) Missing timeout on external calls

A slow CRM, LLM provider, or internal API can hold the chain open indefinitely if you don’t set timeouts.

// ✅ Abort after 10s
async function fetchJsonWithTimeout(url: string) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 10_000);

  try {
    const res = await fetch(url, { signal: controller.signal });
    return await res.json();
  } finally {
    clearTimeout(timeout);
  }
}

Without this, you’ll see behavior like:

  • one worker pinned at pending
  • no downstream task starts
  • logs stop after the tool invocation line

4) Returning non-serializable objects

CrewAI task outputs should be plain JSON-ish data. Returning streams, instances of classes like Map, or circularly referenced objects can break persistence and make orchestration look frozen.

// ❌ Bad output shape
return new Map([["status", "ok"]]);
// ✅ Plain object output
return { status: "ok" };
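One cheap guard is to round-trip every tool output through JSON before returning it, so a bad shape fails loudly at the tool instead of freezing orchestration downstream. assertSerializable is a hypothetical helper, not part of CrewAI:

```typescript
// Throws early if a tool output would not survive JSON persistence.
// Catches the common offenders: Map/Set (which stringify to "{}")
// and circular references (which make JSON.stringify throw).
function assertSerializable<T>(value: T): T {
  if (value instanceof Map || value instanceof Set) {
    throw new Error("Return a plain object or array, not a Map or Set");
  }
  // JSON.stringify throws a TypeError on circular structures.
  return JSON.parse(JSON.stringify(value)) as T;
}
```

Calling this on every return value also strips non-enumerable baggage, so downstream tasks see exactly what would be persisted.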

How to Debug It

  1. Find the exact step that stops progressing
    Turn on verbose logs around Agent, TaskRunner, and any custom tool execution. If the last log line is always the same tool name, that tool is your suspect.

  2. Remove parallelism temporarily
    Replace Promise.all(...) with sequential execution for one run. If the issue disappears, you have a concurrency bottleneck or hidden dependency cycle.

  3. Add hard timeouts to every external call
    Wrap fetches, DB calls, and LLM requests with abort logic. If a timeout triggers consistently at one integration point, you found the blocker.

  4. Validate outputs before returning them
    Log JSON.stringify(result) right before return. If serialization fails or produces huge payloads, trim the output to only what downstream tasks need.
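For step 2, the sequential fallback is a small mechanical rewrite. processClaim below is a stand-in for whatever each of your tasks actually runs:

```typescript
// Hypothetical task body standing in for your real work.
async function processClaim(c: { id: string }): Promise<string> {
  return `processed ${c.id}`;
}

// Parallel (original): await Promise.all(customers.map(processClaim));
// Sequential (debug run): one at a time, so the last log line
// before the stall names the exact item that hangs.
async function runSequentially(customers: { id: string }[]): Promise<string[]> {
  const results: string[] = [];
  for (const c of customers) {
    console.log(`starting ${c.id}`);
    results.push(await processClaim(c));
  }
  return results;
}
```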

Prevention

  • Keep every tool output small, serializable, and deterministic.
  • Put timeouts and retries at integration boundaries, not inside agent logic.
  • Cap concurrency explicitly when scaling crews across many tasks or customers.
  • Fail fast on errors instead of swallowing exceptions inside tools or callbacks.
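The second bullet can be sketched as a boundary-level retry wrapper. withRetry and its defaults are illustrative, not a CrewAI API; tune attempts and backoff to your own SLAs:

```typescript
// Retry with exponential backoff at the integration boundary.
// Keeps retry policy out of agent logic: agents call the wrapped fn.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        // 200ms, 400ms, 800ms, ... between attempts
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr; // fail fast once the budget is spent
}
```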

If you’re building with CrewAI TypeScript and this error only appears under load, treat it like an orchestration bug first and an LLM bug second. In most cases, fixing one unresolved promise or one circular dependency clears the whole chain.



By Cyprian Aarons, AI Consultant at Topiax.
