# How to Fix "timeout error in production" in CrewAI (TypeScript)
If you’re seeing a "timeout error in production" with CrewAI in TypeScript, the request is being killed before the agent run finishes. In practice, this usually shows up when the task takes longer than your serverless function, reverse proxy, or client timeout allows.
The important part: this is usually not a “CrewAI bug”. It’s almost always a mismatch between agent runtime and infrastructure limits.
## The Most Common Cause
The #1 cause is wrapping a long-running CrewAI execution inside an HTTP request and waiting for it to finish synchronously.
Typical failure pattern:
- API route starts a `Crew` and calls `.kickoff()`
- The request stays open while the agent does tool calls, retries, or multi-step reasoning
- Your platform kills the request at 10s, 30s, or 60s
### Broken vs. fixed pattern
| Broken | Fixed |
|---|---|
| Wait for the whole crew inside the request | Enqueue work and return immediately |
| Let the browser hold the connection open | Poll job status or use webhooks |
| Assume agent runtime fits HTTP timeout | Treat crew runs as background jobs |
```typescript
// BROKEN: synchronous crew execution inside an HTTP handler
import { Crew } from "@crewai/crewai";

export async function POST(req: Request) {
  const crew = new Crew({
    agents: [/* ... */],
    tasks: [/* ... */],
  });

  const result = await crew.kickoff(); // can exceed the API timeout
  return Response.json({ result });
}
```
```typescript
// FIXED: queue the job and return immediately
import { enqueueCrewRun } from "./queue";

export async function POST(req: Request) {
  const payload = await req.json();

  const jobId = await enqueueCrewRun({
    type: "customer-support-triage",
    payload,
  });

  return Response.json(
    { jobId, status: "queued" },
    { status: 202 }
  );
}
```
```typescript
// worker.ts — runs outside the request/response path
import { Crew } from "@crewai/crewai";

export async function processCrewJob(job: {
  id: string;
  payload: unknown;
}) {
  const crew = new Crew({
    agents: [/* ... */],
    tasks: [/* ... */],
  });

  return crew.kickoff();
}
```
If you’re using Vercel, Cloudflare Workers, Lambda, or any API gateway, this is the first thing to fix.
## Other Possible Causes
### 1) Tool calls are hanging
A single slow tool can stall the whole run. This happens with HTTP tools that have no timeout or no retry policy.
```typescript
const response = await fetch(url); // no timeout — can hang indefinitely
```
Use an abort signal:
```typescript
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5000);

const response = await fetch(url, {
  signal: controller.signal,
});

clearTimeout(timeout);
```
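On Node 17.3+ (and modern browsers), `AbortSignal.timeout()` does the same thing without the manual controller and `clearTimeout` bookkeeping. A small helper, as a sketch:

```typescript
// AbortSignal.timeout(ms) creates a signal that aborts itself after `ms`
// milliseconds, so the fetch fails with a TimeoutError instead of hanging.
async function fetchWithTimeout(url: string, ms: number): Promise<Response> {
  return fetch(url, { signal: AbortSignal.timeout(ms) });
}
```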
### 2) The model call itself has no timeout budget
If you’re calling an LLM provider through a custom client, make sure your SDK call has a hard timeout. Otherwise the agent waits until your infrastructure cuts it off.
```typescript
const llm = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 10000, // 10s hard cap per request
});
```
If your CrewAI setup wraps provider clients manually, verify that every model invocation inherits this timeout.
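If a wrapped client gives you no way to pass a timeout option through, a generic `Promise.race` guard works as a fallback. This is an illustrative helper, not a CrewAI or provider API, and note the caveat: it stops *waiting* but does not cancel the underlying call.

```typescript
// Rejects if `promise` takes longer than `ms`, so a hung model call fails
// fast instead of waiting for the platform to kill the whole request.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = "operation"
): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Usage would look like `await withTimeout(llm.call(prompt), 10_000, "llm call")` inside whatever wrapper invokes the model.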
### 3) Too many sequential tasks or retries
A multi-agent crew with sequential tasks can easily exceed production limits if each step is slow.
```typescript
const crew = new Crew({
  process: "sequential",
  tasks: [
    researchTask,
    validateTask,
    summarizeTask,
    complianceTask,
  ],
});
```
If each task does multiple tool calls, your total runtime grows fast. Reduce task count or split the workflow into smaller jobs.
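To see how fast it grows, here is a back-of-envelope worst case — every number is an illustrative assumption, not a CrewAI measurement:

```typescript
// Rough runtime budget for the sequential crew above.
const tasks = 4;          // research, validate, summarize, compliance
const callsPerTask = 3;   // LLM + tool calls per task (assumed)
const secondsPerCall = 4; // a slow-ish provider round trip (assumed)

const worstCaseSeconds = tasks * callsPerTask * secondsPerCall;
console.log(worstCaseSeconds); // 48 — well past a 10–30s function limit
```

Even with generous assumptions, four sequential tasks blow through a typical serverless window, which is why splitting the workflow matters.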
### 4) Platform timeout is lower than app timeout
Your app may allow 60s, but your platform may kill requests at 15s.
Common examples:
- Next.js on serverless platforms
- API Gateway + Lambda default timeouts
- Nginx / ingress proxy timeouts
- Browser fetch aborts on client-side requests
For example, on Vercel (`vercel.json`):

```json
{
  "functions": {
    "api/**/*.ts": {
      "maxDuration": 10
    }
  }
}
```
Or in Lambda:
```
// AWS Lambda setting, configured outside your code:
// Timeout = 10 seconds
```
## How to Debug It
1. **Measure where time is spent.** Log timestamps before and after each step:
   - queueing
   - LLM call
   - tool call
   - final aggregation
2. **Isolate the failing boundary.**
   - Run `crew.kickoff()` from a local script instead of an HTTP route.
   - If it works locally but fails in prod, it’s likely an infra timeout.
3. **Inspect tool latency.**
   - Temporarily disable external tools.
   - If the error disappears, one of your tools is slow or hanging.
4. **Check server logs for cutoff signatures.** Look for messages like:
   - `Function timed out`
   - `504 Gateway Timeout`
   - `context deadline exceeded`
   - `Request aborted`
   - `CrewAI kickoff exceeded execution window`
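A minimal timing wrapper makes the measurement step concrete. The helper is an illustrative sketch, not a CrewAI utility:

```typescript
// Logs how long each labeled step takes, so slow stages stand out in logs.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    console.log(`[timing] ${label}: ${Date.now() - start}ms`);
  }
}

// Usage inside the worker (illustrative):
// const result = await timed("crew.kickoff", () => crew.kickoff());
// const data = await timed("tool:search", () => searchTool.run(query));
```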
## Prevention
- Keep CrewAI runs out of synchronous request/response paths.
- Put hard timeouts on every external dependency:
  - LLM client
  - `fetch` calls
  - database queries
- Split long workflows into:
  - short interactive steps
  - background jobs for heavy reasoning and tool use
If you want one rule to remember: HTTP requests are for starting work, not finishing it. Once you treat CrewAI runs as jobs instead of web requests, this class of timeout errors usually disappears.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit