# How to Fix Intermittent 500 Errors in CrewAI (TypeScript)

## What this error usually means
An intermittent 500 in CrewAI TypeScript usually means your agent or task pipeline is throwing an unhandled exception somewhere inside the request lifecycle. It often shows up only on certain inputs, certain tools, or after a few successful runs because the failure is data-dependent or timing-dependent.
In practice, this is rarely “CrewAI is broken.” It’s usually one of these: invalid tool input, async code returning the wrong shape, missing environment variables, or a model/tool timeout that gets wrapped as a generic server error.
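Before hunting for the cause, make sure the real exception can reach your logs at all. One way to do that is a small wrapper around whatever kicks off the run — a sketch, where `runWithErrorLogging` is a hypothetical helper and `kickoff` stands in for any async entry point such as `crew.kickoff()`:

```typescript
// Sketch: surface the underlying exception instead of a generic 500.
// `runWithErrorLogging` is an illustrative helper, not a CrewAI API.
async function runWithErrorLogging<T>(kickoff: () => Promise<T>): Promise<T> {
  try {
    return await kickoff();
  } catch (err) {
    // Log the first real stack trace; this is what the 500 is hiding.
    console.error("crew run failed:", err instanceof Error ? err.stack : err);
    throw err;
  }
}
```

With this in place, every cause below leaves a readable stack trace instead of a bare server error.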
## The Most Common Cause
The #1 cause I see is a tool function that throws or returns an unexpected value, and CrewAI surfaces it as a generic 500 Internal Server Error. In TypeScript, this often happens when the tool expects a string but gets an object, or when you forget to handle async errors inside the tool.
### Broken vs. fixed pattern
| Broken | Fixed |
|---|---|
| Tool throws on bad input | Tool validates input and returns structured output |
| Returns raw object with unstable shape | Returns predictable string/JSON payload |
| No try/catch around external call | Errors are caught and converted to safe failures |
```typescript
// BROKEN
import { Agent, Task, Crew } from "@crew-ai/crew";

const searchTool = async (input: any) => {
  // Fails intermittently when input.query is missing or not a string
  const res = await fetch(`https://api.example.com/search?q=${input.query}`);
  return await res.json();
};

const agent = new Agent({
  name: "SupportAgent",
  tools: [searchTool],
});

const task = new Task({
  description: "Search customer records",
  agent,
});

const crew = new Crew({ agents: [agent], tasks: [task] });
await crew.kickoff();
```
```typescript
// FIXED
import { Agent, Task, Crew } from "@crew-ai/crew";

type SearchInput = { query: string };

const searchTool = async (input: SearchInput): Promise<string> => {
  if (!input?.query || typeof input.query !== "string") {
    throw new Error("searchTool: 'query' must be a non-empty string");
  }
  try {
    const res = await fetch(
      `https://api.example.com/search?q=${encodeURIComponent(input.query)}`
    );
    if (!res.ok) {
      throw new Error(`searchTool upstream failed with ${res.status}`);
    }
    const data = await res.json();
    return JSON.stringify(data);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    throw new Error(`searchTool failed: ${message}`);
  }
};

const agent = new Agent({
  name: "SupportAgent",
  tools: [searchTool],
});

const task = new Task({
  description: "Search customer records",
  agent,
});

const crew = new Crew({ agents: [agent], tasks: [task] });
await crew.kickoff();
```
Why this fixes it:

- The tool contract is explicit.
- Invalid input fails early with a useful message.
- External failures are wrapped with context instead of bubbling up as random 500s.
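If you would rather not rethrow at all, you can go one step further and convert tool failures into a payload the agent can read. A minimal sketch — `safeTool` is a hypothetical wrapper, not part of CrewAI:

```typescript
type ToolFn = (input: unknown) => Promise<string>;

// Sketch: wrap any tool so a thrown error becomes a structured string
// the agent can reason about, instead of an unhandled 500.
function safeTool(name: string, fn: ToolFn): ToolFn {
  return async (input) => {
    try {
      return await fn(input);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Returning a payload instead of rethrowing lets the run continue.
      return JSON.stringify({ tool: name, ok: false, error: message });
    }
  };
}
```

Whether you rethrow with context or return a failure payload depends on whether the agent should be able to recover; either way, the error stops being invisible.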
## Other Possible Causes
### 1. Missing environment variables
A lot of “intermittent” failures are actually conditional failures caused by one environment variable being present locally but missing in one runtime.
```typescript
// BROKEN: the non-null assertion silences the compiler and hides a missing key
const apiKey = process.env.OPENAI_API_KEY!;

// FIXED: fail fast with a clear message
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
  throw new Error("Missing OPENAI_API_KEY");
}
```
If your app runs in multiple environments, verify all of them:

- local `.env`
- Docker container env
- CI/CD secrets
- serverless runtime config
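This check generalizes: validate every required variable once at startup so a missing key fails loudly before any request is served. A sketch — `requireEnv` is an illustrative helper, and the variable names are examples, not a fixed CrewAI requirement:

```typescript
// Sketch: fail fast at startup when required environment variables are missing.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined> = process.env
): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Safe cast: every name was verified present above.
  return Object.fromEntries(names.map((name) => [name, env[name] as string]));
}
```

Call it once at boot, e.g. `requireEnv(["OPENAI_API_KEY"])`, so "works locally, 500s in production" becomes a clear startup error instead.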
### 2. Async race conditions in shared state
If you reuse mutable objects across tasks, one run can corrupt another. This shows up when multiple tasks update the same array or object.
```typescript
// BROKEN: a module-level array shared by every concurrent run
const memory: string[] = [];

async function addNote(note: string) {
  memory.push(note);
}

// FIXED: pass state in and return a new value instead of mutating shared state
async function addNote(memory: string[], note: string): Promise<string[]> {
  return [...memory, note];
}
```
If you need shared state, use:

- per-request context
- immutable updates
- a real store like Redis or Postgres
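The per-request context option can be as simple as a factory that gives each crew run its own object. A sketch — `RunContext`, `createRunContext`, and `addNote` are illustrative names:

```typescript
// Sketch: per-request context so concurrent runs never share mutable state.
type RunContext = { notes: string[] };

function createRunContext(): RunContext {
  return { notes: [] };
}

function addNote(ctx: RunContext, note: string): RunContext {
  // Immutable update: return a new context rather than mutating the input.
  return { ...ctx, notes: [...ctx.notes, note] };
}
```

Because every run starts from its own `createRunContext()`, one run can never observe or corrupt another run's notes.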
### 3. Model response parsing assumptions
If you assume the model always returns valid JSON, one malformed response can trigger an exception that becomes a 500.
```typescript
// BROKEN
const result = JSON.parse(modelOutput);

// FIXED
let result;
try {
  result = JSON.parse(modelOutput);
} catch {
  throw new Error(`Invalid JSON from model: ${modelOutput.slice(0, 200)}`);
}
```
This matters more when using structured outputs or chaining tasks that depend on exact keys.
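A common variant of this failure is a model wrapping otherwise valid JSON in markdown code fences. A tolerant parser can strip those before parsing — a sketch, where `safeParseModelJson` is an illustrative helper:

```typescript
// Sketch: tolerate the common case where a model wraps JSON in markdown fences.
function safeParseModelJson(raw: string): unknown {
  const cleaned = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "") // strip a leading ```json fence
    .replace(/`{3}\s*$/, "")           // strip a trailing fence
    .trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    // Include a prefix of the bad output so the log is actionable.
    throw new Error(`Invalid JSON from model: ${cleaned.slice(0, 200)}`);
  }
}
```

This keeps one cosmetic formatting quirk from intermittently failing an otherwise healthy pipeline.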
### 4. Timeouts from upstream tools or APIs
Slow APIs often appear as random failures because they only time out under load.
```typescript
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5000);

try {
  const res = await fetch(url, { signal: controller.signal });
  // ...use res
} finally {
  clearTimeout(timeoutId);
}
```
If you do not set timeouts explicitly, your runtime may kill the request and CrewAI just reports a generic server failure.
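The `AbortController` pattern can be generalized into a reusable helper that races any promise against a deadline, so every upstream call in your crew gets the same treatment. A sketch — `withTimeout` is an illustrative helper:

```typescript
// Sketch: race any promise against a timeout and fail with a named error.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = "operation"
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A timeout error with a label like `searchTool timed out after 5000ms` is far easier to debug than a generic 500 that only appears under load.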
## How to Debug It
1. **Find the first real stack trace.**
   - Do not stop at `500 Internal Server Error`.
   - Look for the first exception thrown inside your tool, task handler, or model parsing code.
   - Common messages include:
     - `TypeError: Cannot read properties of undefined`
     - `Error: searchTool failed`
     - `SyntaxError: Unexpected token ... in JSON`
2. **Disable tools one by one.**
   - Run the crew with only one agent and one task.
   - Remove each tool until the `500` disappears.
   - The last removed tool is usually where the bug lives.
3. **Log input and output shapes.**
   - Print what each tool receives and returns.
   - Check for `undefined`, arrays where strings are expected, and nested objects that change shape between runs.

   ```typescript
   console.log("tool input:", JSON.stringify(input));
   console.log("tool output:", JSON.stringify(output));
   ```

4. **Reproduce with the same payload.**
   - Save the exact failing prompt and tool input.
   - Run it locally against the same model version and environment variables.
   - If it only fails in production, compare runtime differences: Node version, env vars, rate limits, and timeout settings.
## Prevention

- Validate every tool boundary with explicit types and runtime checks.
- Wrap external API calls in `try/catch` and convert failures into actionable errors.
- Keep task state immutable unless you are writing to a proper datastore.
- Add request IDs and structured logs so you can trace one crew run across agents and tools.
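The request-ID point can be as lightweight as one logger per crew run that stamps every line with the same ID. A sketch — `makeRunLogger` and the field names are illustrative, not a CrewAI convention:

```typescript
import { randomUUID } from "node:crypto";

// Sketch: one logger per crew run, so every log line shares a run ID
// and a single failure can be traced across agents and tools.
function makeRunLogger(runId: string = randomUUID()) {
  return {
    runId,
    log(event: string, data: Record<string, unknown> = {}) {
      console.log(
        JSON.stringify({ runId, event, ts: new Date().toISOString(), ...data })
      );
    },
  };
}
```

Grepping your logs for one `runId` then reconstructs the whole run: which tool was called, with what input, and where the first exception appeared.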
If you want fewer intermittent 500s in CrewAI TypeScript, treat every tool like an API boundary. Most of these failures are not AI problems; they’re ordinary production bugs hiding behind an LLM workflow.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.