# How to Fix "connection timeout in production" in CrewAI (TypeScript)
## What the error means
A "connection timeout in production" error usually means your CrewAI TypeScript app tried to call a remote dependency and never got a response before the timeout window expired. In practice, this shows up when the agent is calling an LLM API, a tool endpoint, or an internal service that is slow, unreachable, or blocked by network policy.

In CrewAI apps, the failure often surfaces as a runtime error from the underlying HTTP client, not from your agent logic. You’ll typically see something like `Error: connection timeout`, `ETIMEDOUT`, or `Request timed out` while executing a `Crew`, `Agent`, or tool call.
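Because the error can arrive with different shapes depending on the client, it helps to recognize the usual forms in one place. The helper below is a generic sketch, not a CrewAI API; `UND_ERR_CONNECT_TIMEOUT` is the code used by undici, Node's built-in `fetch` implementation, and you may need to adjust the list for your HTTP client.

```typescript
// Sketch: classify the common timeout shapes Node and fetch surface.
// Nothing here is CrewAI-specific; adjust the codes to your HTTP client.
function isTimeoutError(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const code = (err as { code?: string }).code;
  return (
    code === "ETIMEDOUT" ||               // raw socket timeout
    code === "UND_ERR_CONNECT_TIMEOUT" || // undici / Node fetch connect timeout
    err.name === "TimeoutError" ||        // AbortSignal.timeout(...)
    /timed?\s?out/i.test(err.message)     // generic "timed out" messages
  );
}
```

Routing all caught errors through a check like this keeps retry and alerting logic consistent no matter which layer timed out.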
## The Most Common Cause
The #1 cause is using a local/dev configuration in production, especially with an endpoint that is only reachable from your laptop or VPC. A common pattern is hardcoding localhost, using a dev base URL, or forgetting that production containers cannot reach private services without proper routing.
Here’s the broken pattern and the fixed one.

**Broken:**

```ts
import { Agent, Crew } from "crewai";

const agent = new Agent({
  name: "SupportAgent",
  goal: "Answer customer questions",
  llm: {
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: "http://localhost:3001/v1", // works locally only
    model: "gpt-4o-mini",
  },
});

const crew = new Crew({
  agents: [agent],
});
```

**Fixed:**

```ts
import { Agent, Crew } from "crewai";

const agent = new Agent({
  name: "SupportAgent",
  goal: "Answer customer questions",
  llm: {
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
    model: "gpt-4o-mini",
    timeoutMs: 30000,
  },
});

const crew = new Crew({
  agents: [agent],
});
```
If you are calling internal tools, the same issue applies.
**Broken:**

```ts
const fetchCustomer = async (id: string) => {
  const res = await fetch(`http://localhost:8080/customers/${id}`);
  return res.json();
};
```

**Fixed:**

```ts
const fetchCustomer = async (id: string) => {
  const res = await fetch(
    `${process.env.CUSTOMER_SERVICE_URL}/customers/${id}`,
    { signal: AbortSignal.timeout(8000) }
  );
  return res.json();
};
```
In production, `localhost` points to the container itself. That means your CrewAI tool call never reaches the real service, and after enough waiting you get a timeout.
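One way to catch this class of bug before any agent runs is a startup check that rejects loopback URLs in production. This is a sketch under assumptions: the env var names and the hostname rule are illustrative, not part of CrewAI.

```typescript
// Sketch: fail fast at boot if a required service URL is missing or still
// points at loopback while NODE_ENV is "production". Env var names are
// illustrative; adapt them to your deployment.
function requireServiceUrl(
  name: string,
  env: Record<string, string | undefined> = process.env
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required env var: ${name}`);
  }
  const host = new URL(value).hostname;
  if (env.NODE_ENV === "production" && (host === "localhost" || host === "127.0.0.1")) {
    throw new Error(`${name} points at loopback in production: ${value}`);
  }
  return value;
}
```

Calling this once at startup, e.g. `requireServiceUrl("CUSTOMER_SERVICE_URL")`, makes a bad config crash the deploy immediately instead of timing out mid-conversation.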
## Other Possible Causes
- **Missing or too-low timeout settings**

  Some SDKs default to aggressive timeouts. If your agent does long reasoning plus tool calls, bump the limit explicitly.

  ```ts
  const agent = new Agent({
    name: "ClaimsAgent",
    goal: "Process claims",
    llm: {
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4o-mini",
      timeoutMs: 60000,
    },
  });
  ```

- **Cold starts or overloaded serverless functions**

  If your tool runs on Lambda, Vercel Functions, or similar infrastructure, cold starts can push you over the edge.

  ```ts
  export const config = {
    maxDuration: 60,
    runtime: "nodejs20.x",
  };
  ```

- **DNS / firewall / private network issues**

  The app may resolve the host but cannot complete the TCP connection. This is common in Kubernetes, corporate networks, and locked-down VPCs.

  ```sh
  nslookup api.internal.company.local
  curl -v https://api.internal.company.local/health
  ```

- **Tool code doing blocking work before responding**

  If your custom tool spends time on DB queries, file IO, or retries before returning anything, CrewAI just waits until the request times out.

  ```ts
  const tool = async () => {
    // bad if this takes too long without timeouts/retries
    const data = await expensiveDatabaseQuery();
    return data;
  };
  ```
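One way to bound that blocking pattern is to race the slow work against a timer, so the tool fails fast with a clear error instead of waiting for the transport-level timeout. `expensiveDatabaseQuery` is hypothetical; the wrapper itself is a generic `Promise.race` sketch, not a CrewAI API.

```typescript
// Sketch: reject if the wrapped promise takes longer than `ms`.
// Generic Promise.race pattern, not a CrewAI API.
async function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`operation timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer on the happy path
  }
}

// Usage inside a tool (expensiveDatabaseQuery is hypothetical):
// const data = await withDeadline(expensiveDatabaseQuery(), 5000);
```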
## How to Debug It

- **Check whether it fails on LLM calls or tool calls**

  Look at the stack trace and identify whether the error happens inside `Agent` execution or inside your custom tool function. If it dies during model invocation, inspect `baseUrl`, API key routing, and provider connectivity first.

- **Print the effective runtime config**

  Log every URL and timeout used in production. Do not assume env vars are set correctly just because they worked in staging.

  ```ts
  console.log({
    LLM_BASE_URL: process.env.LLM_BASE_URL,
    CUSTOMER_SERVICE_URL: process.env.CUSTOMER_SERVICE_URL,
    NODE_ENV: process.env.NODE_ENV,
  });
  ```

- **Test each dependency outside CrewAI**

  Hit the same endpoint with plain `fetch` or `curl` from the production host/container. If that fails there too, CrewAI is not the problem.

  ```sh
  curl -v https://your-service.example.com/health
  ```

- **Add timing around every external call**

  Measure where time is spent so you can separate slow model inference from broken networking.

  ```ts
  const start = Date.now();
  await someToolCall();
  console.log("tool_ms", Date.now() - start);
  ```
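That timing snippet can be folded into a reusable helper so every dependency is measured the same way, even when the call throws. The helper is a generic sketch, not part of CrewAI.

```typescript
// Sketch: time any async call and log the duration under a label,
// including when the call throws. Generic helper, not a CrewAI API.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    console.log(`${label}_ms`, Date.now() - start);
  }
}
```

Wrapping each call site, e.g. `await timed("fetch_customer", () => fetchCustomer("42"))`, gives you per-dependency latency lines you can grep in production logs.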
## Prevention

- Use explicit timeouts on every outbound request.
- Never ship `localhost` or dev-only base URLs in production configs.
- Add health checks for LLM endpoints and internal tools before starting worker processes.
- Keep retries bounded; infinite retry loops turn transient slowness into guaranteed timeouts.
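The last point can be made concrete with a small bounded-retry sketch; the attempt count and delays are illustrative defaults, not CrewAI settings.

```typescript
// Sketch: retry a flaky call with exponential backoff, but give up after
// maxAttempts so transient slowness cannot become an infinite loop.
async function retryBounded<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // 200ms, 400ms, 800ms, ... between attempts
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}
```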
## Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.