# How to Fix 'connection timeout in production' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

## What the Error Means

connection timeout in production usually means your CrewAI TypeScript app tried to call a remote dependency and never got a response before the timeout window expired. In practice, this shows up when the agent is calling an LLM API, a tool endpoint, or an internal service that is slow, unreachable, or blocked by network policy.

In CrewAI apps, the failure often surfaces as a runtime error from the underlying HTTP client, not from your agent logic. You’ll typically see something like `Error: connection timeout`, `ETIMEDOUT`, or `Request timed out` while executing a Crew, Agent, or tool call.
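Because these errors arrive in several shapes, it can help to recognize them in one place in your own wrapper or tool code. The sketch below covers the names and codes that Node's `fetch` (undici) and raw sockets commonly surface; `isTimeoutError` is an illustrative helper, not a CrewAI API.

```ts
// Sketch: classify a caught error as a timeout so wrappers around agent
// or tool calls can log it distinctly. Illustrative, not a CrewAI API.
function isTimeoutError(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const code = (err as Error & { code?: string }).code;
  return (
    err.name === "TimeoutError" ||        // AbortSignal.timeout() in Node 18+
    err.name === "AbortError" ||          // manual AbortController aborts
    code === "ETIMEDOUT" ||               // raw TCP connect timeout
    code === "UND_ERR_CONNECT_TIMEOUT" || // undici (Node fetch) connect timeout
    /timed? ?out/i.test(err.message)      // "Request timed out" style messages
  );
}
```

Wrapping each outbound call in a `try/catch` that consults this check makes it obvious in logs whether you are looking at a timeout or an unrelated failure.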

## The Most Common Cause

The #1 cause is using a local/dev configuration in production, especially with an endpoint that is only reachable from your laptop or VPC. A common pattern is hardcoding localhost, using a dev base URL, or forgetting that production containers cannot reach private services without proper routing.

Here’s the broken pattern next to the fixed one.

| Broken | Fixed |
|---|---|
| ```ts
import { Agent, Crew } from "crewai";

const agent = new Agent({
  name: "SupportAgent",
  goal: "Answer customer questions",
  llm: {
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: "http://localhost:3001/v1", // works locally only
    model: "gpt-4o-mini",
  },
});

const crew = new Crew({ agents: [agent] });
``` | ```ts
import { Agent, Crew } from "crewai";

const agent = new Agent({
  name: "SupportAgent",
  goal: "Answer customer questions",
  llm: {
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
    model: "gpt-4o-mini",
    timeoutMs: 30000,
  },
});

const crew = new Crew({ agents: [agent] });
``` |


If you are calling internal tools, the same issue applies.

| Broken | Fixed |
|---|---|
| ```ts
const fetchCustomer = async (id: string) => {
  const res = await fetch(`http://localhost:8080/customers/${id}`);
  return res.json();
};
``` | ```ts
const fetchCustomer = async (id: string) => {
  const res = await fetch(
    `${process.env.CUSTOMER_SERVICE_URL}/customers/${id}`,
    { signal: AbortSignal.timeout(8000) }
  );
  return res.json();
};
``` |

In production, `localhost` points to the container itself. That means your CrewAI tool call never reaches the real service, and after enough waiting you get a timeout.
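A cheap safeguard is to validate outbound base URLs at startup and fail fast, instead of timing out minutes later inside an agent run. Here is a minimal sketch, assuming the env-var names used in this guide (`LLM_BASE_URL`, `CUSTOMER_SERVICE_URL`); adapt it to your own config.

```ts
// Sketch: refuse to boot when a production config still points at loopback.
// Env-var names follow this guide's examples; they are not CrewAI-mandated.
function assertReachableBaseUrl(name: string, value: string | undefined): string {
  if (!value) throw new Error(`${name} is not set`);
  const { hostname } = new URL(value); // also throws on malformed URLs
  if (["localhost", "127.0.0.1", "[::1]"].includes(hostname)) {
    throw new Error(
      `${name}=${value} points at the container itself; set a reachable host`
    );
  }
  return value;
}

// Run once before constructing agents/crews (illustrative):
// const llmBaseUrl = assertReachableBaseUrl("LLM_BASE_URL", process.env.LLM_BASE_URL);
```

Failing at boot turns a silent 30-second timeout into an immediate, readable config error.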

## Other Possible Causes

- **Missing or too-low timeout settings**

  Some SDKs default to aggressive timeouts. If your agent does long reasoning plus tool calls, bump the limit explicitly.

  ```ts
  const agent = new Agent({
    name: "ClaimsAgent",
    goal: "Process claims",
    llm: {
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4o-mini",
      timeoutMs: 60000,
    },
  });
  ```

- **Cold starts or overloaded serverless functions**

  If your tool runs on Lambda, Vercel Functions, or similar infrastructure, cold starts can push you over the edge.

  ```ts
  export const config = {
    maxDuration: 60,
    runtime: "nodejs20.x",
  };
  ```

- **DNS / firewall / private network issues**

  The app may resolve the host but be unable to complete the TCP connection. This is common in Kubernetes, corporate networks, and locked-down VPCs.

  ```sh
  nslookup api.internal.company.local
  curl -v https://api.internal.company.local/health
  ```

- **Tool code doing blocking work before responding**

  If your custom tool spends time on DB queries, file IO, or retries before returning anything, CrewAI just waits until the request times out.

  ```ts
  const tool = async () => {
    // bad if this takes too long without timeouts/retries
    const data = await expensiveDatabaseQuery();
    return data;
  };
  ```
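For that last case, you can cap blocking work inside the tool yourself rather than letting the outer request die. Below is a sketch using a plain `Promise.race` deadline; `withTimeout` and `expensiveDatabaseQuery` are illustrative names, not CrewAI APIs.

```ts
// Sketch: bound how long a tool's inner work may block before giving up.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    // Whichever settles first wins; the timer is always cleaned up below.
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Usage inside a tool (illustrative):
// const data = await withTimeout(expensiveDatabaseQuery(), 5000, "customer query");
```

Failing the single tool call with a labeled error beats a generic connection timeout from the whole agent run.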

## How to Debug It

1. **Check whether it fails on LLM calls or tool calls**

   Look at the stack trace and identify whether the error happens inside Agent execution or inside your custom tool function. If it dies during model invocation, inspect `baseUrl`, API key routing, and provider connectivity first.

2. **Print effective runtime config**

   Log every URL and timeout used in production. Do not assume env vars are set correctly just because they worked in staging.

   ```ts
   console.log({
     LLM_BASE_URL: process.env.LLM_BASE_URL,
     CUSTOMER_SERVICE_URL: process.env.CUSTOMER_SERVICE_URL,
     NODE_ENV: process.env.NODE_ENV,
   });
   ```

3. **Test each dependency outside CrewAI**

   Hit the same endpoint with plain `fetch` or `curl` from the production host/container. If that fails there too, CrewAI is not the problem.

   ```sh
   curl -v https://your-service.example.com/health
   ```

4. **Add timing around every external call**

   Measure where time is spent so you can separate slow model inference from broken networking.

   ```ts
   const start = Date.now();
   await someToolCall();
   console.log("tool_ms", Date.now() - start);
   ```

## Prevention

- Use explicit timeouts on every outbound request.
- Never ship `localhost` or dev-only base URLs in production configs.
- Add health checks for LLM endpoints and internal tools before starting worker processes.
- Keep retries bounded; infinite retry loops turn transient slowness into guaranteed timeouts.
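The retry bullet can be sketched as a small helper with a hard attempt cap and exponential backoff; `withRetry` is an illustrative name, not a CrewAI utility.

```ts
// Sketch: bounded retries with exponential backoff. The attempt cap turns
// transient slowness into at most `attempts` failures, never an infinite loop.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // 250ms, 500ms, 1000ms, ... between attempts
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage (illustrative): wrap each outbound call, which should still carry
// its own per-attempt timeout.
// const res = await withRetry(() =>
//   fetch(url, { signal: AbortSignal.timeout(8000) })
// );
```

Pairing a per-attempt timeout with a bounded retry budget gives a predictable worst case instead of an open-ended hang.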

## Keep Learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

