How to Fix 'cold start latency in production' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-22
Tags: cold-start-latency-in-production, crewai, typescript

When CrewAI shows cold start latency in production, it usually means your agent graph is paying initialization cost on the first real request. In TypeScript projects, this shows up when you build agents, tools, LLM clients, or memory stores inside the request path instead of once at startup.

It typically appears after deploys, autoscaling events, serverless invocations, or any traffic pattern where a fresh process handles the first job. The result is slow first-token time, timeouts, or a request that looks fine locally but stalls in production.

The Most Common Cause

The #1 cause is recreating CrewAI objects on every request.

That includes Agent, Task, Crew, tool instances, and model clients. In production, this turns a one-time startup cost into per-request latency.

Wrong vs right pattern

Broken pattern → Fixed pattern

  • Build everything inside the handler → Build once at module scope or during app bootstrap
  • Reconnect to APIs on every call → Reuse clients and caches
  • Create new tools per request → Share stable tool instances

// ❌ Wrong: cold start cost happens on every request
import { Agent, Task, Crew } from "crewai";
import { OpenAI } from "openai";

export async function handler(req: Request) {
  const llm = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const analyst = new Agent({
    role: "Support Analyst",
    goal: "Summarize ticket",
    backstory: "You handle customer support issues.",
    llm,
  });

  const task = new Task({
    description: "Summarize this ticket",
    agent: analyst,
  });

  const crew = new Crew({
    agents: [analyst],
    tasks: [task],
  });

  return await crew.kickoff();
}

// ✅ Right: initialize once and reuse
import { Agent, Task, Crew } from "crewai";
import { OpenAI } from "openai";

const llm = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const analyst = new Agent({
  role: "Support Analyst",
  goal: "Summarize ticket",
  backstory: "You handle customer support issues.",
  llm,
});

const task = new Task({
  description: "Summarize this ticket",
  agent: analyst,
});

const crew = new Crew({
  agents: [analyst],
  tasks: [task],
});

export async function handler(req: Request) {
  return await crew.kickoff();
}

If you are using a serverless runtime, module scope still helps only if the container stays warm. If the platform spins up a fresh instance, you need to reduce bootstrap work and cache aggressively.
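
One way to keep that cold-start cost down, sketched below, is to memoize a single initialization promise at module scope: a warm instance reuses the crew, and a truly cold instance builds it at most once even when several first requests arrive together. getCrew and buildCrew are illustrative helpers, not CrewAI APIs, and the constructor options simply mirror the example above.

// Minimal sketch: build the crew at most once per process, lazily.
import { Agent, Task, Crew } from "crewai";

let crewPromise: Promise<Crew> | undefined;

async function buildCrew(): Promise<Crew> {
  // Same graph as above; LLM client omitted for brevity.
  const analyst = new Agent({
    role: "Support Analyst",
    goal: "Summarize ticket",
    backstory: "You handle customer support issues.",
  });
  const task = new Task({ description: "Summarize this ticket", agent: analyst });
  return new Crew({ agents: [analyst], tasks: [task] });
}

function getCrew(): Promise<Crew> {
  // Concurrent first requests share one initialization instead of each paying it.
  crewPromise ??= buildCrew();
  return crewPromise;
}

export async function handler(req: Request) {
  const crew = await getCrew();
  return await crew.kickoff();
}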

Other Possible Causes

1. Tool initialization is doing network work

A common mistake is creating database connections, browser sessions, or HTTP clients inside a tool constructor.

// ❌ Bad
class CustomerLookupTool {
  private db: unknown;

  constructor() {
    this.db = connectToDatabase(); // slow network connection during construction
  }
}

// ✅ Good
const sharedDb = connectToDatabase(); // connect once at module scope

class CustomerLookupTool {
  // Default to the shared connection; tests can inject their own.
  constructor(private readonly db = sharedDb) {}
}

If your tool calls third-party services during construction, move that work into an explicit init() step or lazy-load it on first use.
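
Here is a minimal sketch of the lazy-load variant. The Database shape, connectToDatabase, getDb, and run are placeholders for your own connection logic and tool interface, not CrewAI APIs.

// Minimal sketch: keep the constructor cheap and connect on first use.
type Database = { query(sql: string, params: unknown[]): Promise<unknown[]> };

// Placeholder for your real connection logic (e.g. a pg or mysql2 pool).
declare function connectToDatabase(): Promise<Database>;

class CustomerLookupTool {
  private dbPromise: Promise<Database> | undefined;

  private getDb(): Promise<Database> {
    // Lazy-load: the first call pays the connection cost, later calls reuse it.
    this.dbPromise ??= connectToDatabase();
    return this.dbPromise;
  }

  async run(customerId: string) {
    const db = await this.getDb();
    return db.query("SELECT * FROM customers WHERE id = $1", [customerId]);
  }
}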

2. Your model client is re-authenticating each call

Some wrappers fetch tokens or discovery metadata repeatedly. That adds several hundred milliseconds before the first LLM call.

// ❌ Bad
import { AzureOpenAI } from "openai";

export async function createModel() {
  return new AzureOpenAI({
    apiKey: await fetchToken(), // token fetched on every call
    endpoint: process.env.AZURE_ENDPOINT!,
  });
}

// ✅ Good
import { AzureOpenAI } from "openai";

// Fetch the token once per process and build the client once, then reuse it.
const modelPromise = fetchToken().then((token) =>
  new AzureOpenAI({
    apiKey: token,
    endpoint: process.env.AZURE_ENDPOINT!,
  })
);

export const getModel = () => modelPromise;

If your provider supports it, cache tokens until expiry instead of fetching them per request.
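
A minimal sketch of that caching is below, assuming fetchToken returns both the token value and an expiry timestamp; the CachedToken shape and getToken helper are assumptions, and the exact fields depend on your provider.

// Minimal sketch: reuse a token until shortly before it expires.
interface CachedToken {
  value: string;
  expiresAt: number; // epoch milliseconds
}

let cached: CachedToken | undefined;

export async function getToken(
  fetchToken: () => Promise<CachedToken>
): Promise<string> {
  // Refresh slightly early so in-flight requests never hold an expired token.
  if (!cached || cached.expiresAt - Date.now() < 60_000) {
    cached = await fetchToken();
  }
  return cached.value;
}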

3. Memory or vector store hydration is too heavy

Loading large conversation history or syncing embeddings during startup can trigger the exact symptom.

// ❌ Bad
await memory.loadAllConversations();
await vectorStore.syncAllDocuments();

// ✅ Good
await memory.loadRecentConversations(20);
await vectorStore.ensureIndexExists();

For production crews, load only what you need for the current tenant or session.
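
One way to scope that hydration, sketched below, is to key it by tenant and memoize the load so each process hydrates a tenant at most once. The TenantMemory type, hydrateTenant helper, and the tenant argument on loadRecentConversations are assumptions for illustration, not a documented CrewAI interface.

// Minimal sketch: hydrate per tenant, at most once per process.
type TenantMemory = {
  loadRecentConversations(tenantId: string, limit: number): Promise<unknown>;
};

const hydrations = new Map<string, Promise<unknown>>();

function hydrateTenant(memory: TenantMemory, tenantId: string): Promise<unknown> {
  let hydration = hydrations.get(tenantId);
  if (!hydration) {
    // Only the first request for this tenant pays the load; later ones reuse it.
    hydration = memory.loadRecentConversations(tenantId, 20);
    hydrations.set(tenantId, hydration);
  }
  return hydration;
}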

4. You are running with aggressive autoscaling or serverless cold starts

If your platform scales from zero, the first request will always pay some startup tax. The fix is not “remove cold starts”; it’s “make cold starts smaller.”

# Example for a serverless platform
minInstances: 1
timeoutSeconds: 30
memory: 1024Mi

If you cannot keep warm instances alive, trim imports, avoid eager SDK initialization, and precompute static prompts at build time.
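
For the "avoid eager SDK initialization" part, a dynamic import keeps heavy dependencies off the cold-start path, as in the sketch below. Playwright is used here only as an example of a heavy dependency, and getBrowser is an illustrative helper; neither is something CrewAI requires.

// Minimal sketch: load a heavy SDK only when a request actually needs it.
import type { Browser } from "playwright";

let browserPromise: Promise<Browser> | undefined;

function getBrowser(): Promise<Browser> {
  // The dynamic import is only evaluated on first use, so cold starts that
  // never touch the browser tool skip the load and launch cost entirely.
  browserPromise ??= import("playwright").then(({ chromium }) =>
    chromium.launch({ headless: true })
  );
  return browserPromise;
}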

How to Debug It

  1. Measure startup separately from request handling.

    • Log timestamps before and after importing CrewAI objects.
    • Log timestamps before and after crew.kickoff() (see the timing sketch after this list).
  2. Check whether object creation happens inside handlers.

    • Search for new Agent(, new Task(, new Crew( inside route functions.
    • Move them to module scope if they do not depend on request data.
  3. Isolate tool initialization.

    • Temporarily replace all tools with stubs.
    • If latency disappears, the problem is in DB connections, HTTP auth, or browser setup.
  4. Inspect platform behavior.

    • Confirm whether your app is running in serverless mode or behind autoscaling.
    • Look for logs showing fresh container starts right before slow requests.
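
A minimal sketch of the timing logs from step 1 follows; it relies only on Date.now and console.log, and timedKickoff is an illustrative wrapper typed structurally so it works with whatever crew object you already have.

// Minimal sketch: attribute latency to bootstrap vs. kickoff.
const bootStart = Date.now();
// ...build Agent, Task, Crew, clients, and tools at module scope here...
console.log(`bootstrap: ${Date.now() - bootStart}ms`);

export async function timedKickoff(crew: { kickoff(): Promise<unknown> }) {
  const start = Date.now();
  try {
    return await crew.kickoff();
  } finally {
    console.log(`kickoff: ${Date.now() - start}ms`);
  }
}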

A useful signal is a log sequence like:

  • Crew kickoff started
  • a long pause
  • Task execution timed out, or the caller sees a gateway timeout

That usually means the process was healthy but not warmed up enough to meet your SLA.

Prevention

  • Initialize Agent, Task, Crew, model clients, and tools once at bootstrap unless they are truly request-specific.
  • Keep constructors cheap. Put network calls behind explicit methods instead of class instantiation.
  • Add timing logs around startup and kickoff so you can spot regressions before users do.
  • If you run on serverless infrastructure, set warm instance minimums where the platform allows it.

The practical rule is simple: if something does not depend on the incoming request body or tenant context, do not build it inside the handler. In CrewAI TypeScript apps, that one change fixes most cold start latency complaints immediately.


By Cyprian Aarons, AI Consultant at Topiax.