How to Fix 'cold start latency when scaling' in CrewAI (TypeScript)

By Cyprian Aarons · Updated 2026-04-22

When you see cold start latency when scaling in a CrewAI TypeScript app, it usually means your workers are paying the startup cost every time a new instance spins up. In practice, this shows up when traffic increases, autoscaling kicks in, and each fresh process has to rebuild agents, reload tools, and re-establish model clients before it can answer anything.

The fix is usually not inside “CrewAI scaling” itself. It’s almost always about how you initialize Crew, Agent, Task, tool clients, and any network-bound dependencies in your TypeScript service.

The Most Common Cause

The #1 cause is recreating agents and tools inside the request path instead of reusing them across invocations.

That pattern forces every cold worker to do expensive setup work again. In Node.js, this often happens in serverless handlers, route handlers, or per-request factory functions.

Broken pattern                            Fixed pattern
Create new Crew() inside every request    Create once at module scope or as a lazy singleton
Rebuild tools on every call               Reuse initialized tool instances
Reconnect to LLM client per request       Share one configured client

// ❌ Broken: everything is rebuilt on every request
import { Crew, Agent, Task } from "crewai";

export async function POST(req: Request) {
  const agent = new Agent({
    role: "Support Analyst",
    goal: "Resolve customer issues",
    backstory: "You handle bank support tickets.",
  });

  const task = new Task({
    description: "Summarize the ticket",
    agent,
  });

  const crew = new Crew({
    agents: [agent],
    tasks: [task],
  });

  return Response.json(await crew.kickoff());
}

// ✅ Fixed: initialize once and reuse
import { Crew, Agent, Task } from "crewai";

const agent = new Agent({
  role: "Support Analyst",
  goal: "Resolve customer issues",
  backstory: "You handle bank support tickets.",
});

const task = new Task({
  description: "Summarize the ticket",
  agent,
});

const crew = new Crew({
  agents: [agent],
  tasks: [task],
});

export async function POST(req: Request) {
  return Response.json(await crew.kickoff());
}

If you’re using tools that hit external APIs, database clients, or embeddings providers, the same rule applies. Build them once unless they contain per-request credentials or state.
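
One pattern that covers both cases is a lazy singleton: build the shared instance on the first request that needs it, and keep per-request state out of it. A minimal sketch; connectVectorStore and VECTOR_DB_URL are hypothetical stand-ins for whatever client your tools wrap:

// Hypothetical stand-in for a vector DB, CRM, or embeddings client.
interface VectorStoreTool {
  search(query: string): Promise<unknown>;
}
declare function connectVectorStore(url: string): Promise<VectorStoreTool>;

let storePromise: Promise<VectorStoreTool> | undefined;

function getVectorStore(): Promise<VectorStoreTool> {
  // Memoize the connection promise so concurrent cold requests share
  // one connect instead of racing to open several.
  storePromise ??= connectVectorStore(process.env.VECTOR_DB_URL!);
  return storePromise;
}

export async function POST(req: Request) {
  const store = await getVectorStore(); // cheap after the first call
  const { query } = await req.json();   // per-request state stays out of the singleton
  return Response.json(await store.search(query));
}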

Other Possible Causes

1. Your worker is doing heavy bootstrap work on startup

If your service imports large modules or loads data at boot, scaling will amplify the delay.

// ❌ Heavy startup path
import fs from "node:fs";

const rules = JSON.parse(fs.readFileSync("./policy-rules.json", "utf8"));

// ✅ Lazy-load only when needed
let rulesCache: unknown;

async function getRules() {
  if (!rulesCache) {
    const res = await fetch("https://config.internal/policy-rules");
    rulesCache = await res.json();
  }
  return rulesCache;
}

2. Your model client is recreated for each run

A fresh OpenAI/Anthropic client per task adds connection overhead and can trigger repeated auth setup.

// ❌ Per-request client creation (a config object stands in for a real SDK client)
function buildModelClient() {
  return {
    apiKey: process.env.OPENAI_API_KEY!,
  };
}

// ✅ Singleton config, built once at module load
const modelClient = {
  apiKey: process.env.OPENAI_API_KEY!,
};
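
With a real SDK, the same shape applies. A minimal sketch assuming the official openai npm package; the model name and the summarize helper are illustrative:

import OpenAI from "openai";

// One client per process, created at module load and shared by every request.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function summarize(text: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model name
    messages: [{ role: "user", content: `Summarize: ${text}` }],
  });
  return completion.choices[0]?.message?.content ?? "";
}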

3. Tool initialization blocks on network calls

This is common with vector DBs, CRM connectors, and internal HTTP tools.

// ❌ Tool constructor does network I/O
class CustomerLookupTool {
  schema: Promise<Response>;

  constructor() {
    // Fires a request on every instantiation and stores an unawaited promise
    this.schema = fetch("https://internal/schema");
  }
}

// ✅ Separate construction from I/O
class CustomerLookupTool {
  schema?: unknown;

  async init() {
    if (!this.schema) {
      const res = await fetch("https://internal/schema");
      this.schema = await res.json();
    }
  }
}
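
If you split construction from I/O like this, trigger init() once at boot and share the promise, so the first customer request never pays for it. A short sketch; handleTicket is illustrative:

const lookupTool = new CustomerLookupTool();

// Start initialization at boot; every request awaits the same promise,
// so the schema fetch happens at most once per process.
const lookupReady = lookupTool.init();

export async function handleTicket(ticketId: string) {
  await lookupReady; // resolves instantly once init has completed
  // ... use lookupTool.schema to look up the customer ...
}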

4. Autoscaling is too aggressive for your workload

If your platform scales to zero or scales out too quickly, you’ll keep hitting cold starts no matter how clean the code is.

# Example: too aggressive for latency-sensitive workloads
minReplicas: 0
maxReplicas: 50
scaleDownDelaySeconds: 10

A better baseline:

minReplicas: 2
maxReplicas: 20
scaleDownDelaySeconds: 300

For bank and insurance workloads, keeping a small warm pool is usually cheaper than eating repeated first-token latency on customer-facing flows.
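
If you deploy on Kubernetes, that baseline maps onto a HorizontalPodAutoscaler. A minimal sketch, assuming autoscaling/v2 and a Deployment named crew-service (a placeholder):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: crew-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: crew-service
  minReplicas: 2       # warm pool; never scale to zero
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70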

How to Debug It

  1. Measure startup separately from request handling.

    • Log timestamps around module load, agent creation, tool init, and crew.kickoff().
    • If most of the time lands before kickoff, you have a cold-start problem; see the timing sketch after this list.
  2. Check whether initialization happens inside handlers.

    • Search for new Crew(, new Agent(, new Task(, and tool constructors inside route functions.
    • Anything inside POST, GET, queue consumers, or serverless handlers is suspect.
  3. Inspect logs for repeated setup messages.

    • Look for patterns like:
      • Initializing OpenAI client
      • Loading vector index
      • Connecting to Postgres
      • Crew kickoff started
    • If those repeat per request, you’re rebuilding state.
  4. Compare warm vs cold latency.

    • Send two identical requests.
    • If the first one is slow and the second one is fast, the issue is likely startup work rather than model inference.
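
For step 1, here is a minimal timing sketch, reusing the module-scope setup from earlier; the log event names are illustrative:

import { Crew, Agent, Task } from "crewai";

// Time module-scope setup separately from per-request work.
const bootStart = Date.now();

const agent = new Agent({
  role: "Support Analyst",
  goal: "Resolve customer issues",
  backstory: "You handle bank support tickets.",
});

const crew = new Crew({
  agents: [agent],
  tasks: [new Task({ description: "Summarize the ticket", agent })],
});

console.log(JSON.stringify({ event: "boot_complete", bootMs: Date.now() - bootStart }));

export async function POST(req: Request) {
  const start = Date.now();
  const result = await crew.kickoff();
  console.log(JSON.stringify({ event: "kickoff_complete", kickoffMs: Date.now() - start }));
  return Response.json(result);
}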

Prevention

  • Initialize Crew, agents, and shared tools at module scope or behind a singleton factory.
  • Keep constructors pure; move network calls into explicit async init methods.
  • Set a non-zero minimum replica count for latency-sensitive production deployments.
  • Add boot-time metrics so you can see whether slowdowns come from app startup or model execution.

If you’re still seeing cold start latency when scaling after fixing initialization patterns, the next place to look is infrastructure autoscaling behavior. In most TypeScript CrewAI services I’ve seen in production, though, the root cause is simple: too much work in the wrong place.


By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
