How to Fix 'cold start latency when scaling' in CrewAI (TypeScript)
When you see cold start latency when scaling in a CrewAI TypeScript app, it usually means your workers are paying the startup cost every time a new instance spins up. In practice, this shows up when traffic increases, autoscaling kicks in, and each fresh process has to rebuild agents, reload tools, and re-establish model clients before it can answer anything.
The fix is usually not inside “CrewAI scaling” itself. It’s almost always about how you initialize Crew, Agent, Task, tool clients, and any network-bound dependencies in your TypeScript service.
The Most Common Cause
The #1 cause is recreating agents and tools inside the request path instead of reusing them across invocations.
That pattern forces every cold worker to do expensive setup work again. In Node.js, this often happens in serverless handlers, route handlers, or per-request factory functions.
| Broken pattern | Fixed pattern |
|---|---|
| Create `new Crew()` inside every request | Create once at module scope or behind a lazy singleton |
| Rebuild tools on every call | Reuse initialized tool instances |
| Reconnect to the LLM client per request | Share one configured client |
```ts
// ❌ Broken: everything is rebuilt on every request
import { Crew, Agent, Task } from "crewai";

export async function POST(req: Request) {
  const agent = new Agent({
    role: "Support Analyst",
    goal: "Resolve customer issues",
    backstory: "You handle bank support tickets.",
  });
  const task = new Task({
    description: "Summarize the ticket",
    agent,
  });
  const crew = new Crew({
    agents: [agent],
    tasks: [task],
  });
  return Response.json(await crew.kickoff());
}
```
```ts
// ✅ Fixed: initialize once at module scope and reuse across requests
import { Crew, Agent, Task } from "crewai";

const agent = new Agent({
  role: "Support Analyst",
  goal: "Resolve customer issues",
  backstory: "You handle bank support tickets.",
});
const task = new Task({
  description: "Summarize the ticket",
  agent,
});
const crew = new Crew({
  agents: [agent],
  tasks: [task],
});

export async function POST(req: Request) {
  return Response.json(await crew.kickoff());
}
```
If you’re using tools that hit external APIs, database clients, or embeddings providers, the same rule applies. Build them once unless they contain per-request credentials or state.
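When a tool genuinely does need per-request credentials, you can still share the expensive parts and keep only the credential per request. A minimal sketch of that split, using an illustrative `lazySingleton` helper and a made-up internal endpoint (neither is a CrewAI API):

```ts
// Hypothetical helper: builds the value on first use, then reuses it.
function lazySingleton<T>(create: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = create();
    }
    return instance;
  };
}

// The expensive shared part (connection config, schema, HTTP agent) is built once.
let buildCount = 0;
const getSharedHttp = lazySingleton(() => {
  buildCount++;
  return { baseUrl: "https://internal/api" }; // assumed internal endpoint
});

// Only the per-request credential varies; the shared part is reused.
function makeToolForRequest(userToken: string) {
  return { http: getSharedHttp(), token: userToken };
}
```

The first call pays the setup cost; every later request, on the same warm worker, gets the cached instance plus its own token.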
Other Possible Causes
1. Your worker is doing heavy bootstrap work on startup
If your service imports large modules or loads data at boot, scaling will amplify the delay.
```ts
// ❌ Heavy startup path: blocks every new worker at boot
import fs from "node:fs";

const rules = JSON.parse(fs.readFileSync("./policy-rules.json", "utf8"));
```

```ts
// ✅ Lazy-load only when needed
let rulesCache: unknown;

async function getRules() {
  if (!rulesCache) {
    const res = await fetch("https://config.internal/policy-rules");
    rulesCache = await res.json();
  }
  return rulesCache;
}
```
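One caveat with a plain cache: if several cold requests arrive at once, each can see an empty cache and trigger its own fetch. Memoizing the in-flight promise avoids that. A sketch with a stubbed loader standing in for the real fetch (the `memoizeAsync` name is illustrative):

```ts
// Memoize the in-flight promise so concurrent cold requests share one load.
function memoizeAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        pending = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}

let loads = 0;
const getRules = memoizeAsync(async () => {
  loads++;
  return { maxRefund: 500 }; // stand-in for the real network fetch
});
```

Even if ten requests call `getRules()` in the same tick, the loader runs once and all ten await the same promise.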
2. Your model client is recreated for each run
A fresh OpenAI/Anthropic client per task adds connection overhead and can trigger repeated auth setup.
```ts
// ❌ Per-request client creation
function buildModelClient() {
  return {
    apiKey: process.env.OPENAI_API_KEY!,
  };
}
```

```ts
// ✅ Singleton client config, created once at module load
const modelClient = {
  apiKey: process.env.OPENAI_API_KEY!,
};
```
3. Tool initialization blocks on network calls
This is common with vector DBs, CRM connectors, and internal HTTP tools.
```ts
// ❌ Tool constructor does network I/O (and stores an unresolved Promise)
class CustomerLookupTool {
  schema: Promise<Response>;
  constructor() {
    this.schema = fetch("https://internal/schema");
  }
}
```

```ts
// ✅ Separate construction from I/O
class CustomerLookupTool {
  schema?: unknown;
  async init() {
    if (!this.schema) {
      const res = await fetch("https://internal/schema");
      this.schema = await res.json();
    }
  }
}
```
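Once construction and I/O are separated, you can also start the I/O at module load so the first request only awaits an already-running promise instead of starting it. A sketch of that warmup pattern, with a stubbed schema loader in place of the real endpoint:

```ts
// Hypothetical tool with construction separated from I/O; the loader is
// injected so the sketch stays self-contained.
class CustomerLookupTool {
  schema?: { fields: string[] };
  async init(loadSchema: () => Promise<{ fields: string[] }>) {
    if (!this.schema) {
      this.schema = await loadSchema();
    }
  }
}

// Kick off initialization at module load, not inside the handler.
const lookupTool = new CustomerLookupTool();
const warmup = lookupTool.init(async () => ({ fields: ["id", "name"] }));

async function handle() {
  await warmup; // a no-op once the promise has resolved
  return lookupTool.schema;
}
```

A cold worker overlaps the schema load with whatever else happens before the first request; warm workers pay nothing.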
4. Autoscaling is too aggressive for your workload
If your platform scales to zero or scales out too quickly, you’ll keep hitting cold starts no matter how clean the code is.
```yaml
# Example: too aggressive for latency-sensitive workloads
minReplicas: 0
maxReplicas: 50
scaleDownDelaySeconds: 10
```

A better baseline:

```yaml
minReplicas: 2
maxReplicas: 20
scaleDownDelaySeconds: 300
```
For bank and insurance workloads, keeping a small warm pool is usually cheaper than eating repeated first-token latency on customer-facing flows.
How to Debug It
- Measure startup separately from request handling.
  - Log timestamps around module load, agent creation, tool init, and `crew.kickoff()`.
  - If most of the time is spent before kickoff, you have a cold-start problem.
- Check whether initialization happens inside handlers.
  - Search for `new Crew(`, `new Agent(`, `new Task(`, and tool constructors inside route functions.
  - Anything inside `POST`, `GET`, queue consumers, or serverless handlers is suspect.
- Inspect logs for repeated setup messages.
  - Look for patterns like `Initializing OpenAI client`, `Loading vector index`, `Connecting to Postgres`, or `Crew kickoff started`.
  - If those repeat per request, you're rebuilding state.
- Compare warm vs. cold latency.
  - Send two identical requests.
  - If the first one is slow and the second one is fast, the issue is likely startup work rather than model inference.
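A minimal way to make the warm-vs-cold comparison measurable is to record the boot time at module load and tag every request with the worker's age. The `requestTimings` name and fields below are illustrative:

```ts
// Captured once, when a fresh worker first loads this module.
const bootedAt = Date.now();
let requestCount = 0;

// Call at the top of each handler and emit the result in your logs.
function requestTimings() {
  requestCount++;
  return {
    requestNumber: requestCount,
    processAgeMs: Date.now() - bootedAt, // small value => this worker is cold
    coldStart: requestCount === 1,       // true only for the worker's first request
  };
}
```

Filtering your logs on `coldStart: true` then shows exactly how much latency autoscaling is adding, separate from model inference time.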
Prevention
- Initialize `Crew`, agents, and shared tools at module scope or behind a singleton factory.
- Keep constructors pure; move network calls into explicit async `init` methods.
- Set a non-zero minimum replica count for latency-sensitive production deployments.
- Add boot-time metrics so you can see whether slowdowns come from app startup or model execution.
If you’re still seeing cold start latency when scaling after fixing initialization patterns, the next place to look is infrastructure autoscaling behavior. In most TypeScript CrewAI services I’ve seen in production, though, the root cause is simple: too much work in the wrong place.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit