How to Fix 'cold start latency in production' in LangGraph (TypeScript)
Cold start latency in production usually means your graph is doing too much work on the first request: loading models, compiling the graph, creating clients, or hitting external services before anything is cached. In LangGraph with TypeScript, it shows up most often when the app is deployed serverless or behind an autoscaling container and the first user request pays all initialization costs.
The actual symptom is usually not a LangGraph runtime exception. It’s a slow first response, timeouts from your API gateway, or logs showing the graph compilation path being hit on every request.
The Most Common Cause
The #1 cause is building the graph inside the request handler instead of once at process startup.
That means StateGraph.compile() runs for every request, and any expensive node setup gets repeated. In production, this turns a normal graph into a cold-start machine.
| Broken pattern | Fixed pattern |
|---|---|
| Compile per request | Compile once and reuse |
| Recreate model/client per call | Instantiate clients globally |
| No warmup path | Pre-warm at boot |
// ❌ Broken: graph compiled on every request
import { StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

export async function POST(req: Request) {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  const workflow = new StateGraph({
    channels: {
      messages: { value: (x: any[], y: any[]) => x.concat(y), default: () => [] },
    },
  });
  workflow.addNode("agent", async (state: any) => {
    const result = await llm.invoke(state.messages);
    return { messages: [result] };
  });
  workflow.setEntryPoint("agent");
  workflow.setFinishPoint("agent");
  const app = workflow.compile(); // expensive: runs on every request
  const output = await app.invoke({ messages: [{ role: "user", content: "hi" }] });
  return Response.json(output);
}
// ✅ Fixed: compile once at module load
import { StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const workflow = new StateGraph({
  channels: {
    messages: { value: (x: any[], y: any[]) => x.concat(y), default: () => [] },
  },
});
workflow.addNode("agent", async (state: any) => {
  const result = await llm.invoke(state.messages);
  return { messages: [result] };
});
workflow.setEntryPoint("agent");
workflow.setFinishPoint("agent");
const app = workflow.compile(); // compile once per process

export async function POST(req: Request) {
  const output = await app.invoke({ messages: [{ role: "user", content: "hi" }] });
  return Response.json(output);
}
If you add timing logs and see graph construction and compile() running during each request, this is your problem.
Other Possible Causes
1. Creating a new LLM client inside each node
This is common when people keep constructor logic inside node functions.
// ❌ Bad: a new client (and its config/auth parsing) on every node invocation
workflow.addNode("agent", async (state: any) => {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  return { messages: [await llm.invoke(state.messages)] };
});
Move it outside the node so connection setup and auth parsing happen once.
// ✅ Good: the client is created once at module scope
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
workflow.addNode("agent", async (state: any) => {
  return { messages: [await llm.invoke(state.messages)] };
});
2. Loading large prompt files or templates on demand
If your node reads files from disk during execution, your first request will pay I/O cost.
// ❌ Bad: disk I/O inside the node, paid on the request path
workflow.addNode("prompt", async () => {
  const prompt = await fs.promises.readFile("./prompts/system.txt", "utf8");
  return { prompt };
});
Cache it at startup.
// ✅ Good: read once at module scope (requires an ESM build for top-level await)
import fs from "node:fs";

const systemPrompt = await fs.promises.readFile("./prompts/system.txt", "utf8");
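If your build does not support top-level await, a memoized loader gives the same result: the file is read once, on first use, then served from memory. The helper name getSystemPrompt and the node below are illustrative, reusing the workflow from the earlier examples.
import fs from "node:fs";

// Illustrative memoized loader: the first call pays the I/O cost,
// later calls return the cached string
let cachedPrompt: string | undefined;

async function getSystemPrompt(): Promise<string> {
  if (cachedPrompt === undefined) {
    cachedPrompt = await fs.promises.readFile("./prompts/system.txt", "utf8");
  }
  return cachedPrompt;
}

workflow.addNode("prompt", async () => {
  return { prompt: await getSystemPrompt() };
});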
3. Cold serverless containers with no warmup
On AWS Lambda, Vercel, or Cloud Run with scale-to-zero, “cold start latency” often means the platform is waking the container, not LangGraph itself. The usual mitigation is to keep a minimum number of instances warm; the exact setting depends on your platform (Cloud Run and Firebase Functions, for example, expose options along these lines):
{
  "minInstances": 1,
  "timeoutSeconds": 30
}
For Lambda-style deployments, keep one instance warm if latency matters. If you can’t do that, move heavy initialization out of the hot path and reduce startup work.
4. Using dynamic imports in the request path
Dynamic imports are fine for optional features. They’re bad when used for core graph dependencies.
// ❌ Bad: a core dependency resolved at request time
export async function POST() {
  const { ChatOpenAI } = await import("@langchain/openai");
  // ...
}
Use static imports for critical runtime code so bundlers can optimize and initialize predictably.
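A minimal sketch of the static-import version; the module is resolved and initialized once when the process starts, not inside the handler:
// ✅ Good: core dependencies imported statically at module load
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

export async function POST() {
  const result = await llm.invoke("hi");
  return Response.json(result);
}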
How to Debug It
- Time each phase separately (see the instrumentation sketch after this list).
  - Log timestamps before graph creation, after compile(), before invoke(), and after invoke().
  - If compile() dominates, you found the issue.
- Check whether code runs per request.
  - Add a module-level counter that increments where the graph is built.
  - If it increments on every API call, your app is rebuilding state in the handler.
- Inspect deployment behavior.
  - If local dev is fast but production is slow only on the first hit, this is likely platform cold start.
  - Check whether your host scales to zero.
- Look for repeated initialization logs.
  - Lines like "Creating ChatOpenAI client", "Compiling StateGraph", or "Loading prompt file" should appear once per process, not once per request.
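A minimal instrumentation sketch for the first two checks, reusing the graph from the fixed example above; the log lines show whether compile() or invoke() dominates, and whether the build runs more than once per process:
import { StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

// Module-level counter: this log line should appear once per process.
// If it repeats per request, graph construction is in the hot path.
let graphBuilds = 0;

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const t0 = performance.now();
const workflow = new StateGraph({
  channels: {
    messages: { value: (x: any[], y: any[]) => x.concat(y), default: () => [] },
  },
});
workflow.addNode("agent", async (state: any) => {
  return { messages: [await llm.invoke(state.messages)] };
});
workflow.setEntryPoint("agent");
workflow.setFinishPoint("agent");
const app = workflow.compile();
graphBuilds += 1;
console.log(`graph build #${graphBuilds} took ${(performance.now() - t0).toFixed(1)} ms`);

export async function POST(req: Request) {
  const t1 = performance.now();
  const output = await app.invoke({ messages: [{ role: "user", content: "hi" }] });
  console.log(`invoke took ${(performance.now() - t1).toFixed(1)} ms`);
  return Response.json(output);
}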
Prevention
- Compile your LangGraph app once at module scope and reuse the exported instance.
- Keep expensive setup outside nodes:
  - model clients
  - retrievers
  - prompt files
  - vector stores
- Add a startup health check that hits the graph once after deploy so you catch cold paths before users do (a sketch follows this list).
- If you deploy serverless, set minimum instances, or accept that first-request latency will exist and design around it.
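A minimal warmup sketch for the health-check idea; the route and the ./graph module path are illustrative, assuming the compiled app from the fixed example is exported there:
// Illustrative warmup route: invoking the graph once after deploy forces
// compilation, client setup, and prompt loading before real traffic arrives
import { app } from "./graph"; // hypothetical module exporting the compiled graph

export async function GET() {
  try {
    await app.invoke({ messages: [{ role: "user", content: "ping" }] });
    return Response.json({ status: "warm" });
  } catch (err) {
    return Response.json({ status: "cold", error: String(err) }, { status: 500 });
  }
}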
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.