How to Fix 'cold start latency' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-22
Tags: cold-start-latency, langgraph, typescript

What “cold start latency” usually means

In LangGraph, cold start latency is not a TypeScript compiler error. It usually shows up as a runtime performance problem: the first request is slow because the graph, model client, vector store, or serverless runtime is initializing from scratch.

You’ll see this most often in Lambda, Vercel functions, Cloudflare Workers, or any setup where your LangGraph app is instantiated inside the request handler instead of at module scope.

The Most Common Cause

The #1 cause is recreating the graph and all its dependencies on every request.

If you build the StateGraph, compile it, and create the model client inside your handler, every invocation pays the startup cost again. That’s the classic cold start pattern.

Broken vs fixed

Broken pattern                              Fixed pattern
------------------------------------------  ------------------------------------
Graph is created inside the request path    Graph is created once at module load
Model client re-instantiated per request    Reuse singleton client
Compiled graph rebuilt every time           Compile once and reuse
// broken.ts
import { StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

export async function POST(req: Request) {
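  // Everything below runs again on every request — the classic cold start pattern.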
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

  const graph = new StateGraph({
    channels: {
      messages: { value: (x, y) => x.concat(y), default: () => [] },
    },
  });

  graph.addNode("assistant", async (state) => {
    const response = await llm.invoke(state.messages);
    return { messages: [response] };
  });

  graph.addEdge("__start__", "assistant");
  graph.addEdge("assistant", "__end__");

  const app = graph.compile();

  const result = await app.invoke({
    messages: [{ role: "user", content: "Hello" }],
  });

  return Response.json(result);
}
// fixed.ts
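// Everything at module scope below runs once per container and is reused
// across warm invocations.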
import { StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const graph = new StateGraph({
  channels: {
    messages: { value: (x, y) => x.concat(y), default: () => [] },
  },
});

graph.addNode("assistant", async (state) => {
  const response = await llm.invoke(state.messages);
  return { messages: [response] };
});

graph.addEdge("__start__", "assistant");
graph.addEdge("assistant", "__end__");

const app = graph.compile();

export async function POST(req: Request) {
  const result = await app.invoke({
    messages: [{ role: "user", content: "Hello" }],
  });

  return Response.json(result);
}

The fix is simple: move expensive initialization out of the handler. In serverless apps, module scope persists across warm invocations, so you only pay that cost once per container.
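If heavy work at import time is awkward on your platform, or setup has to be async, a lazy singleton gets the same result: the first request pays the cost, and every warm request after it reuses the compiled app. A minimal sketch reusing the graph from fixed.ts above (the getApp helper name is ours):

// Compile lazily on first use, then reuse across warm invocations.
let app: ReturnType<typeof graph.compile> | null = null;

function getApp() {
  app ??= graph.compile();
  return app;
}

export async function POST(req: Request) {
  const result = await getApp().invoke({
    messages: [{ role: "user", content: "Hello" }],
  });

  return Response.json(result);
}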

Other Possible Causes

1. You are loading large prompts or schemas at runtime

If your prompt templates or Zod schemas are rebuilt from disk or remote config on every call, that cost lands on every request, not just the first.

// bad
import { promises as fs } from "node:fs";

export async function POST() {
  // Disk reads and JSON parsing repeat on every request
  const prompt = await fs.readFile("./prompt.txt", "utf8");
  const schema = JSON.parse(await fs.readFile("./schema.json", "utf8"));
  // ...
}

// better
import { promises as fs } from "node:fs";

const prompt = await fs.readFile("./prompt.txt", "utf8");
const schema = JSON.parse(await fs.readFile("./schema.json", "utf8"));

export async function POST() {
  // reuse prompt and schema
}

(Note: top-level await in the "better" version requires your build to emit ES modules.)

2. Your node does synchronous work before the first LLM call

A common LangGraph anti-pattern is doing CPU-heavy parsing, embedding generation, or database warmup inside a node before any useful output happens.

graph.addNode("retrieve", async () => {
  const embeddings = await embedDocuments(bigCorpus); // expensive
  return { embeddings };
});

Move that work to ingestion time or cache it outside the graph. If it must happen at runtime, memoize it so it runs at most once per container (a sketch follows).
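A minimal sketch of that memoization, reusing the bigCorpus and embedDocuments stand-ins from the snippet above (the getEmbeddings helper is ours):

// Module-level promise: the first caller starts the work,
// later callers await the same cached result.
let embeddingsPromise: Promise<number[][]> | null = null;

function getEmbeddings() {
  embeddingsPromise ??= embedDocuments(bigCorpus);
  return embeddingsPromise;
}

graph.addNode("retrieve", async () => {
  const embeddings = await getEmbeddings(); // computed once per container
  return { embeddings };
});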

3. You are using a fresh database/vector store connection per invocation

Creating a new PrismaClient, pg pool, Pinecone client, or Chroma connection inside the handler adds avoidable latency.

// bad
import { PrismaClient } from "@prisma/client";
import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";

export async function POST() {
  // New connections negotiated on every request
  const db = new PrismaClient();
  const vectorStore = new PineconeStore(new OpenAIEmbeddings(), { /* ... */ });
}

// good
const db = new PrismaClient();
const vectorStore = new PineconeStore(new OpenAIEmbeddings(), { /* ... */ });

export async function POST() {
  // reuse clients
}

If you see logs like PrismaClientInitializationError or long pauses before your first app.invoke, this is often the reason.
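In environments where module scope can be evaluated more than once (dev hot reload, multiple route bundles), a widely used hedge is to cache the client on globalThis. A sketch:

import { PrismaClient } from "@prisma/client";

// Reuse one client (and its connection pool) across hot reloads
// and re-evaluated modules instead of opening a new pool each time.
const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };

export const db = globalForPrisma.prisma ?? new PrismaClient();
globalForPrisma.prisma = db;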

4. Your serverless bundle is too large

LangGraph itself is not usually the problem. The problem is shipping a huge bundle with unused SDKs, local assets, and heavy transitive dependencies.

A bloated bundle increases cold start time even before your code runs. If you bundle with esbuild, for example, options like these keep the output lean:

{
  "bundle": true,
  "external": ["@prisma/client"],
  "minify": true,
  "treeShaking": true
}

Also check for accidental imports like the following; the AWS SDK case is sketched after the list:

  • full AWS SDK v2 instead of modular v3 clients
  • entire utility libraries when you only need one function
  • local JSON fixtures imported into production code
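
For the AWS SDK case, the before/after looks like this (bucket and key names are placeholders):

// heavy: aws-sdk v2 is one monolithic package, so importing it drags in every service
// import AWS from "aws-sdk";
// const s3 = new AWS.S3();

// lean: v3 ships one package per service, so bundlers only include what you use
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
await s3.send(new GetObjectCommand({ Bucket: "my-bucket", Key: "prompt.txt" }));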

How to Debug It

  1. Time each stage separately. Add timestamps around graph construction, client init, and app.invoke.

    console.time("init-llm");
    const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
    console.timeEnd("init-llm");
    
    console.time("compile-graph");
    const app = graph.compile();
    console.timeEnd("compile-graph");
    
    console.time("invoke");
    await app.invoke(input);
    console.timeEnd("invoke");
    
  2. Check whether the slowdown happens only on the first request. If request #1 takes 5–10 seconds and request #2 takes under a second, you are looking at a cold start issue, not a LangGraph execution bug.
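
    A quick external check, as a sketch (the endpoint URL is a placeholder):

    // cold-vs-warm.ts: time two back-to-back requests against your deployment
    const url = "https://your-app.example/api/chat"; // placeholder endpoint

    for (const label of ["request #1 (cold?)", "request #2 (warm?)"]) {
      const t0 = performance.now();
      await fetch(url, { method: "POST", body: JSON.stringify({ message: "ping" }) });
      console.log(label, Math.round(performance.now() - t0), "ms");
    }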

  3. Inspect logs for repeated initialization. Look for repeated messages like:

    • Initializing ChatOpenAI
    • Connecting to Postgres
    • Loading prompt template
    • Compiling StateGraph

    If those appear on every request, move them out of the handler.

  4. Profile bundle size and deployment target. In Next.js/Vercel/Lambda setups, check whether your route runs in an edge runtime or a Node runtime. Some LangChain/LangGraph dependencies are heavier in edge environments and can trigger slow starts or incompatibilities.
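
    In the Next.js App Router, for example, you can pin a route to the Node runtime with a route segment config export (the file path is illustrative):

    // app/api/chat/route.ts
    export const runtime = "nodejs"; // opt out of the edge runtime for heavy dependencies

    export async function POST(req: Request) {
      // ...
    }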

Prevention

  • Build graphs at module scope and export a compiled singleton.
  • Keep model clients, DB pools, and vector stores outside request handlers.
  • Cache static prompts, schemas, and retrievers instead of rebuilding them per invocation.
  • Measure init time separately from invoke time before shipping to production.

If you want one rule to remember: don’t compile LangGraph inside the hot path. That pattern turns every request into a cold start.

