How to Fix 'cold start latency when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-22

When you see cold start latency while scaling a LangGraph TypeScript app, it usually means your graph is doing too much work on the first request after a new instance comes up. In practice, this shows up when autoscaling adds a fresh pod or container and that instance has to load models, compile graphs, create clients, or warm caches before it can serve traffic.

The fix is usually not in LangGraph itself. It’s almost always about how you initialize your graph, dependencies, and runtime state.

The Most Common Cause

The #1 cause is doing heavy initialization inside the request path instead of once at process startup.

That includes:

  • creating ChatOpenAI or other model clients per request
  • rebuilding the graph on every call
  • loading prompts, tools, or vector indexes lazily
  • opening DB connections inside node functions

Here’s the broken pattern:

// broken.ts
import { END, START, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

interface GraphState {
  input: string;
  output: string;
}

export async function handleRequest(input: string) {
  // expensive client setup repeated on every request
  const llm = new ChatOpenAI({
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
  });

  const graph = new StateGraph<GraphState>({
    channels: { input: null, output: null },
  })
    .addNode("model", async (state) => {
      const res = await llm.invoke(state.input);
      return { output: String(res.content) };
    })
    .addEdge(START, "model")
    .addEdge("model", END);

  // expensive compile step also repeated on every request
  const compiled = graph.compile();

  return compiled.invoke({ input });
}

This works locally, then falls over at scale, because every cold instance pays the full setup cost before serving its first request.

Use this instead:

// fixed.ts
import { END, START, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

interface GraphState {
  input: string;
  output: string;
}

// create the client once at module load
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

// build and compile the graph once at module load
const compiledGraph = new StateGraph<GraphState>({
  channels: { input: null, output: null },
})
  .addNode("model", async (state) => {
    const res = await llm.invoke(state.input);
    return { output: String(res.content) };
  })
  .addEdge(START, "model")
  .addEdge("model", END)
  .compile();

export async function handleRequest(input: string) {
  return compiledGraph.invoke({ input });
}

If you need async setup, do it once during bootstrap and block readiness until it finishes.
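
Here is a minimal sketch of that bootstrap pattern with Express. The warmCaches call and the /ready path are illustrative placeholders; the point is that setup runs exactly once and the readiness probe keeps traffic away until it finishes.

// bootstrap.ts — a sketch; warmCaches is a placeholder for your own async setup
import express from "express";

let ready = false;

// run all async setup exactly once, at process start
const initPromise = (async () => {
  // await warmCaches(); // e.g. vector indexes, prompt files, DB pools
  ready = true;
})();

const app = express();
app.use(express.json());

// readiness probe: the platform should not route traffic until this returns 200
app.get("/ready", (_req, res) => {
  res.status(ready ? 200 : 503).end();
});

app.post("/chat", async (req, res) => {
  await initPromise; // guards any request that slips in before readiness flips
  res.json({ ok: true }); // invoke the compiled graph here
});

app.listen(3000);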

Pattern                              Result
Build graph/client inside handler    High cold start latency
Build once at module scope           Lower first-request latency
Lazy-load everything on demand       Unpredictable spikes
Preload during startup               Stable scaling behavior

Other Possible Causes

1. Your node functions are doing I/O on first execution

A common mistake is hiding expensive setup inside a LangGraph node.

const fetchDocsNode = async () => {
  const index = await loadVectorIndex(); // expensive every cold start
  return index.search("policy");
};

Move that work out of the node and cache it at module scope or startup.

const indexPromise = loadVectorIndex();

const fetchDocsNode = async () => {
  const index = await indexPromise;
  return index.search("policy");
};

2. You are creating a new LLM client per invocation

This is especially bad in serverless or autoscaled containers.

export const callModel = async (input: string) => {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
  return llm.invoke(input);
};

Fix it by reusing one client instance.

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

export const callModel = async (input: string) => {
  return llm.invoke(input);
};

3. Your deployment does not keep instances warm

If you run on serverless infrastructure, scale-to-zero will always produce cold starts unless you configure minimum replicas or provisioned concurrency.

# Deployment: a single replica means every scale-up event starts cold
spec:
  replicas: 1

For production traffic, set a floor on the HorizontalPodAutoscaler instead:

spec:
  minReplicas: 2

If you’re on AWS Lambda, use provisioned concurrency. If you’re on Cloud Run, set minimum instances.
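
Both platforms have one-line equivalents. A sketch, assuming a hypothetical function named my-graph-fn with alias prod, and a Cloud Run service named chat-service:

# AWS Lambda: keep N execution environments initialized ahead of traffic
aws lambda put-provisioned-concurrency-config \
  --function-name my-graph-fn \
  --qualifier prod \
  --provisioned-concurrent-executions 2

# Cloud Run: keep at least one instance warm
gcloud run services update chat-service --min-instances=1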

4. You compile graphs repeatedly in tests or middleware

I’ve seen this in Next.js route handlers and Express middleware.

app.post("/chat", async (req, res) => {
  const graph = buildGraph();
  const appCompiled = graph.compile();
  res.json(await appCompiled.invoke(req.body));
});

Compile once and reuse:

const appCompiled = buildGraph().compile();

app.post("/chat", async (req, res) => {
  res.json(await appCompiled.invoke(req.body));
});
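
The same module-scope trick works in a Next.js App Router handler. A sketch, assuming a hypothetical buildGraph helper exported from your own code:

// app/api/chat/route.ts — module scope is reused across requests on a warm instance
import { buildGraph } from "@/lib/graph"; // hypothetical helper that assembles the StateGraph

const compiled = buildGraph().compile();

export async function POST(req: Request) {
  const body = await req.json();
  return Response.json(await compiled.invoke(body));
}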

How to Debug It

  1. Measure startup time vs request time

    • Add logs around process boot and first invoke.
    • If compile(), client creation, or index loading happens after the request enters the handler, that’s your problem; the timing sketch after this list shows where to put the logs.
  2. Check whether cold instances are slower than warm ones

    • Send one request after deploy.
    • Then send five more.
    • If only the first one is slow, you have a startup/warmup issue rather than a LangGraph execution bug like InvalidUpdateError or EmptyChannelError; the probe script after this list automates this check.
  3. Instrument each node

    • Log duration per node.
    • Look for nodes that spike only on first run.
const timedNode = async (state: any) => {
  const start = Date.now();
  const result = await doWork(state);
  console.log("node_ms", Date.now() - start);
  return result;
};
  4. Inspect deployment scaling behavior
    • Check whether pods are being created frequently.
    • If autoscaling keeps adding fresh instances, your “latency problem” may be infrastructure churn, not LangGraph logic.
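
For step 1, here is a minimal timing sketch; buildGraph is a hypothetical helper, and the boot_ms/invoke_ms labels are just illustrative names:

// timing.ts — a sketch, not a drop-in
import { buildGraph } from "./graph"; // hypothetical module with your setup

// measured once per process: everything that runs at module load
const bootStart = Date.now();
const compiledGraph = buildGraph().compile();
console.log("boot_ms", Date.now() - bootStart);

// measured per request: should stay flat across cold and warm instances
export async function handleRequest(input: string) {
  const t0 = Date.now();
  const result = await compiledGraph.invoke({ input });
  console.log("invoke_ms", Date.now() - t0);
  return result;
}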
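For step 2, a rough probe script; the endpoint URL and payload are placeholders:

// probe.ts — request 1 right after a deploy hits a cold instance; the rest should be warm
const url = "https://example.com/chat"; // placeholder endpoint

for (let i = 1; i <= 6; i++) {
  const t0 = Date.now();
  await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ input: "hi" }),
  });
  console.log(`req_${i}_ms`, Date.now() - t0);
}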

Prevention

  • Build graphs and clients once at process startup.
  • Keep expensive I/O out of node functions; preload caches and indexes before serving traffic.
  • Set minimum replicas or provisioned concurrency so traffic doesn’t hit brand-new instances constantly.
  • Add timing logs around compile(), model initialization, and first-node execution so regressions show up immediately.

If you’re seeing cold start latency when scaling, treat it as an initialization problem first. In LangGraph TypeScript apps, the fastest fix is usually to stop rebuilding everything per request and make startup explicit.


By Cyprian Aarons, AI Consultant at Topiax.
