How to Fix 'cold start latency when scaling' in LangGraph (Python)
What this error usually means
Cold start latency when scaling in LangGraph usually shows up when a worker has to do too much initialization on the first request after a scale-up. In practice, that means your pod, process, or serverless instance spends time loading models, building graph objects, opening DB connections, or compiling schemas before it can answer.
You’ll see this most often in Kubernetes, autoscaled API workers, Lambda-style deployments, or any setup where LangGraph runs behind a process manager that spins up new replicas on demand.
The Most Common Cause
The #1 cause is building heavy objects inside the request path instead of at module load time or in a long-lived startup hook. In LangGraph apps, this usually means creating the StateGraph, LLM client, vector store, retriever, or database connection inside the handler that executes per request.
That pattern works locally. Under scale-out, it creates repeated cold starts because every new worker repeats the same expensive initialization.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Builds graph and clients on every request | Builds once and reuses across requests |
| High first-request latency after scaling | Lower warm-path latency |
| Harder to observe and cache | Easier to instrument and stabilize |
```python
# BROKEN: expensive setup happens inside the request handler
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph

class State(TypedDict):
    input: str
    answer: str

def handle_request(user_input: str):
    llm = ChatOpenAI(model="gpt-4o-mini")  # created every time
    graph = StateGraph(State)              # created every time

    def call_model(state: State):
        return {"answer": llm.invoke(state["input"]).content}

    graph.add_node("call_model", call_model)
    graph.set_entry_point("call_model")
    graph.add_edge("call_model", END)
    app = graph.compile()                  # compiled every time
    return app.invoke({"input": user_input})
```
```python
# FIXED: build once at import/startup time
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph

class State(TypedDict):
    input: str
    answer: str

llm = ChatOpenAI(model="gpt-4o-mini")  # created once per process

def call_model(state: State):
    return {"answer": llm.invoke(state["input"]).content}

graph = StateGraph(State)
graph.add_node("call_model", call_model)
graph.set_entry_point("call_model")
graph.add_edge("call_model", END)
app = graph.compile()  # compiled once per process

def handle_request(user_input: str):
    return app.invoke({"input": user_input})
```
If you’re using FastAPI, keep the compiled app in application state (or at module scope) and build it during startup. If you’re running workers under Gunicorn/Uvicorn, make sure each worker initializes once, not once per endpoint call.
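Here’s a minimal FastAPI sketch of that idea, assuming a `build_graph()` helper that returns an uncompiled StateGraph like the one built above. The lifespan hook compiles the graph once per worker, and request handlers only reuse it:

```python
# Sketch: one compiled graph per worker process, shared across requests.
# build_graph() is assumed to return an uncompiled StateGraph.
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once when the worker starts, not on every request
    app.state.graph_app = build_graph().compile()
    yield

app = FastAPI(lifespan=lifespan)

@app.post("/chat")
async def chat(request: Request, payload: dict):
    graph_app = request.app.state.graph_app  # reused across requests
    return await graph_app.ainvoke({"input": payload["input"]})
```

Under Gunicorn/Uvicorn, each worker process runs this lifespan once, so every replica pays the compile cost exactly once instead of on every call.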
Other Possible Causes
1) Lazy loading a model or embedding index on first node execution
If your first LangGraph node loads a model from disk or initializes an embedding index, scale-out will expose that latency immediately.
```python
# expensive lazy init inside node
def retrieve(state):
    index = load_faiss_index("/mnt/index.faiss")
    return {"docs": index.search(state["query"])}
```
Fix it by loading once:
```python
# loaded once at module/startup time
index = load_faiss_index("/mnt/index.faiss")

def retrieve(state):
    return {"docs": index.search(state["query"])}
```
2) Recompiling the graph repeatedly
StateGraph.compile() is not something you want in the hot path. If you compile per request or per tenant without caching, every new replica pays that cost again.
```python
# bad
def get_app():
    graph = build_graph()
    return graph.compile()
```
Use one compiled instance per process:
```python
# good
app = build_graph().compile()
```
If you need tenant-specific behavior, cache compiled graphs by tenant key instead of rebuilding them blindly.
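For example, a small per-process cache keyed by tenant ID keeps the cost to one compile per tenant per worker. A minimal sketch, assuming a hypothetical `build_graph_for_tenant()` helper that returns an uncompiled, tenant-configured StateGraph:

```python
# Sketch: compile each tenant's graph at most once per process.
# build_graph_for_tenant() is a hypothetical helper for this example.
from functools import lru_cache

@lru_cache(maxsize=128)
def get_compiled_app(tenant_id: str):
    return build_graph_for_tenant(tenant_id).compile()

def handle_request(tenant_id: str, user_input: str):
    app = get_compiled_app(tenant_id)  # cache hit after the first request per tenant
    return app.invoke({"input": user_input})
```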
3) Slow startup dependencies: DB pools, secrets fetches, remote config
A worker may look like it has “cold start latency” when the real issue is initialization blocked on Postgres, Redis, Vault, S3 config files, or secret managers.
```python
# startup path blocked by remote calls
settings = fetch_remote_settings()
db = create_engine(settings.db_url)
redis = Redis.from_url(settings.redis_url)
```
Move those calls into startup hooks and add timeouts. If your platform supports prewarming, use it.
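As a sketch of the timeout part: `fetch_remote_settings()` is the hypothetical helper from the snippet above (the `timeout` argument is an assumption about its signature), while the SQLAlchemy and redis-py options shown are standard ways to bound connection setup:

```python
# Sketch: keep remote setup in the startup path, but make it fail fast
# instead of hanging a new worker.
from redis import Redis
from sqlalchemy import create_engine

settings = fetch_remote_settings(timeout=5)  # hypothetical helper, assumed timeout arg

# connect_timeout is passed through to the Postgres driver (e.g. psycopg2)
db = create_engine(settings.db_url, connect_args={"connect_timeout": 5})

# redis-py accepts socket timeouts directly
redis = Redis.from_url(settings.redis_url, socket_connect_timeout=2, socket_timeout=2)
```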
4) Too much work in `__init__` for custom nodes/tools
Custom node classes sometimes hide expensive setup in constructors. That makes scaling painful because every worker recreates them.
```python
class MyTool:
    def __init__(self):
        self.client = build_huge_client()
        self.cache = load_cache_from_disk()
```
Prefer dependency injection:
```python
class MyTool:
    def __init__(self, client, cache):
        self.client = client
        self.cache = cache
```
Then create those dependencies once during app startup.
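A minimal sketch of that wiring, reusing the hypothetical `build_huge_client()` and `load_cache_from_disk()` helpers from the snippet above (the `.search()` call stands in for whatever the client actually does):

```python
# Created once per process, at import/startup time
client = build_huge_client()
cache = load_cache_from_disk()
tool = MyTool(client=client, cache=cache)

def lookup_node(state):
    # Node functions close over the prebuilt tool; nothing heavy runs here
    return {"result": tool.client.search(state["input"])}
```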
How to Debug It
- Measure startup vs request latency separately
  - Add timing around app creation and around `app.invoke()` (see the timing sketch after this list).
  - If `compile()` or dependency setup is slow, you’ve found the bottleneck.
- Check whether latency spikes only on new replicas
  - If only the first request after autoscaling is slow, this is a cold-start problem.
  - If every request is slow, look at node logic or external services instead.
- Log initialization points
  - Add logs around model creation, DB connection setup, retriever loading, and `StateGraph.compile()`.
  - You want to know exactly which line runs during scale-up.
- Profile one worker from boot to first response
  - Use `py-spy`, `cProfile`, or simple wall-clock logging.
  - Focus on imports and constructors before chasing LangGraph internals.
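A minimal timing sketch for the first step, assuming a `build_graph()` helper; it separates the one-time startup cost from the per-request cost so you can see which one autoscaling is exposing:

```python
# Sketch: log startup cost (paid once per worker) separately from
# warm-path cost (paid per request).
import logging
import time

logger = logging.getLogger(__name__)

t0 = time.perf_counter()
app = build_graph().compile()
logger.info("graph startup took %.2fs", time.perf_counter() - t0)

def handle_request(user_input: str):
    t0 = time.perf_counter()
    result = app.invoke({"input": user_input})
    logger.info("invoke took %.2fs", time.perf_counter() - t0)
    return result
```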
Prevention
- Build graphs and clients at module scope or in explicit startup hooks.
- Cache compiled graphs and heavy resources per process.
- Keep node functions thin: no model loading, no file I/O, no network setup inside the hot path.
- Add startup timing metrics so you catch regressions before autoscaling exposes them.
If you want one rule to remember: in LangGraph Python apps, treat compile() and client initialization as deployment-time work, not request-time work. That’s usually enough to eliminate cold start latency when scaling.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.