# How to Fix 'cold start latency during development' in LangGraph (Python)

## What this error usually means
If you’re seeing cold start latency during development in a LangGraph Python app, your graph is paying its full initialization cost on every request or every code reload. In practice, this shows up during local dev with `langgraph dev`, FastAPI reloads, notebook restarts, or when you rebuild the graph inside a request handler.

The symptom is usually not a crash. It’s a slow first response, repeated model and client initialization, and logs that suggest your graph is being recompiled over and over.
## The Most Common Cause
The #1 cause is building the graph inside a function that runs per request instead of once at module startup.
That means your StateGraph, LLM client, vector store, or tool setup gets recreated on every call. In LangGraph, that turns a cheap invoke into a cold start factory.
### Broken pattern vs fixed pattern
| Broken | Fixed |
|---|---|
| Build graph inside endpoint | Build graph once at import time |
| Recreate clients on every call | Reuse initialized clients |
| Slow first token on each request | Warm graph stays in memory |
```python
# broken.py
from typing import TypedDict

from fastapi import FastAPI
from langgraph.graph import StateGraph, START, END

app = FastAPI()

class State(TypedDict):
    message: str

def build_graph():
    # expensive setup (clients, indexes, prompts) happens here on every request
    graph = StateGraph(State)
    graph.add_node("echo", lambda state: {"message": state["message"]})
    graph.add_edge(START, "echo")
    graph.add_edge("echo", END)
    return graph.compile()

@app.post("/chat")
def chat(payload: dict):
    app_graph = build_graph()  # recompiles the whole graph per request
    return app_graph.invoke(payload)
```
```python
# fixed.py
from typing import TypedDict

from fastapi import FastAPI
from langgraph.graph import StateGraph, START, END

app = FastAPI()

class State(TypedDict):
    message: str

def build_graph():
    # build once, at import time
    graph = StateGraph(State)
    graph.add_node("echo", lambda state: {"message": state["message"]})
    graph.add_edge(START, "echo")
    graph.add_edge("echo", END)
    return graph.compile()

app_graph = build_graph()  # compiled once, reused by every request

@app.post("/chat")
def chat(payload: dict):
    return app_graph.invoke(payload)
```
If you’re using `langgraph dev`, the same rule applies. Keep expensive objects at module scope so the dev server can reuse them across requests instead of rebuilding them on every hot reload.
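If building at import time isn’t practical (say, the graph depends on settings that load later), a cached factory gives the same effect: the first call pays the build cost, every later call reuses the result. This is a minimal, framework-agnostic sketch; `build_graph` here is a stand-in for your real builder:

```python
from functools import lru_cache

def build_graph():
    # stand-in for your real, expensive graph construction and compile
    return object()

@lru_cache(maxsize=1)
def get_graph():
    # first call pays the build cost; later calls return the cached graph
    return build_graph()
```

Every caller then shares one instance: `get_graph() is get_graph()` is always true within a process.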
## Other Possible Causes

### 1) Your model client is recreated inside the node

This is common with `ChatOpenAI`, Anthropic clients, or custom HTTP wrappers.
```python
# bad: a new client (auth, connection setup) on every node call
from langchain_openai import ChatOpenAI

def agent_node(state):
    llm = ChatOpenAI(model="gpt-4o-mini")
    return {"messages": [llm.invoke(state["messages"])]}
```

```python
# good: one client, created once at import time
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def agent_node(state):
    return {"messages": [llm.invoke(state["messages"])]}
```
If the constructor does network setup, auth loading, or TLS warmup, you’ll feel it on every node execution.
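A quick way to prove the constructor cost is real is to count constructions. The sketch below uses a stub client (a stand-in for something like `ChatOpenAI`) and shows that a module-level instance is built once no matter how many times the node runs:

```python
CONSTRUCTIONS = 0

class StubClient:
    """Stand-in for an expensive LLM client (auth, TLS, config loading)."""
    def __init__(self):
        global CONSTRUCTIONS
        CONSTRUCTIONS += 1  # pretend this line costs hundreds of milliseconds

    def invoke(self, messages):
        return f"reply to {messages[-1]}"

llm = StubClient()  # built once at import time

def agent_node(state):
    return {"messages": state["messages"] + [llm.invoke(state["messages"])]}

# five node executions, one construction
for _ in range(5):
    agent_node({"messages": ["hi"]})
```

If you move `llm = StubClient()` inside `agent_node`, the counter climbs with every call, which is exactly the per-request cost you feel with a real client.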
### 2) You’re reloading too aggressively in development
FastAPI --reload, watchfiles, and notebook edits can trigger full process restarts. That looks like “cold start latency” because it is one.
```bash
uvicorn app:app --reload --reload-dir .
```
If your project has large imports or heavy startup code, every file save becomes an expensive restart. Narrow the watched directories or disable reload temporarily to confirm.
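To confirm that heavy imports are the culprit, CPython’s `-X importtime` flag prints a per-module import cost report to stderr. A sketch, using the stdlib `json` module as a stand-in for your app module:

```python
import subprocess
import sys

# -X importtime makes the interpreter report per-module import costs on stderr
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)

# each report line starts with "import time:"; the slowest modules are your restart cost
print(result.stderr.splitlines()[0])
```

Run this against your own entry module (`import app` instead of `import json`) and the biggest numbers tell you what every reload is paying for.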
### 3) Your node does blocking work before returning control
LangGraph nodes should do actual task work. Don’t hide startup logic inside them.
```python
# bad
def retrieve_node(state):
    index = load_index_from_disk()  # expensive every call
    docs = index.search(state["query"])
    return {"docs": docs}

# good
index = load_index_from_disk()  # loaded once at import time

def retrieve_node(state):
    docs = index.search(state["query"])
    return {"docs": docs}
```
If loading the index takes 2 seconds, your “graph latency” is really storage initialization latency.
### 4) You are compiling multiple graphs from the same source file
This happens when each route or worker imports and compiles its own copy.
```python
# bad: each route compiles its own graph per call
from mygraph import build_graph

@app.post("/a")
def route_a(payload):
    return build_graph().invoke(payload)

@app.post("/b")
def route_b(payload):
    return build_graph().invoke(payload)
```

```python
# good: both routes share one compiled graph
from mygraph import compiled_graph

@app.post("/a")
def route_a(payload):
    return compiled_graph.invoke(payload)

@app.post("/b")
def route_b(payload):
    return compiled_graph.invoke(payload)
```
In LangGraph terms, compile once and share the compiled artifact. Don’t treat compilation as part of request handling.
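This works because Python caches modules in `sys.modules`: module-level code runs once, and every importer gets the same object. A self-contained demonstration with a synthetic `mygraph` module (no LangGraph needed; `compiled_graph` stands in for a real compiled graph):

```python
import sys
import textwrap
import types

# Simulate a module that compiles its graph once, at import time.
source = textwrap.dedent("""
    BUILD_COUNT = 0

    def _build():
        global BUILD_COUNT
        BUILD_COUNT += 1
        return object()  # stand-in for StateGraph(...).compile()

    compiled_graph = _build()
""")
mygraph = types.ModuleType("mygraph")
exec(source, mygraph.__dict__)
sys.modules["mygraph"] = mygraph

# Two "routes" importing the module share one compiled object.
from mygraph import compiled_graph as graph_a
from mygraph import compiled_graph as graph_b

assert graph_a is graph_b
assert mygraph.BUILD_COUNT == 1
```

Calling a `build_graph()` factory in each route bypasses this cache entirely, which is why the broken pattern above compiles per request.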
## How to Debug It

- **Time your startup separately from your request path.** Add logs around graph construction. If `StateGraph(...).compile()` shows up during each request, you found the issue.
- **Check whether your node constructors run more than once.** Print object IDs for `llm`, retriever, or vector store clients. If they change per request, they’re being recreated.
- **Disable auto-reload and compare latency.** Run without `--reload`. If latency drops sharply, your problem is dev server restarts rather than LangGraph itself.
- **Inspect LangGraph execution logs.** Look for repeated initialization patterns around `CompiledStateGraph`. If you see the same setup messages before each invoke, move that code out of nodes and handlers.
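Timing startup separately from the request path can be sketched like this; `build_graph` is a stub (the `sleep` simulates expensive construction), so swap in your real builder:

```python
import time

def build_graph():
    # stand-in for real graph construction + compile
    time.sleep(0.05)
    class Compiled:
        def invoke(self, payload):
            return payload
    return Compiled()

t0 = time.perf_counter()
graph = build_graph()
startup_s = time.perf_counter() - t0

t0 = time.perf_counter()
graph.invoke({"messages": []})
request_s = time.perf_counter() - t0

print(f"startup: {startup_s:.3f}s, request: {request_s:.3f}s")
```

If the "request" number looks like the "startup" number, construction is leaking into your hot path.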
## Prevention

- Build graphs at module scope and reuse the compiled object.
- Initialize LLM clients, retrievers, and indexes once during app startup.
- Keep hot-reload scopes small so development restarts don’t rebuild everything unnecessarily.
- Treat any expensive I/O inside a node as a bug unless it’s part of the actual workflow.
If you follow one rule from this article: compile once, invoke many. That fixes most “cold start latency during development” complaints in LangGraph Python apps before they turn into a bigger performance problem in staging.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.