How to Fix 'cold start latency' in LangGraph (Python)
What “cold start latency” means in LangGraph
In LangGraph, “cold start latency” usually means your first request is slow because the graph, model client, or external dependencies are being initialized on demand. It typically shows up in serverless deployments, API workers that restart often, or apps that build the graph inside the request path.
The symptom is simple: the first call takes seconds, later calls are fast. In logs you’ll usually see your app sitting on `graph.invoke(...)`, `graph.stream(...)`, or a model call like `ChatOpenAI.invoke()` while imports, DB connections, or graph compilation happen lazily.
The Most Common Cause
The #1 cause is building the LangGraph object inside the request handler instead of creating it once and reusing it.
That pattern forces a full rebuild on every cold process start and often on every request in short-lived workers. If you’re using StateGraph, CompiledStateGraph, or a chain of node factories, this adds avoidable startup cost.
Broken vs fixed pattern
| Broken pattern | Fixed pattern |
|---|---|
| Build graph per request | Build once at module load or app startup |
| Recreate model client repeatedly | Reuse shared client |
| Compile graph repeatedly | Compile once |
```python
# broken.py
from fastapi import FastAPI
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI

app = FastAPI()

def build_graph():
    llm = ChatOpenAI(model="gpt-4o-mini")  # recreated every request
    builder = StateGraph(dict)

    def node(state):
        return {"answer": llm.invoke(state["question"]).content}

    builder.add_node("answer", node)
    builder.add_edge(START, "answer")
    builder.add_edge("answer", END)
    return builder.compile()

@app.post("/ask")
def ask(payload: dict):
    graph = build_graph()  # expensive cold start every call
    return graph.invoke({"question": payload["question"]})
```
```python
# fixed.py
from fastapi import FastAPI
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI

app = FastAPI()

llm = ChatOpenAI(model="gpt-4o-mini")  # shared client, created once per process

def build_graph():
    builder = StateGraph(dict)

    def node(state):
        return {"answer": llm.invoke(state["question"]).content}

    builder.add_node("answer", node)
    builder.add_edge(START, "answer")
    builder.add_edge("answer", END)
    return builder.compile()

graph = build_graph()  # compile once

@app.post("/ask")
def ask(payload: dict):
    return graph.invoke({"question": payload["question"]})
```
If you’re deploying with Uvicorn workers or serverless functions, this difference matters a lot. A compiled LangGraph should be treated like application state, not request state.
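If your config isn’t available at import time, you can still get build-once semantics with a cached factory. A stdlib-only sketch (the sleep and names here are illustrative stand-ins, not LangGraph APIs):

```python
import time
from functools import lru_cache

def _expensive_build():
    time.sleep(0.05)  # stand-in for StateGraph(...).compile() and client setup
    return object()

@lru_cache(maxsize=1)
def get_graph():
    # built on the first call, reused by every later request
    return _expensive_build()

first = get_graph()   # slow: pays the build cost once
second = get_graph()  # fast: cache hit
assert first is second
```

The first request still pays the cost once; to move it off the request path entirely, call `get_graph()` during app startup.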
Other Possible Causes
1) Heavy imports at module load
If your graph file imports large ML packages, vector DB clients, or OCR libraries up front, startup gets slower before the first request even lands.
```python
# slow startup: every worker pays these imports before serving anything
import torch
import transformers
import pandas as pd
```

Move nonessential imports into the node that needs them:

```python
def node(state):
    import pandas as pd  # only load when needed
    df = pd.DataFrame(state["rows"])
    return {"summary": df.describe().to_dict()}
```
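To see exactly which imports dominate startup, CPython’s `-X importtime` flag prints a per-module report. `json` below is just a runnable stand-in; point it at your own graph module:

```shell
# import timing report is written to stderr
python -X importtime -c "import json" 2> importtime.log
# column 2 is cumulative microseconds; slowest imports print last
sort -t'|' -k2 -n importtime.log | tail -5
```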
2) Network calls during graph construction
This happens when you fetch prompts, schema metadata, secrets, or remote config while building nodes.
```python
import requests

def build_graph():
    prompt = requests.get("https://config.internal/prompt").text  # bad: network I/O at build time
    ...
```
Fix it by loading config before app startup or caching it:

```python
import requests
from functools import lru_cache

@lru_cache(maxsize=1)
def load_prompt():
    # fetched once per process, reused for every graph build
    return requests.get("https://config.internal/prompt").text
```
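A quick way to confirm the cache collapses repeated fetches into one: the counter below stands in for the HTTP call, so the sketch runs without a network:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=1)
def load_prompt():
    calls["count"] += 1  # stand-in for requests.get(...).text
    return "You are a helpful assistant."

for _ in range(5):
    load_prompt()
assert calls["count"] == 1  # one fetch, four cache hits
```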
3) Recreating clients and connections in nodes
If each node opens a new DB connection or LLM client, latency spikes on first execution and can stay high under load.
```python
def node(state):
    db = create_engine(DB_URL)  # bad: new engine and connection pool per invocation
    ...
```

Use singleton-style resources:

```python
from sqlalchemy import create_engine

engine = create_engine(DB_URL)  # one engine per process

def node(state):
    with engine.connect() as conn:
        ...
```
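SQLAlchemy’s engine already pools connections for you; with a plain driver you hold the connection at module scope yourself. A stdlib `sqlite3` sketch of the same pattern (the schema is illustrative):

```python
import sqlite3

# module scope: opened once per process, reused by every invocation
_conn = sqlite3.connect(":memory:", check_same_thread=False)
_conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

def node(state):
    _conn.execute("INSERT OR REPLACE INTO kv VALUES ('q', ?)", (state["question"],))
    row = _conn.execute("SELECT v FROM kv WHERE k = 'q'").fetchone()
    return {"answer": row[0]}

print(node({"question": "hello"}))  # {'answer': 'hello'}
```

In a real multi-threaded service, guard a shared raw connection with a lock or use a driver that pools for you.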
4) Compiled graph is fine, but checkpointing/storage is slow
If you use SqliteSaver, Postgres checkpointing, Redis-backed memory, or remote persistence, the “cold start” may actually be storage initialization.
```python
from langgraph.checkpoint.sqlite import SqliteSaver

checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
graph = builder.compile(checkpointer=checkpointer)
```
If the DB file is on network storage or the table doesn’t exist yet, initialization can be slow. Pre-create storage and test connection time separately.
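You can time storage initialization in isolation with plain `sqlite3` before blaming the graph. The table schema below is illustrative (SqliteSaver manages its own tables), and `:memory:` stands in for your real checkpoint path:

```python
import sqlite3
import time

t0 = time.perf_counter()
conn = sqlite3.connect(":memory:")  # substitute "checkpoints.db", especially on network storage
conn.execute("CREATE TABLE IF NOT EXISTS checkpoints (id TEXT PRIMARY KEY, data BLOB)")
conn.commit()
elapsed = time.perf_counter() - t0
print(f"storage init: {elapsed * 1000:.2f} ms")
conn.close()
```

If this number is already large, fix storage placement first; no amount of graph caching will help.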
How to Debug It
- **Time graph construction separately from invocation.** Add timers around `build_graph()` and `graph.invoke(...)`. If compile time is high, the issue is initialization. If invoke time is high only on the first call, it’s lazy loading inside nodes.
- **Log each expensive dependency.** Measure imports, model creation, DB connection setup, and config fetches individually. You want to know whether the delay comes from `ChatOpenAI`, `StateGraph.compile()`, or your own code.
- **Check whether your app recycles workers often.** In FastAPI with Uvicorn/Gunicorn, or serverless runtimes like Lambda and Cloud Run, cold starts happen when processes recycle. If every request hits a new worker, fix deployment behavior before touching LangGraph code.
- **Inspect LangGraph execution traces.** Use LangSmith or structured logging around nodes. If the delay occurs before the first node runs, it’s construction/init time. If it occurs inside one node consistently, that node is the problem.
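The first two checks can be wired up with a tiny timing helper. `build_graph` and the graph callable below are runnable stand-ins for your real ones:

```python
import time

def timed(label, fn, *args, **kwargs):
    # run fn, print how long it took, and return its result
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {(time.perf_counter() - t0) * 1000:.1f} ms")
    return result

def build_graph():
    time.sleep(0.02)  # pretend compile cost
    return lambda state: {"answer": state["question"].upper()}

graph = timed("build_graph", build_graph)
timed("first invoke", graph, {"question": "hi"})
timed("second invoke", graph, {"question": "hi"})
```

If `build_graph` dominates, move it to startup; if only the first invoke is slow, hunt for lazy loading inside a node.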
Prevention
- Build graphs at startup and reuse compiled `CompiledStateGraph` instances across requests.
- Keep heavy I/O out of graph construction; load configs and secrets before serving traffic.
- Cache clients and connections at module scope unless they are explicitly short-lived.
- Put timers around `compile()` and each node so cold-start regressions show up in CI before production.
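That last point can live in CI as a plain pytest-style budget test. The builder stub and the 2-second budget here are placeholders for your own:

```python
# test_cold_start.py
import time

BUILD_BUDGET_SECONDS = 2.0  # assumed budget; tune per service

def build_graph():
    time.sleep(0.01)  # replace with your real import + compile path
    return object()

def test_build_within_budget():
    t0 = time.perf_counter()
    build_graph()
    assert time.perf_counter() - t0 < BUILD_BUDGET_SECONDS
```

Run it in CI so a new heavyweight import or a network call sneaking into `build_graph` fails the build instead of slowing production.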
If you want a simple rule: compile once, connect once, invoke many times. That’s how you keep LangGraph latency predictable in Python services.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.