# How to Fix 'cold start latency during development' in LangChain (TypeScript)
When you see cold start latency during development in a LangChain TypeScript app, it usually means your chain or model is being initialized too late, too often, or inside a hot path. In practice, this shows up in local dev, in serverless emulation, and in API routes that re-create clients on every request.
The fix is usually not in LangChain itself. It’s in how you instantiate ChatOpenAI, RunnableSequence, or any custom wrapper around them.
## The Most Common Cause
The #1 cause is creating the LLM client inside the request handler or function body instead of reusing a module-level instance.
That pattern forces repeated setup work:
- loading env vars
- constructing HTTP clients
- warming up model wrappers
- rebuilding chains on every request
Broken vs. fixed:

| Broken pattern | Fixed pattern |
|---|---|
| Instantiates ChatOpenAI on every request | Creates one shared instance at module scope |
| Rebuilds RunnableSequence repeatedly | Builds the chain once and reuses it |
| Causes "cold start" behavior in dev and serverless | Reduces startup overhead immediately |
```ts
// broken.ts
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";

export async function POST(req: Request) {
  const { question } = await req.json();

  // Wrong: created on every request
  const llm = new ChatOpenAI({
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
  });

  const prompt = PromptTemplate.fromTemplate("Answer this: {question}");

  // Wrong: rebuilt on every request
  const chain = RunnableSequence.from([prompt, llm]);

  const result = await chain.invoke({ question });
  return Response.json({ result });
}
```
```ts
// fixed.ts
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";

// Right: constructed once at module scope, shared across requests
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

const prompt = PromptTemplate.fromTemplate("Answer this: {question}");
const chain = RunnableSequence.from([prompt, llm]);

export async function POST(req: Request) {
  const { question } = await req.json();
  const result = await chain.invoke({ question });
  return Response.json({ result });
}
```
If you’re using Next.js route handlers, Express middleware, or a local dev server with HMR, this matters even more: every rebuild can trigger fresh initialization if construction lives inside the handler. In dev, even module scope can be re-evaluated on each hot reload, so a common workaround is to cache the instance on globalThis, as sketched below.
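A minimal sketch of that caching pattern, assuming a Next.js-style dev server that re-evaluates modules on hot reload (the `__llm` property name is just an illustrative choice):

```ts
import { ChatOpenAI } from "@langchain/openai";

// Cache the client on globalThis so dev-server module reloads reuse
// the existing instance instead of constructing a new one each time.
const globalForLLM = globalThis as unknown as { __llm?: ChatOpenAI };

export const llm =
  globalForLLM.__llm ??
  new ChatOpenAI({
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
  });

if (process.env.NODE_ENV !== "production") {
  globalForLLM.__llm = llm;
}
```

In production the cache branch is skipped, so the module behaves like a normal singleton.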
## Other Possible Causes
1) You are importing heavy side effects at startup
Some files do too much work at import time. If your chain file imports database clients, vector stores, or filesystem scans, your “cold start” becomes your app’s boot time.
```ts
// bad
import "./load-docs";        // scans disk immediately
import "./connect-pinecone"; // connects immediately
```
Fix by moving setup behind explicit init functions.
```ts
let initialized = false;

export async function init() {
  if (initialized) return;
  initialized = true;
  // connect here
}
```
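One caveat with the boolean flag: if two requests call init() concurrently, the second returns before the connection has actually finished. Caching the promise instead avoids that race. A sketch, where `connect` is a placeholder for whatever setup you're deferring:

```ts
// Placeholder for the deferred setup you actually need (illustrative):
// e.g. open a DB connection, load an index, warm a vector store.
async function connect(): Promise<void> {}

let initPromise: Promise<void> | undefined;

export function init(): Promise<void> {
  // All callers share one promise: setup runs once, and concurrent
  // requests await the same in-flight work.
  initPromise ??= connect();
  return initPromise;
}
```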
2) You are recreating embeddings/vector stores per request
This is common when people build retrieval chains in API handlers. If you see new OpenAIEmbeddings() or new MemoryVectorStore() inside the route, that’s a red flag.
```ts
// bad
export async function POST() {
  const embeddings = new OpenAIEmbeddings();
  const store = await MemoryVectorStore.fromDocuments(docs, embeddings);
}
```
Move store creation to startup or reuse a cached singleton.
```ts
// good
const embeddings = new OpenAIEmbeddings();
const storePromise = MemoryVectorStore.fromDocuments(docs, embeddings);
```
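Handlers then await the shared promise: the store is built once, and only requests that arrive during startup wait on it. A fuller sketch with the imports wired up, where the `./docs` import is an assumption standing in for however you load documents:

```ts
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { docs } from "./docs"; // assumed: documents loaded at startup

const embeddings = new OpenAIEmbeddings();
// Kick off store construction once, at module load.
const storePromise = MemoryVectorStore.fromDocuments(docs, embeddings);

export async function POST(req: Request) {
  const { question } = await req.json();
  // Every request reuses the same store; only the first awaits the build.
  const store = await storePromise;
  const results = await store.similaritySearch(question, 4);
  return Response.json({ results });
}
```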
3) Your dev server is restarting too often
If you’re using nodemon, tsx watch, Next.js dev mode, or Docker bind mounts, file changes can restart the process constantly. That makes every request feel like a cold start.
Check for:
- large watched directories
- generated files in the repo root
- Docker volume churn
Example nodemon.json:
```json
{
  "watch": ["src"],
  "ignore": ["dist", "node_modules", ".next"]
}
```
4) You are using streaming without pre-initializing the model
Streaming can make startup issues more visible because the first token waits on client setup. If your first call is slow but subsequent calls are fine, initialization is likely happening too late.
```ts
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  streaming: true,
});
```
Create it once and reuse it across requests. Don’t wrap it in per-request factory code unless you have a very specific reason. If the very first call still feels slow, a warm-up request at boot can absorb that cost, as sketched below.
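A minimal warm-up sketch, assuming you’re willing to spend one tiny request at startup (the "ping" prompt and the error handling here are illustrative):

```ts
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini", streaming: true });

// Fire one cheap request at boot so lazy client setup happens before
// the first real user request arrives.
void llm.invoke("ping").catch(() => {
  // Ignore warm-up failures; real requests will surface any errors.
});
```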
## How to Debug It
1. Log timestamps around initialization
   - Measure how long `new ChatOpenAI(...)`, prompt construction, and chain creation take.
   - If those logs appear on every request, you found the problem.

   ```ts
   console.time("llm-init");
   const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
   console.timeEnd("llm-init");
   ```

2. Check whether module scope runs once or many times
   - Add a top-level log in the file where your chain is defined.
   - If it prints repeatedly during normal requests, your dev server is reloading the module.

   ```ts
   console.log("chain module loaded");
   ```

3. Search for repeated constructors
   - Look for `new ChatOpenAI`, `new OpenAIEmbeddings`, `RunnableSequence.from`, and vector store creation inside handlers.
   - Anything inside `export async function GET/POST` is suspect.

4. Temporarily strip the app down
   - Remove retrieval, tools, memory, and streaming.
   - Keep only one prompt + one LLM call.
   - If latency disappears, add components back one by one until it returns.
## Prevention
- Create LangChain clients and chains at module scope when they are safe to share.
- Keep heavy work out of request handlers: embeddings generation, vector store setup, file scanning.
- Add startup timing logs in dev so regressions show up immediately (see the sketch below).
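A tiny version of that startup timing log, assuming Node 16+ where performance.now() is available as a global:

```ts
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";

// Log how long chain construction takes at startup so a regression
// (e.g. setup accidentally moving into a handler) shows up in dev logs.
const t0 = performance.now();

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });
const prompt = PromptTemplate.fromTemplate("Answer this: {question}");
export const chain = RunnableSequence.from([prompt, llm]);

console.log(`chain ready in ${(performance.now() - t0).toFixed(1)} ms`);
```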
If you’re seeing this kind of latency in TypeScript LangChain code, assume it’s an architecture issue first and an LLM issue second. In most cases, moving initialization out of the hot path fixes it fast.
## Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.