How to Fix 'authentication failed when scaling' in LangGraph (TypeScript)
If you see authentication failed when scaling in LangGraph, the runtime is trying to start or expand a worker, but the auth context it needs is missing or invalid. In practice, this usually shows up when a graph works locally, then fails as soon as you deploy, autoscale, or run it through a remote executor.
The error is almost never “LangGraph is broken.” It usually means your API key, service token, or environment wiring is wrong in the path that only runs during scale-out.
The Most Common Cause
The #1 cause is that your auth token is available in the main process but not inside the scaled worker process.
This happens a lot in TypeScript when people read process.env at startup, create a client once, and assume the same auth context will exist everywhere. When LangGraph spins up another execution path, that token is missing or stale.
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Client created once with implicit env access | Client created per request or injected explicitly |
| Token only exists in local process | Token passed into the graph/runtime config |
| Works in dev, fails on scale-out | Works consistently across workers |
```typescript
// ❌ Broken
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  text: Annotation<string>(),
});

// Client built once at startup from ambient env; the key may not
// exist in a worker that spins up later.
const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const graph = new StateGraph(State)
  .addNode("generate", async () => {
    const res = await llm.invoke("Hello");
    return { text: res.content as string };
  })
  .addEdge(START, "generate")
  .addEdge("generate", END)
  .compile();
```
```typescript
// ✅ Fixed
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  apiKey: Annotation<string>(),
  text: Annotation<string>(),
});

function makeLLM(apiKey: string) {
  return new ChatOpenAI({ apiKey });
}

const graph = new StateGraph(State)
  .addNode("generate", async (state) => {
    // The credential travels with the run instead of living in ambient env.
    const llm = makeLLM(state.apiKey);
    const res = await llm.invoke("Hello");
    return { text: res.content as string };
  })
  .addEdge(START, "generate")
  .addEdge("generate", END)
  .compile();
```
If you’re using LangGraph Platform or any remote runtime, the same rule applies: don’t rely on ambient env state inside the worker. Pass credentials through the supported config path for that runtime.
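One common shape for that explicit path is a per-invocation config object. The sketch below mirrors the `configurable` field on LangChain's `RunnableConfig`, but the `apiKey` field name and the validation helper are assumptions, not a LangGraph API:

```typescript
// Hypothetical helper: read the credential from per-invocation config
// instead of ambient env. The `configurable` field mirrors the
// RunnableConfig shape; the "apiKey" key name is an assumption.
type RunConfig = { configurable?: Record<string, unknown> };

function getApiKey(config: RunConfig): string {
  const key = config.configurable?.["apiKey"];
  if (typeof key !== "string" || key.length === 0) {
    // Fail loudly in the worker rather than sending an empty key upstream.
    throw new Error("apiKey missing from run config; pass it at invoke time");
  }
  return key;
}
```

At call time that looks roughly like `graph.invoke(input, { configurable: { apiKey } })`, so every worker that picks up the run receives the credential along with it.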
Other Possible Causes
1) Wrong environment variable name
A typo like OPENAI_APIKEY instead of OPENAI_API_KEY gives you an empty key at runtime.
```typescript
// ❌ typo: resolves to undefined at runtime
const key = process.env.OPENAI_APIKEY;
```

```typescript
// ✅ correct name, validated immediately
const key = process.env.OPENAI_API_KEY;
if (!key) throw new Error("Missing OPENAI_API_KEY");
```
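To make this class of bug impossible to miss, you can route every env read through one fail-fast helper. This is a sketch of that convention, not a LangGraph API; the helper name is an assumption:

```typescript
// Minimal fail-fast helper: a typo'd variable name throws with the name
// in the message instead of silently yielding undefined.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required env var: ${name}`);
  }
  return value;
}
```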
2) Secret not mounted in the scaled deployment
Your local .env works, but Kubernetes, ECS, or your serverless worker never got the secret.
```yaml
# ❌ Missing env injection
containers:
  - name: worker
    image: my-langgraph-worker:latest
```

```yaml
# ✅ Secret injected into worker
containers:
  - name: worker
    image: my-langgraph-worker:latest
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: langgraph-secrets
            key: openai_api_key
```
3) Token expires during long-running runs
If your graph scales after a delay and uses short-lived credentials, you’ll get auth failures only on expansion.
```typescript
// ❌ Reusing an expiring token across long jobs
const token = await getShortLivedToken();
```

Fix it by refreshing before each call:

```typescript
// ✅ Refresh per execution
const token = await getFreshToken();
await runGraph({ token });
```
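If fetching a fresh token on every call is too expensive, a small cache that refreshes ahead of expiry gives you the same safety. This is a sketch under assumptions: the `Token` shape, the fetch callback, and the 60-second safety margin are all placeholders for your auth provider's details:

```typescript
// Token cache that refreshes before expiry instead of after failure.
type Token = { value: string; expiresAt: number }; // expiresAt in epoch ms

function makeTokenProvider(
  fetchToken: () => Promise<Token>,
  marginMs = 60_000, // refresh this long before actual expiry
) {
  let cached: Token | undefined;
  return async (): Promise<string> => {
    if (!cached || cached.expiresAt - Date.now() < marginMs) {
      cached = await fetchToken(); // refresh before it can expire mid-call
    }
    return cached.value;
  };
}
```

Each node then calls the provider instead of holding a token captured at startup, so a run that scales out hours later still authenticates.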
4) Using one client instance across tenants
If your app serves multiple customers and reuses one authenticated client globally, the wrong tenant token can be sent during scale-out.
```typescript
// ❌ Global singleton with tenant-specific auth baked in
export const client = new MyClient({ apiKey: tenantApiKey });
```

```typescript
// ✅ Build a per-request client from request context
export function makeClient(apiKey: string) {
  return new MyClient({ apiKey });
}
```
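If constructing a client per request is too costly, a registry keyed by tenant id keeps isolation without rebuilding on every call. This is a sketch: `MyClient` stands in for whatever authenticated SDK client you actually use, and the cache has no eviction, which you would want in production:

```typescript
// Per-tenant client registry: each tenant gets its own authenticated
// instance, so scale-out never reuses another tenant's credentials.
class MyClient {
  constructor(readonly opts: { apiKey: string }) {}
}

const clients = new Map<string, MyClient>();

function clientForTenant(tenantId: string, apiKey: string): MyClient {
  let client = clients.get(tenantId);
  if (!client) {
    client = new MyClient({ apiKey });
    clients.set(tenantId, client);
  }
  return client;
}
```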
How to Debug It
1. Print the exact auth source at startup and inside the node
   - Log whether `process.env.OPENAI_API_KEY` exists.
   - Log whether your node receives an explicit token in state/config.
   - If startup has it and the node doesn't, you found the gap.
2. Check where scaling happens
   - Local `compile().invoke()` is not the same as a remote executor.
   - If this only fails under load or on deployment, inspect worker environment variables first.
   - Look for messages like `authentication failed when scaling` alongside provider errors such as `401 Unauthorized`.
3. Verify the provider-specific error
   - OpenAI often returns `401 Incorrect API key provided`.
   - Anthropic may return `authentication_error`.
   - LangGraph is usually wrapping a lower-level auth failure from the model provider or remote runtime.
4. Test with a hardcoded known-good token path
   - Temporarily inject a valid token directly into one node.
   - If that works, your problem is not LangGraph execution; it's credential propagation.
   - Remove it after testing and move back to secret injection.
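Step 1 can be as small as one helper called in both places. This is a sketch (the helper name and log prefix are assumptions); the key point is to log *presence*, never the value, so the secret stays out of your logs:

```typescript
// Log whether a credential exists at a given point, without leaking it.
// Call once at startup and once inside the node to find which side of
// the scale-out boundary lost the key.
function logKeyPresence(where: string, key: string | undefined): boolean {
  const present = typeof key === "string" && key.length > 0;
  console.log(`[auth-debug] ${where}: key present=${present}`);
  return present;
}
```

For example, `logKeyPresence("startup", process.env.OPENAI_API_KEY)` at boot and `logKeyPresence("node:generate", state.apiKey)` inside the node.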
Prevention
- Pass auth explicitly into graph execution: don't depend on ambient globals in workers.
- Validate required secrets at process boot: fail fast with a clear message if `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or your platform token is missing.
- Use per-request client construction for multi-tenant systems: avoid shared authenticated singletons when requests can scale independently.
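The boot-time check above can validate everything at once and name each missing secret, instead of failing later on the first API call inside a scaled worker. A minimal sketch, assuming your required variable list looks something like this:

```typescript
// Verify all required secrets up front; report every missing name in
// one error so a deploy fails fast with an actionable message.
function validateSecrets(
  names: string[],
  env: Record<string, string | undefined> = process.env,
): void {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
}

// Example call at process boot (variable list is an assumption):
// validateSecrets(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]);
```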
If you want a quick rule of thumb: when LangGraph scales and auth breaks, assume credential propagation before anything else. In TypeScript apps, that’s usually where the bug lives.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.