How to Fix 'authentication failed when scaling' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: authentication-failed-when-scaling · langgraph · typescript

If you see “authentication failed when scaling” in LangGraph, the runtime is trying to start or expand a worker, but the auth context it needs is missing or invalid. In practice, this usually shows up when a graph works locally, then fails as soon as you deploy, autoscale, or run it through a remote executor.

The error is almost never “LangGraph is broken.” It usually means your API key, service token, or environment wiring is wrong in the path that only runs during scale-out.

The Most Common Cause

The #1 cause is that your auth token is available in the main process but not inside the scaled worker process.

This happens a lot in TypeScript when people read process.env at startup, create a client once, and assume the same auth context will exist everywhere. When LangGraph spins up another execution path, that token is missing or stale.

Broken vs fixed

Broken pattern | Fixed pattern
Client created once with implicit env access | Client created per request or injected explicitly
Token only exists in local process | Token passed into the graph/runtime config
Works in dev, fails on scale-out | Works consistently across workers
// ❌ Broken
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Client built once at module load, from ambient env state
const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const State = Annotation.Root({
  text: Annotation<string>(),
});

const graph = new StateGraph(State)
  .addNode("generate", async () => {
    const res = await llm.invoke("Hello");
    return { text: res.content as string };
  })
  .addEdge(START, "generate")
  .addEdge("generate", END)
  .compile();

// ✅ Fixed
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Client built from the credential that is explicitly passed in
function makeLLM(apiKey: string) {
  return new ChatOpenAI({ apiKey });
}

const State = Annotation.Root({
  apiKey: Annotation<string>(),
  text: Annotation<string>(),
});

const graph = new StateGraph(State)
  .addNode("generate", async (state) => {
    const llm = makeLLM(state.apiKey);
    const res = await llm.invoke("Hello");
    return { text: res.content as string };
  })
  .addEdge(START, "generate")
  .addEdge("generate", END)
  .compile();

If you’re using LangGraph Platform or any remote runtime, the same rule applies: don’t rely on ambient env state inside the worker. Pass credentials through the supported config path for that runtime.
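
For a locally compiled graph, the same idea looks roughly like this sketch: the key travels in the run config rather than in ambient env state. The apiKey field name here is an illustrative choice, not something LangGraph mandates.

// Sketch: credential attached to the run config, read back inside the node
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  text: Annotation<string>(),
});

const graph = new StateGraph(State)
  .addNode("generate", async (_state, config) => {
    // Read the credential attached to this specific run, not ambient env state
    const apiKey = config?.configurable?.apiKey as string;
    const llm = new ChatOpenAI({ apiKey });
    const res = await llm.invoke("Hello");
    return { text: res.content as string };
  })
  .addEdge(START, "generate")
  .addEdge("generate", END)
  .compile();

// The caller decides which credential this run uses
await graph.invoke(
  { text: "" },
  { configurable: { apiKey: process.env.OPENAI_API_KEY } }
);

The point is that whoever starts the run owns the credential, so a worker that picks the run up later does not need any environment of its own.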

Other Possible Causes

1) Wrong environment variable name

A typo like OPENAI_APIKEY instead of OPENAI_API_KEY leaves the key undefined at runtime.

const key = process.env.OPENAI_APIKEY; // ❌ typo
const key = process.env.OPENAI_API_KEY; // ✅ correct
if (!key) throw new Error("Missing OPENAI_API_KEY");

2) Secret not mounted in the scaled deployment

Your local .env works, but Kubernetes, ECS, or your serverless worker never got the secret.

# ❌ Missing env injection
containers:
  - name: worker
    image: my-langgraph-worker:latest
# ✅ Secret injected into worker
containers:
  - name: worker
    image: my-langgraph-worker:latest
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: langgraph-secrets
            key: openai_api_key

3) Token expires during long-running runs

If your graph scales after a delay and uses short-lived credentials, you’ll get auth failures only on expansion.

// ❌ Reusing an expiring token across long jobs
const token = await getShortLivedToken();

Fix it by refreshing before each call:

// ✅ Refresh per execution
const token = await getFreshToken();
await runGraph({ token });
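
If runs can queue before a worker picks them up, do the refresh when the run actually starts rather than when the job is submitted. A minimal sketch, with getFreshToken and runGraph standing in for your own helpers:

// Sketch only: getFreshToken() is a hypothetical helper that talks to your
// identity provider, and runGraph() stands in for however you start a run.
declare function getFreshToken(): Promise<string>;
declare function runGraph(opts: { token: string }): Promise<unknown>;

export async function startRun() {
  // Refresh when the run actually starts, so a worker that spins up later
  // never inherits a credential that expired while the job sat in a queue.
  const token = await getFreshToken();
  return runGraph({ token });
}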

4) Using one client instance across tenants

If your app serves multiple customers and reuses one authenticated client globally, the wrong tenant token can be sent during scale-out.

// ❌ Global singleton with tenant-specific auth baked in
export const client = new MyClient({ apiKey: tenantApiKey });
// ✅ Build per-request client from request context
export function makeClient(apiKey: string) {
  return new MyClient({ apiKey });
}
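
At the call site, derive the client from the request itself. A rough sketch, where lookupTenantKey is a hypothetical resolver for the tenant's credential and makeClient is the factory from the snippet above:

// Sketch only: lookupTenantKey() is a hypothetical resolver for a tenant's key,
// and makeClient() is the per-request factory from the snippet above.
declare function lookupTenantKey(tenantId: string): Promise<string>;
declare function makeClient(apiKey: string): unknown;

export async function handleRequest(req: { tenantId: string }) {
  // Resolve and use the credential inside the request, never at module load
  const apiKey = await lookupTenantKey(req.tenantId);
  return makeClient(apiKey);
}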

How to Debug It

  1. Print the exact auth source at startup and inside the node

    • Log whether process.env.OPENAI_API_KEY exists.
    • Log whether your node receives an explicit token in state/config.
    • If startup has it and the node doesn’t, you found the gap (a minimal logging sketch follows this list).
  2. Check where scaling happens

    • Local compile().invoke() is not the same as a remote executor.
    • If this only fails under load or on deployment, inspect worker environment variables first.
    • Look for messages like authentication failed when scaling alongside provider errors such as 401 Unauthorized.
  3. Verify the provider-specific error

    • OpenAI often returns 401 Incorrect API key provided.
    • Anthropic may return authentication_error.
    • LangGraph is usually wrapping a lower-level auth failure from the model provider or remote runtime.
  4. Test with a hardcoded known-good token path

    • Temporarily inject a valid token directly into one node.
    • If that works, your problem is not LangGraph execution; it’s credential propagation.
    • Remove it after testing and move back to secret injection.
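
For step 1, the comparison can be as small as this sketch; it logs only whether a key exists, never the key itself (the apiKey name matches the earlier examples and may differ in your setup):

// Minimal logging sketch for step 1: record presence, never the value
function logAuthSource(where: string, explicitKey?: string) {
  console.log(`[${where}] env OPENAI_API_KEY set:`, Boolean(process.env.OPENAI_API_KEY));
  console.log(`[${where}] explicit key passed in:`, Boolean(explicitKey));
}

logAuthSource("startup");
// Inside your node: logAuthSource("generate", state.apiKey);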

Prevention

  • Pass auth explicitly into graph execution
    • Don’t depend on ambient globals in workers.
  • Validate required secrets at process boot
    • Fail fast with a clear message if OPENAI_API_KEY, ANTHROPIC_API_KEY, or your platform token is missing (a minimal boot check follows this list).
  • Use per-request client construction for multi-tenant systems
    • Avoid shared authenticated singletons when requests can scale independently.
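
The boot-time check can be a few lines; this sketch assumes OPENAI_API_KEY is the only required secret, so extend the list to match your deployment:

// Fail fast at process boot if a required secret is missing
const required = ["OPENAI_API_KEY"];

for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}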

If you want a quick rule of thumb: when LangGraph scales and auth breaks, assume credential propagation before anything else. In TypeScript apps, that’s usually where the bug lives.

