How to Fix 'authentication failed in production' in LangGraph (Python)
What this error usually means
authentication failed in production in LangGraph usually means your graph is calling a hosted service with missing, expired, or mismatched credentials. In practice, this shows up when you move from local dev to a deployed app and the runtime no longer has the same environment variables, secret store access, or API key scope.
The failure often appears during graph execution, not at import time. So your code works locally, then langgraph_sdk, ChatOpenAI, a tool call, or a remote node starts returning 401 Unauthorized or a provider-specific auth error once traffic hits production.
The Most Common Cause
The #1 cause is simple: your code reads secrets from .env locally, but production never gets those variables, or gets the wrong ones.
This happens a lot with LangGraph apps that use langchain_openai.ChatOpenAI, langgraph_sdk.Client, or any custom tool that depends on os.environ.
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Reads from local `.env` only | Uses injected production secrets |
| Assumes `load_dotenv()` works in prod | Validates env vars at startup |
| Fails later during graph execution | Fails fast before serving requests |
```python
# BROKEN
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph

load_dotenv()  # works locally, often absent in prod

llm = ChatOpenAI(model="gpt-4o-mini")  # expects OPENAI_API_KEY in env

def call_model(state):
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(dict)
graph.add_node("model", call_model)
```
```python
# FIXED
import os

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph

required = ["OPENAI_API_KEY"]
missing = [k for k in required if not os.getenv(k)]
if missing:
    raise RuntimeError(f"Missing required env vars: {missing}")

llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

def call_model(state):
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(dict)
graph.add_node("model", call_model)
```
If you are using LangGraph Platform or LangSmith-connected deployments, the same issue applies to platform tokens:
```python
import os

from langgraph_sdk import get_client

client = get_client(
    url=os.environ["LANGGRAPH_API_URL"],
    api_key=os.environ["LANGGRAPH_API_KEY"],  # must exist in prod
)
```
If LANGGRAPH_API_KEY is missing or stale, you will get auth failures even though the graph compiles fine.
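A small startup guard makes this failure mode explicit. Below is a minimal sketch (plain Python, no SDK dependency) that fails fast when any expected secret is absent; the variable names are just the ones used in this article:

```python
import os

def require_env(*names: str) -> dict:
    """Fail fast at startup if any required secret is missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {missing}")
    return {n: os.environ[n] for n in names}

# Run this before constructing any client, so a bad deploy
# fails at boot instead of on the first user request:
# creds = require_env("LANGGRAPH_API_URL", "LANGGRAPH_API_KEY")
```

Calling this at module import time means a misconfigured container crashes immediately, which your orchestrator will surface far more loudly than a 401 buried in request logs.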
Other Possible Causes
1) Wrong provider key for the environment
You may have an OpenAI key locally and an Anthropic key in prod config, but the app still initializes the wrong client.
```python
# BAD: hard-coded provider mismatch
llm = ChatOpenAI(api_key=os.getenv("ANTHROPIC_API_KEY"))

# GOOD: explicit provider wiring
llm = ChatOpenAI(api_key=os.environ["OPENAI_API_KEY"])
```
Typical error text:
- `openai.AuthenticationError: Incorrect API key provided`
- `401 Unauthorized`
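One cheap guard against this mismatch is checking the key's shape at startup. OpenAI keys conventionally start with `sk-` and Anthropic keys with `sk-ant-`; this is a convention, not a contract, so treat the sketch below as a heuristic, not a guarantee:

```python
import os

def check_key_prefix(env_name: str, expected_prefix: str) -> None:
    """Heuristic guard: flag an obviously wrong provider key at startup.
    Key prefixes are a convention and can change; a pass here does not
    prove the key is valid, it only catches gross mismatches."""
    value = os.environ.get(env_name, "")
    if value and not value.startswith(expected_prefix):
        raise RuntimeError(
            f"{env_name} does not look like a {expected_prefix}* key"
        )

# e.g. check_key_prefix("ANTHROPIC_API_KEY", "sk-ant-")
```

Note the asymmetry: since `sk-ant-` itself starts with `sk-`, this catches an OpenAI-shaped key in an Anthropic slot but not the reverse.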
2) Secret injected into build time, not runtime
In Docker, CI/CD, or serverless deploys, the secret may exist during image build but not when the container starts.
```dockerfile
# BAD: relies on build-time env only
ENV OPENAI_API_KEY=$OPENAI_API_KEY
```
Better:
```yaml
# GOOD: runtime secret injection example (Kubernetes)
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-secrets
        key: openai_api_key
```
If you use Kubernetes, check that the pod actually sees the secret:
```bash
kubectl exec -it <pod> -- printenv | grep OPENAI
```
3) Expired LangSmith or LangGraph token
If your graph uses tracing or remote execution hooks, an expired token can look like a generic auth failure.
```python
# Stale token left over from an earlier deploy or shell session
import os
os.environ["LANGSMITH_API_KEY"] = "old-token"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
```
Symptoms:
- Traces stop appearing in LangSmith
- Remote graph calls return `401`
- Errors mention `Unauthorized` or `invalid token`
Rotate the token and redeploy. Do not rely on old local shell exports.
4) Service-to-service auth mismatch in a multi-node graph
If one node calls another internal service, production may require mTLS, JWTs, or signed headers that your local setup skips.
```python
import os

import requests

# BAD: no auth header for internal service call
requests.get("https://internal-api.company.com/profile")

# GOOD: pass explicit auth from the secret store
requests.get(
    "https://internal-api.company.com/profile",
    headers={"Authorization": f"Bearer {os.environ['INTERNAL_SERVICE_TOKEN']}"},
)
```
This is common when a tool node calls CRM/claims/billing APIs inside a LangGraph workflow.
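If several tool nodes call the same internal service, it helps to centralize header construction so an unauthenticated call cannot be written by accident. A minimal sketch, reusing the `INTERNAL_SERVICE_TOKEN` name from the example above:

```python
import os

def auth_headers(token_env: str = "INTERNAL_SERVICE_TOKEN") -> dict:
    """Build the Authorization header from an injected secret,
    failing loudly instead of sending an unauthenticated request."""
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"{token_env} is not set in this environment")
    return {"Authorization": f"Bearer {token}"}

# usage inside a tool node:
# requests.get(url, headers=auth_headers())
```

This turns a silent 401 from the downstream service into an immediate, named configuration error in your own logs.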
How to Debug It
1. Check which client is failing
   - Look at the stack trace.
   - If you see `openai.AuthenticationError`, it is provider auth.
   - If you see `langgraph_sdk` or an HTTP `401`, it is likely platform auth.
   - If you see a tool exception inside a node, inspect that node first.
2. Print env var presence at startup. Do this in the deployed runtime, not just locally.
   ```python
   keys = ["OPENAI_API_KEY", "LANGGRAPH_API_KEY", "LANGSMITH_API_KEY"]
   print({k: bool(os.getenv(k)) for k in keys})
   ```
3. Reproduce with a minimal script. If this fails outside LangGraph, the issue is not your graph logic.
   ```python
   from openai import OpenAI
   client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
   print(client.models.list())
   ```
4. Inspect deployment config
   - Docker Compose env files
   - Kubernetes Secrets / Helm values
   - Render / Fly / Cloud Run environment variables
   - CI/CD secret injection step
Most “production-only” auth bugs are config drift, not code bugs.
Prevention
- Validate required secrets at process startup and fail fast.
- Keep separate config classes for local/dev/staging/prod so keys do not drift.
- Add a health check that verifies one authenticated provider call before serving traffic.
- Never depend on `.env` being present outside local development.
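The health-check idea can be implemented generically: wrap one real authenticated call and let the deploy fail at boot if it errors. The sketch below takes any zero-arg callable as the probe (for example `lambda: client.models.list()`), so it is not tied to a specific provider:

```python
def auth_health_check(probe, name: str = "llm-provider") -> None:
    """Run one authenticated call before serving traffic.
    Re-raises with a clear message so the deploy fails fast
    instead of returning 401s to live users."""
    try:
        probe()
    except Exception as exc:
        raise RuntimeError(
            f"Startup auth check failed for {name}: {exc}"
        ) from exc

# e.g. auth_health_check(lambda: client.models.list(), name="openai")
```

Wire this into your container's readiness check so a pod with bad credentials never receives traffic in the first place.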
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.