How to Fix 'authentication failed in production' in CrewAI (Python)
What this error usually means
An 'authentication failed in production' error in CrewAI usually means your agent tried to call an LLM provider with missing, wrong, or non-production credentials. In practice, it shows up when you move from local testing to a deployed environment and the API key, endpoint, or provider config is not actually available at runtime.
The failure often appears inside crewai when an Agent, Task, or Crew tries to initialize the model backend. You’ll usually see something close to:
- `AuthenticationError: authentication failed in production`
- `openai.AuthenticationError: Incorrect API key provided`
- `litellm.AuthenticationError: AuthenticationError ...`
The Most Common Cause
The #1 cause is simple: your local .env works, but production never loads it, or the environment variable name is wrong.
With CrewAI, this usually happens when you create an Agent with a model that expects an API key, but the key is not present in the deployed process.
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Hardcoded or missing env loading | Explicit env loading and validation |
| Assumes .env exists in production | Uses real deployment secrets |
| No startup check for credentials | Fails fast before CrewAI runs |
```python
# broken.py
from crewai import Agent, Task, Crew
from dotenv import load_dotenv

# This may work locally, but production often doesn't have .env
load_dotenv()

agent = Agent(
    role="Support Analyst",
    goal="Answer customer questions",
    backstory="You handle banking support.",
    llm="gpt-4o",  # expects OPENAI_API_KEY
)

task = Task(
    description="Summarize the customer's issue.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
```
```python
# fixed.py
import os

from dotenv import load_dotenv
from crewai import Agent, Task, Crew

load_dotenv()

required_vars = ["OPENAI_API_KEY"]
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
    raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")

agent = Agent(
    role="Support Analyst",
    goal="Answer customer questions",
    backstory="You handle banking support.",
    llm="gpt-4o",
)

task = Task(
    description="Summarize the customer's issue.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
```
If you’re using OpenAI-compatible providers through LiteLLM, the same rule applies. The model string can be correct and still fail if the key is absent in prod.
Other Possible Causes
1) Wrong environment variable name
CrewAI won’t magically map your secret name to the provider’s expected variable. If you set OPEN_AI_KEY instead of OPENAI_API_KEY, authentication will fail.
```bash
# wrong
export OPEN_AI_KEY="sk-..."

# right
export OPENAI_API_KEY="sk-..."
```
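One way to catch this class of typo at startup is to scan the environment for names that look close to the one the provider expects. `difflib.get_close_matches` from the standard library is enough; the helper below is a sketch, and `find_near_misses` is a hypothetical name, not a CrewAI API.

```python
import os
from difflib import get_close_matches

def find_near_misses(expected: str) -> list[str]:
    """Return env var names that look like `expected` but aren't it."""
    candidates = [name for name in os.environ if name != expected]
    return get_close_matches(expected, candidates, n=3, cutoff=0.7)

if __name__ == "__main__":
    if not os.getenv("OPENAI_API_KEY"):
        close = find_near_misses("OPENAI_API_KEY")
        if close:
            print(f"OPENAI_API_KEY is missing, but similar names exist: {close}")
```

If this prints something like `['OPEN_AI_KEY']`, you've found your misspelling.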
2) Provider mismatch
You may be pointing CrewAI at one provider while supplying credentials for another. For example, using an Anthropic model string with OpenAI credentials.
```python
# wrong
agent = Agent(
    role="Analyst",
    goal="Analyze claims data",
    backstory="Insurance analyst.",
    llm="claude-3-5-sonnet",  # Anthropic model
)
# but only OPENAI_API_KEY is set

# right
agent = Agent(
    role="Analyst",
    goal="Analyze claims data",
    backstory="Insurance analyst.",
    llm="gpt-4o",  # OpenAI model with OPENAI_API_KEY
)
```
If you want Anthropic, set the correct Anthropic secret and model config.
3) Secret exists locally but not in your deployment
This is common on Docker, Kubernetes, ECS, Render, Railway, or serverless jobs. The app starts fine because code imports succeed, then fails only when Crew.kickoff() hits the LLM call.
```yaml
# docker-compose.yml snippet
services:
  app:
    image: my-crewai-app
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
```
If ${OPENAI_API_KEY} is empty on the host machine or CI pipeline, production gets nothing.
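Worse, when the host variable is empty, Compose typically injects an empty string rather than leaving the variable unset, so a naive `"OPENAI_API_KEY" in os.environ` check passes while authentication still fails. A minimal sketch of a check that distinguishes the three cases:

```python
import os

def env_status(name: str) -> str:
    """Classify an env var as 'set', 'empty', or 'missing'."""
    if name not in os.environ:
        return "missing"
    if not os.environ[name].strip():
        return "empty"  # present but blank, common with compose interpolation
    return "set"

print("OPENAI_API_KEY:", env_status("OPENAI_API_KEY"))
```

Seeing `empty` in production logs points straight at the host-side interpolation, not at your code.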
4) Using a restricted or expired key
Some providers return auth errors when keys are revoked, scoped too tightly, or tied to a disabled org/project. The stack trace may still look like a generic CrewAI/LiteLLM auth failure.
```python
# Example symptom: code is fine, key is bad
import os

os.environ["OPENAI_API_KEY"] = "sk-prod-old-or-revoked"
```
Rotate the key and test again with a fresh secret from the provider dashboard.
How to Debug It
- Print what CrewAI sees at startup. Check whether the expected variables exist before creating any agents.

  ```python
  import os

  print("OPENAI_API_KEY exists:", bool(os.getenv("OPENAI_API_KEY")))
  print("ANTHROPIC_API_KEY exists:", bool(os.getenv("ANTHROPIC_API_KEY")))
  ```

- Confirm the exact model/provider path. Don't guess. Verify whether your `Agent.llm` points to OpenAI, Anthropic, Azure OpenAI, or another backend.
- Run a minimal auth test outside CrewAI. If direct SDK auth fails, CrewAI is not the problem.

  ```python
  from openai import OpenAI

  client = OpenAI()
  print(client.models.list())
  ```

- Inspect deployment secrets. In prod logs or shell access, confirm the variable is present in the running container/process, not just in your CI settings.
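When comparing local and production credentials, it also helps to log a non-reversible fingerprint of the key rather than the key itself. This is a sketch; `key_fingerprint` is a hypothetical helper, and the digest length is an arbitrary choice.

```python
import hashlib
import os

def key_fingerprint(name: str) -> str:
    """Return a safe, loggable fingerprint of a secret env var."""
    value = os.getenv(name)
    if not value:
        return f"{name}: <not set>"
    digest = hashlib.sha256(value.encode()).hexdigest()[:8]
    return f"{name}: len={len(value)} sha256[:8]={digest}"

print(key_fingerprint("OPENAI_API_KEY"))
```

Run the same line locally and in prod; different fingerprints mean different keys are being loaded, without ever printing the secret.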
Prevention
- Validate required secrets at process startup, before constructing any `Agent`, `Task`, or `Crew`.
- Keep provider-specific keys and model names together in one config module so you don't mix OpenAI/Anthropic/Azure settings.
- Add a smoke test in staging that calls the exact same model path your production crew uses.
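The config-module idea can be as simple as one dataclass that owns both the model name and the env var it depends on, so they can't drift apart. The names and values below are illustrative, not a CrewAI API.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    """Keeps the model string and its credential requirement in one place."""
    model: str
    api_key_var: str

    def validate(self) -> None:
        if not os.getenv(self.api_key_var):
            raise RuntimeError(f"{self.api_key_var} is required for {self.model}")

# One source of truth, imported everywhere agents are built.
PROD_LLM = LLMConfig(model="gpt-4o", api_key_var="OPENAI_API_KEY")
```

At startup you call `PROD_LLM.validate()`, then pass `PROD_LLM.model` as each agent's `llm`, so a provider switch changes exactly one file.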
If you’re seeing authentication failed in production only after deploy, treat it as a config problem first and a code problem second. In most cases, fixing env injection and provider alignment resolves it immediately.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit