How to Fix 'invalid API key when scaling' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: invalid-api-key-when-scaling, llamaindex, python

When LlamaIndex throws AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided'}} during scaling, it usually means one of your worker processes is not reading the same credentials as your local process. This shows up most often when you move from a single Python script to Docker, Celery, Kubernetes, or multiple Uvicorn/Gunicorn workers.

The key detail: the error is often not “the key is wrong” but “the key is missing in this runtime.”

The Most Common Cause

The #1 cause is setting the OpenAI key in one place, then initializing LlamaIndex in another place before that environment variable exists.

This happens a lot with Settings.llm, OpenAIEmbedding, or OpenAI objects created at import time.

Broken pattern                                       Fixed pattern
Create LlamaIndex clients before loading env vars    Load env vars first, then create clients
Set OPENAI_API_KEY inside a function after imports   Set it at process startup or pass it explicitly
# broken.py
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from dotenv import load_dotenv

# BAD: clients are created before env vars are loaded
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

load_dotenv()  # too late

# later...

# fixed.py
import os
from dotenv import load_dotenv
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

load_dotenv()

api_key = os.environ["OPENAI_API_KEY"]

Settings.llm = OpenAI(
    model="gpt-4o-mini",
    api_key=api_key,
)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=api_key,
)

Why this breaks under scaling:

  • One process may inherit the env var.
  • Another worker may start without it.
  • Import-time initialization locks in the missing value.

If you see errors like:

  • openai.AuthenticationError: Error code: 401
  • Incorrect API key provided
  • llama_index.core.llms.exceptions.LLMValidationError
  • ValueError: No API key found for OpenAI

this is usually where to start.
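
One quick way to reproduce the failure locally is to launch the script with the variable scrubbed, which mimics a worker that never received it:

env -u OPENAI_API_KEY python broken.py

If that reproduces the 401, the bug is initialization order or secret propagation, not the key itself.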

Other Possible Causes

1) The key exists locally but not inside the container or pod

A classic Docker/Kubernetes issue. Your shell has the variable, but the runtime does not.

# broken k8s snippet
containers:
  - name: app
    image: my-app:latest
    # no env var here

# fixed k8s snippet
containers:
  - name: app
    image: my-app:latest
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: openai-secrets
            key: api_key
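
After applying the fixed manifest, confirm from inside the pod that the Secret actually arrived (the pod name is a placeholder):

kubectl exec <pod-name> -- printenv OPENAI_API_KEY

printenv exits non-zero and prints nothing if the variable is missing.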

For Docker Compose:

services:
  app:
    image: my-app:latest
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
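
The same check works for Compose, assuming Python is on the image's PATH:

docker compose run --rm app python -c "import os; print(bool(os.getenv('OPENAI_API_KEY')))"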

2) Worker processes do not inherit updated environment variables

Gunicorn, Celery, and Uvicorn with multiple workers can start before secrets are injected.

# risky if env is changed after startup
gunicorn app:app --workers 4

Fix by ensuring secrets are present before process start:

export OPENAI_API_KEY="sk-..."
gunicorn app:app --workers 4

If you use systemd or a process manager, verify the service file includes the secret source and restart the service after changes.
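
A minimal systemd sketch, with illustrative unit and secrets paths (EnvironmentFile loads the variables before ExecStart runs):

# /etc/systemd/system/my-app.service (illustrative)
[Service]
EnvironmentFile=/etc/my-app/secrets.env
ExecStart=/usr/bin/gunicorn app:app --workers 4

Run systemctl daemon-reload and restart the service after editing so every worker inherits the new environment.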

3) You are mixing API keys from different providers or projects

LlamaIndex can use OpenAI-compatible endpoints, Azure OpenAI, or other providers. A valid key for one backend will fail against another.

# broken: pointing the plain OpenAI client at an Azure endpoint with an Azure key
import os

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # wrong provider mapping breaks auth
    api_base="https://my-resource.openai.azure.com/",  # Azure endpoints need the Azure client
)

Use the provider-specific class and config instead of forcing everything through one client. If you are on Azure, use Azure-specific settings and verify deployment name, endpoint, and version.
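
For Azure, a minimal sketch with the dedicated class, assuming the llama-index-llms-azure-openai package is installed (deployment name, endpoint, and API version are placeholders):

import os

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    engine="my-gpt4o-mini-deployment",  # your Azure deployment name
    model="gpt-4o-mini",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_version="2024-02-01",
)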

4) The key is being overwritten in code

Sometimes a helper module resets the key to None or an empty string during refactors.

# broken helper.py
from llama_index.core import Settings

def configure():
    Settings.llm.api_key = ""   # accidental overwrite

# fixed helper.py
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

def configure(api_key: str) -> None:
    assert api_key.strip(), "OPENAI_API_KEY is empty"
    Settings.llm = OpenAI(model="gpt-4o-mini", api_key=api_key)

Also check for these bad patterns:

  • os.getenv("OPENAI_API_KEY", "")
  • .env file loaded in one module only
  • stale secrets in CI/CD variables
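
A fail-fast lookup surfaces the problem at startup instead of letting an empty string travel to the API and come back as a 401:

import os

# KeyError here is a much clearer failure than a deferred 401 from the backend
api_key = os.environ["OPENAI_API_KEY"]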

How to Debug It

  1. Print the effective key source at startup

    Check whether the process actually sees the variable.

    import os
    
    print("OPENAI_API_KEY exists:", bool(os.getenv("OPENAI_API_KEY")))
    print("Key prefix:", os.getenv("OPENAI_API_KEY", "")[:7])
    

    If this prints False or an empty prefix in production, your problem is deployment wiring.

  2. Confirm where LlamaIndex initializes clients

    Search for Settings.llm, Settings.embed_model, OpenAI(...), and any custom wrappers.

    If those run at import time, move them behind a startup function after env loading.
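
    A minimal sketch of that move, assuming a FastAPI app (module layout and model name are illustrative):

    import os
    from contextlib import asynccontextmanager

    from dotenv import load_dotenv
    from fastapi import FastAPI
    from llama_index.core import Settings
    from llama_index.llms.openai import OpenAI

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Runs in each worker process, after its environment is final
        load_dotenv()
        Settings.llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
        yield

    app = FastAPI(lifespan=lifespan)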

  3. Test inside the exact runtime

    Run a shell inside your container/pod/worker and inspect the environment.

python -c "import os; print(os.getenv('OPENAI_API_KEY', '')[:7])"
    

    If it works locally but not there, stop debugging application code and fix deployment config.

  4. Check for provider mismatch

    Verify that your class matches your backend:

    • OpenAI → llama_index.llms.openai.OpenAI
    • Azure OpenAI → Azure-specific integration/configuration
    • Local models → local LLM wrapper

    A valid-looking key with the wrong endpoint still produces authentication failures.

Prevention

  • Load secrets before importing modules that instantiate LlamaIndex clients.
  • Pass api_key= explicitly in production code instead of relying only on ambient environment state.
  • Add a startup health check that verifies both OPENAI_API_KEY presence and a minimal LlamaIndex call path before traffic reaches the service.
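
A minimal version of that health check, assuming Settings.llm is configured at startup (the tiny completion does cost a few tokens):

import os

from llama_index.core import Settings

def startup_health_check() -> None:
    # Fail before serving traffic if the key is missing or rejected
    assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set in this process"
    response = Settings.llm.complete("ping")  # raises AuthenticationError on a bad key
    assert response.text is not None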

If this error appears only when scaling out, treat it as a runtime configuration bug first, not an SDK bug. In most cases, fixing secret propagation and initialization order resolves it immediately.

