How to Fix 'deployment crash in production' in LlamaIndex (Python)
When LlamaIndex crashes during deployment, it usually means your app worked locally but failed under production conditions: different environment variables, missing dependencies, bad async handling, or an index that was built in-memory and never persisted. The exact failure often shows up as a startup crash, a worker restart loop, or an exception during the first query.
The pattern is usually the same: the code path that worked in a notebook or local script is not safe for a long-running Python service.
The Most Common Cause
The #1 cause is building the index at import time or inside the request path without persisting it. In production, that means every worker tries to load data, create embeddings, and initialize storage when the app starts or when traffic hits it.
Typical errors you’ll see:
- `ValueError: No storage context found`
- `FileNotFoundError: [Errno 2] No such file or directory: 'storage'`
- `RuntimeError: Event loop is closed`
- `llama_index.core.indices.loading.load_index_from_storage` failing because nothing was persisted
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Build index on every import/request | Build once, persist, then load |
| Uses ephemeral in-memory state | Uses StorageContext + disk/object storage |
| Fails on process restart | Survives restarts |
```python
# broken.py
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Runs at import time: every worker reloads and re-embeds the documents on startup
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

def answer(query: str):
    query_engine = index.as_query_engine()
    return query_engine.query(query)
```
```python
# fixed.py
import os

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

def build_or_load_index():
    # Reload the persisted index if it exists; build and persist it only on first run
    if os.path.exists(PERSIST_DIR):
        storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
        return load_index_from_storage(storage_context)
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(docs)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    return index

index = build_or_load_index()

def answer(query: str):
    query_engine = index.as_query_engine()
    return query_engine.query(query)
```
If you’re running FastAPI, don’t rebuild the index inside the route handler. Load it once during app startup and reuse it.
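Here is a minimal sketch of that pattern using FastAPI's lifespan hook. It assumes the `build_or_load_index()` helper from `fixed.py` above; the `fixed` module name just mirrors that example's file name, so adjust it to your project layout.

```python
# app.py — minimal sketch; assumes build_or_load_index() from the fixed.py example above
from contextlib import asynccontextmanager

from fastapi import FastAPI

from fixed import build_or_load_index  # hypothetical import, matches the example file name

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build or load the index once per worker process, before any traffic arrives
    app.state.index = build_or_load_index()
    yield

app = FastAPI(lifespan=lifespan)

@app.get("/ask")
def ask(q: str):
    # Reuse the preloaded index; no document loading or embedding in the request path
    query_engine = app.state.index.as_query_engine()
    return {"answer": str(query_engine.query(q))}
```

With multiple Uvicorn/Gunicorn workers, each process loads the persisted index from disk instead of re-embedding the documents on every restart.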
Other Possible Causes
1) Missing API keys or env vars in production
This is common with OpenAI-backed embeddings or LLMs. Locally you have .env; in prod the container has nothing.
```python
import os

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])
```
If `OPENAI_API_KEY` is missing, you’ll get a `KeyError` or downstream auth failures like:

- `openai.AuthenticationError`
- `401 Unauthorized`
Fix by injecting env vars through your deployment platform and validating at startup.
```python
required = ["OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY"]
missing = [k for k in required if not os.getenv(k)]
if missing:
    raise RuntimeError(f"Missing env vars: {missing}")
```
2) Async misuse in a sync server
A lot of deployment crashes come from calling async LlamaIndex APIs incorrectly.
```python
# broken
response = query_engine.aquery("What is this?")
print(response)  # coroutine object, not the answer

# fixed
import asyncio

response = asyncio.run(query_engine.aquery("What is this?"))
print(response)
```
In FastAPI endpoints, use `async def` and `await` instead of nesting event loops. If you see either of these, you’ve got an async boundary problem:

- `RuntimeError: This event loop is already running`
- `RuntimeError: asyncio.run() cannot be called from a running event loop`
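A hedged sketch of the correct boundary inside an async framework; it assumes the index was already attached to the app at startup, as in the FastAPI example above:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/ask")
async def ask(q: str):
    # Await the async API directly; never call asyncio.run() inside a running event loop
    query_engine = app.state.index.as_query_engine()  # assumes the index was loaded at startup
    response = await query_engine.aquery(q)
    return {"answer": str(response)}
```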
3) Version mismatch between LlamaIndex packages
LlamaIndex moved fast and split integrations into separate packages. A local install can work while prod pulls incompatible versions.
Example failure:
- `ImportError: cannot import name 'OpenAI' from 'llama_index.llms.openai'`
- `ModuleNotFoundError: No module named 'llama_index.embeddings.openai'`
Pin versions explicitly:
```text
llama-index==0.10.68
llama-index-llms-openai==0.1.27
llama-index-embeddings-openai==0.1.11
openai==1.40.6
```
Then rebuild your image from scratch so old wheels don’t linger.
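As a belt-and-braces check, you can also fail fast at startup if the deployed image resolved different versions than you pinned. This is a minimal sketch; the version numbers mirror the pins above and should be adjusted to your own requirements file:

```python
# startup_version_check.py — minimal sketch; adjust PINNED to match your requirements
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "llama-index": "0.10.68",
    "llama-index-llms-openai": "0.1.27",
    "llama-index-embeddings-openai": "0.1.11",
    "openai": "1.40.6",
}

for pkg, expected in PINNED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        raise RuntimeError(f"{pkg} is not installed in this image")
    if installed != expected:
        raise RuntimeError(f"{pkg}=={installed} does not match pinned {expected}")
```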
4) Bad persistence path or read-only filesystem
If your container writes to /app/storage but that path is read-only, persistence fails at runtime.
```python
index.storage_context.persist(persist_dir="/app/storage")
```
Common errors:
- `PermissionError: [Errno 13] Permission denied`
- `OSError: Read-only file system`
Use a writable volume:
```python
persist_dir = "/tmp/storage"
```
or mount persistent storage in Kubernetes/ECS.
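One way to catch this early is to verify the persist directory is writable before the app starts serving traffic. A minimal sketch, assuming the path comes from a `PERSIST_DIR` environment variable:

```python
# hedged sketch: fail at startup if the persist directory is missing or read-only
import os
import tempfile

PERSIST_DIR = os.getenv("PERSIST_DIR", "/tmp/storage")

try:
    os.makedirs(PERSIST_DIR, exist_ok=True)
    # Creating and removing a temp file proves the path is actually writable
    with tempfile.NamedTemporaryFile(dir=PERSIST_DIR):
        pass
except OSError as exc:
    raise RuntimeError(f"Persist dir {PERSIST_DIR} is not writable: {exc}") from exc
```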
How to Debug It
- Check the first real exception in logs.
  - Ignore the worker restart noise.
  - Look for the root error before Uvicorn/Gunicorn retries.
  - Search for messages like `No storage context found`, `AuthenticationError`, or `PermissionError`.
- Print environment and package versions at startup, then compare local vs production output exactly:

  ```python
  import os

  import llama_index.core

  print("LLAMA_INDEX_VERSION", llama_index.core.__version__)
  print("OPENAI_API_KEY_SET", bool(os.getenv("OPENAI_API_KEY")))
  ```

- Test index loading separately; if this fails on its own, your persistence layer is broken:

  ```python
  from llama_index.core import StorageContext, load_index_from_storage

  storage_context = StorageContext.from_defaults(persist_dir="./storage")
  index = load_index_from_storage(storage_context)
  ```

- Remove request-time initialization:
  - Move document loading out of routes.
  - Move embedding model setup out of handlers.
  - Create one shared index instance per process (see the sketch after this list).
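Here is a minimal sketch of the "one shared index per process" idea, again assuming the `build_or_load_index()` helper from the `fixed.py` example; a cached loader guarantees the expensive work happens at most once per worker:

```python
from functools import lru_cache

from fixed import build_or_load_index  # hypothetical import, matches the example file name

@lru_cache(maxsize=1)
def get_index():
    # First call builds or loads the index; every later call returns the same object
    return build_or_load_index()
```

Route handlers then call `get_index()` instead of constructing readers, embedding models, or indexes themselves.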
Prevention
- Persist indexes explicitly with `StorageContext.persist()` and reload them with `load_index_from_storage()`.
- Pin all LlamaIndex-related packages and rebuild containers cleanly on deploy.
- Validate secrets and filesystem permissions at startup before serving traffic.
If you want stable production behavior with LlamaIndex, treat it like any other stateful backend component. Build once, persist once, load predictably, and keep request handlers thin.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.