How to Fix 'deployment crash in production' in LlamaIndex (Python)
When LlamaIndex crashes during deployment, it usually means your app worked locally but failed under production conditions: different environment variables, missing dependencies, bad async handling, or an index that was built in-memory and never persisted. The exact failure often shows up as a startup crash, a worker restart loop, or an exception during the first query.
The pattern is usually the same: the code path that worked in a notebook or local script is not safe for a long-running Python service.
The Most Common Cause
The #1 cause is building the index at import time or inside the request path without persisting it. In production, that means every worker tries to load data, create embeddings, and initialize storage when the app starts or when traffic hits it.
Typical errors you’ll see:
- `ValueError: No storage context found`
- `FileNotFoundError: [Errno 2] No such file or directory: 'storage'`
- `RuntimeError: Event loop is closed`
- `llama_index.core.indices.loading.load_index_from_storage` failing because nothing was persisted
Broken vs fixed pattern
| Broken | Fixed |
|---|---|
| Build index on every import/request | Build once, persist, then load |
| Uses ephemeral in-memory state | Uses StorageContext + disk/object storage |
| Fails on process restart | Survives restarts |
```python
# broken.py
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Runs at import time: every worker reloads and re-embeds the documents on startup
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

def answer(query: str):
    query_engine = index.as_query_engine()
    return query_engine.query(query)
```
```python
# fixed.py
import os

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

def build_or_load_index():
    # Reload the persisted index if it exists; build and persist it only on first run
    if os.path.exists(PERSIST_DIR):
        storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
        return load_index_from_storage(storage_context)
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(docs)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    return index

index = build_or_load_index()

def answer(query: str):
    query_engine = index.as_query_engine()
    return query_engine.query(query)
```
If you’re running FastAPI, don’t rebuild the index inside the route handler. Load it once during app startup and reuse it.
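Here is a minimal sketch of that pattern using FastAPI's lifespan hook. It assumes the `build_or_load_index()` helper from `fixed.py` above; the `fixed` module name just mirrors that example's file name, so adjust it to your project layout.

```python
# app.py — minimal sketch; assumes build_or_load_index() from the fixed.py example above
from contextlib import asynccontextmanager

from fastapi import FastAPI

from fixed import build_or_load_index  # hypothetical import, matches the example file name

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build or load the index once per worker process, before any traffic arrives
    app.state.index = build_or_load_index()
    yield

app = FastAPI(lifespan=lifespan)

@app.get("/ask")
def ask(q: str):
    # Reuse the preloaded index; no document loading or embedding in the request path
    query_engine = app.state.index.as_query_engine()
    return {"answer": str(query_engine.query(q))}
```

With multiple Uvicorn/Gunicorn workers, each process loads the persisted index from disk instead of re-embedding the documents on every restart.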
Other Possible Causes
1) Missing API keys or env vars in production
This is common with OpenAI-backed embeddings or LLMs. Locally you have .env; in prod the container has nothing.
```python
import os

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])
```
If `OPENAI_API_KEY` is missing, you’ll get a `KeyError` or downstream auth failures like:

- `openai.AuthenticationError`
- `401 Unauthorized`
Fix by injecting env vars through your deployment platform and validating at startup.
```python
required = ["OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY"]
missing = [k for k in required if not os.getenv(k)]
if missing:
    raise RuntimeError(f"Missing env vars: {missing}")
```
2) Async misuse in a sync server
A lot of deployment crashes come from calling async LlamaIndex APIs incorrectly.
```python
# broken
response = query_engine.aquery("What is this?")
print(response)  # coroutine object, not the answer

# fixed
import asyncio

response = asyncio.run(query_engine.aquery("What is this?"))
print(response)
```
In FastAPI endpoints, use `async def` and `await` instead of nesting event loops. If you see either of these, you’ve got an async boundary problem:

- `RuntimeError: This event loop is already running`
- `RuntimeError: asyncio.run() cannot be called from a running event loop`
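A hedged sketch of the correct boundary inside an async framework; it assumes the index was already attached to the app at startup, as in the FastAPI example above:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/ask")
async def ask(q: str):
    # Await the async API directly; never call asyncio.run() inside a running event loop
    query_engine = app.state.index.as_query_engine()  # assumes the index was loaded at startup
    response = await query_engine.aquery(q)
    return {"answer": str(response)}
```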
3) Version mismatch between LlamaIndex packages
LlamaIndex moved fast and split integrations into separate packages. A local install can work while prod pulls incompatible versions.
Example failure:
- `ImportError: cannot import name 'OpenAI' from 'llama_index.llms.openai'`
- `ModuleNotFoundError: No module named 'llama_index.embeddings.openai'`
Pin versions explicitly:
```text
llama-index==0.10.68
llama-index-llms-openai==0.1.27
llama-index-embeddings-openai==0.1.11
openai==1.40.6
```
Then rebuild your image from scratch so old wheels don’t linger.
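As a belt-and-braces check, you can also fail fast at startup if the deployed image resolved different versions than you pinned. This is a minimal sketch; the version numbers mirror the pins above and should be adjusted to your own requirements file:

```python
# startup_version_check.py — minimal sketch; adjust PINNED to match your requirements
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "llama-index": "0.10.68",
    "llama-index-llms-openai": "0.1.27",
    "llama-index-embeddings-openai": "0.1.11",
    "openai": "1.40.6",
}

for pkg, expected in PINNED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        raise RuntimeError(f"{pkg} is not installed in this image")
    if installed != expected:
        raise RuntimeError(f"{pkg}=={installed} does not match pinned {expected}")
```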
4) Bad persistence path or read-only filesystem
If your container writes to /app/storage but that path is read-only, persistence fails at runtime.
```python
index.storage_context.persist(persist_dir="/app/storage")
```
Common errors:
- `PermissionError: [Errno 13] Permission denied`
- `OSError: Read-only file system`
Use a writable volume:
```python
persist_dir = "/tmp/storage"
```
or mount persistent storage in Kubernetes/ECS.
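One way to catch this early is to verify the persist directory is writable before the app starts serving traffic. A minimal sketch, assuming the path comes from a `PERSIST_DIR` environment variable:

```python
# hedged sketch: fail at startup if the persist directory is missing or read-only
import os
import tempfile

PERSIST_DIR = os.getenv("PERSIST_DIR", "/tmp/storage")

try:
    os.makedirs(PERSIST_DIR, exist_ok=True)
    # Creating and removing a temp file proves the path is actually writable
    with tempfile.NamedTemporaryFile(dir=PERSIST_DIR):
        pass
except OSError as exc:
    raise RuntimeError(f"Persist dir {PERSIST_DIR} is not writable: {exc}") from exc
```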
How to Debug It
- Check the first real exception in logs.
  - Ignore the worker restart noise.
  - Look for the root error before Uvicorn/Gunicorn retries.
  - Search for messages like `No storage context found`, `AuthenticationError`, or `PermissionError`.
- Print environment and package versions at startup, then compare local vs production output exactly:

  ```python
  import os

  import llama_index.core

  print("LLAMA_INDEX_VERSION", llama_index.core.__version__)
  print("OPENAI_API_KEY_SET", bool(os.getenv("OPENAI_API_KEY")))
  ```

- Test index loading separately; if this fails on its own, your persistence layer is broken:

  ```python
  from llama_index.core import StorageContext, load_index_from_storage

  storage_context = StorageContext.from_defaults(persist_dir="./storage")
  index = load_index_from_storage(storage_context)
  ```

- Remove request-time initialization:
  - Move document loading out of routes.
  - Move embedding model setup out of handlers.
  - Create one shared index instance per process (see the sketch after this list).
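Here is a minimal sketch of the "one shared index per process" idea, again assuming the `build_or_load_index()` helper from the `fixed.py` example; a cached loader guarantees the expensive work happens at most once per worker:

```python
from functools import lru_cache

from fixed import build_or_load_index  # hypothetical import, matches the example file name

@lru_cache(maxsize=1)
def get_index():
    # First call builds or loads the index; every later call returns the same object
    return build_or_load_index()
```

Route handlers then call `get_index()` instead of constructing readers, embedding models, or indexes themselves.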
Prevention
- Persist indexes explicitly with `StorageContext.persist()` and reload them with `load_index_from_storage()`.
- Pin all LlamaIndex-related packages and rebuild containers cleanly on deploy.
- Validate secrets and filesystem permissions at startup before serving traffic.
If you want stable production behavior with LlamaIndex, treat it like any other stateful backend component. Build once, persist once, load predictably, and keep request handlers thin.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.