How to Fix 'deployment crash' in LlamaIndex (Python)
When you see deployment crash in a LlamaIndex app, it usually means the model endpoint accepted your request but failed while serving it. In practice, this shows up when you call an LLM or embedding deployment with the wrong model name, missing credentials, bad transport config, or a payload the backend can’t handle.
This error often appears during Settings.llm initialization, VectorStoreIndex.from_documents(...), or the first query against a remote provider like Azure OpenAI, OpenAI-compatible gateways, or local inference servers.
The Most Common Cause
The #1 cause is a mismatch between the deployment name and the actual model/provider configuration.
With LlamaIndex, people often wire up AzureOpenAI or an OpenAI-compatible client and assume the deployment name is the same as the model name. On Azure, that's not true: `deployment_name` must match the deployment you created in the Azure portal exactly, and `azure_endpoint`, `api_version`, and credentials must also line up.
Broken vs fixed
| Broken pattern | Fixed pattern |
|---|---|
| Passes the model name where the deployment name is expected | Uses the exact Azure deployment name |
| Omits required Azure config | Sets endpoint, key, and API version explicitly |

```python
# BROKEN
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import Settings

llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="gpt-4o",  # wrong if your Azure deployment is named "prod-gpt4o"
    api_key="...",
    azure_endpoint="https://my-resource.openai.azure.com/",
    # api_version is also missing
)
Settings.llm = llm
```

```python
# FIXED
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import Settings

llm = AzureOpenAI(
    model="gpt-4o",                # the underlying model the deployment serves
    deployment_name="prod-gpt4o",  # must match the Azure deployment name exactly
    api_key="...",
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
)
Settings.llm = llm
```
If this is your issue, you’ll usually see something like:
- `BadRequestError: Error code: 400 - {'error': {'message': 'deployment crash', ...}}`
- `openai.BadRequestError: The API deployment for this resource does not exist`
- `litellm.BadRequestError: LLM Provider NOT provided` (when using a proxy layer incorrectly)
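Before the first request, it can help to lint the values you are about to pass to AzureOpenAI. The checks below are an illustrative sketch, not an official validation rule, and the example values (like `prod-gpt4o`) are placeholders:

```python
def lint_azure_config(deployment_name: str, azure_endpoint: str,
                      api_key: str, api_version: str) -> list:
    """Return a list of likely Azure OpenAI misconfigurations (illustrative, not exhaustive)."""
    problems = []
    if not deployment_name:
        problems.append("deployment_name is empty; it must match the name in the Azure portal")
    if not azure_endpoint.startswith("https://"):
        problems.append("azure_endpoint should look like https://<resource>.openai.azure.com/")
    if not api_key:
        problems.append("api_key is empty; check your secret store / env vars")
    if not api_version:
        problems.append("api_version is required for Azure OpenAI (e.g. 2024-02-15-preview)")
    return problems

issues = lint_azure_config(
    "prod-gpt4o",
    "https://my-resource.openai.azure.com/",
    "...",
    "2024-02-15-preview",
)
if issues:
    raise RuntimeError("Azure OpenAI config problems: " + "; ".join(issues))
```

Running this at startup turns a vague 400 at request time into a precise error before any traffic is served.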
Other Possible Causes
1) Wrong environment variables or missing secrets
LlamaIndex reads provider config from env vars in many setups. If your app works locally but crashes in CI or Docker, check the runtime environment first.
```python
# Example: missing key in container
import os

print(os.getenv("AZURE_OPENAI_API_KEY"))
print(os.getenv("AZURE_OPENAI_ENDPOINT"))
```
If either prints None, your calls will fail before inference even starts.
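You can turn those prints into a reusable check that runs before LlamaIndex is even imported. A small sketch; the variable names here are the common Azure ones, so adjust the tuple for your provider:

```python
import os

REQUIRED_ENV = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")  # adjust per provider

def missing_env(required=REQUIRED_ENV):
    """Return the names of required env vars that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

# Run this before building any index; an empty list means the basics are present
print(missing_env())
```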
2) Mixing sync and async clients incorrectly
A common failure mode is calling async query APIs from sync code or reusing a client across event loops.
```python
# BROKEN: `await` is only valid inside an async function, and
# query() is the sync method anyway (the async one is aquery())
response = await index.as_query_engine().query("What is this?")
```

```python
# FIXED (sync path)
query_engine = index.as_query_engine()
response = query_engine.query("What is this?")
```
If you’re using async, keep the whole path async:
```python
import asyncio

async def main():
    query_engine = index.as_query_engine()
    return await query_engine.aquery("What is this?")

response = asyncio.run(main())
```
3) Context window too large for the selected model
If you stuff too many documents into one prompt, some providers fail with opaque backend errors that surface as a crash.
```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=20)  # 20 chunks can overflow the context window
```
Reduce retrieval size and chunking pressure:
```python
query_engine = index.as_query_engine(similarity_top_k=5)
```
Also make chunks smaller when ingesting:
```python
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
```
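To build intuition for what `chunk_size` and `chunk_overlap` control, here is a toy word-level splitter. It is a deliberate simplification: the real SentenceSplitter works on tokens and respects sentence boundaries, but the window/overlap arithmetic is the same idea:

```python
def chunk_words(words, chunk_size=512, chunk_overlap=64):
    """Toy splitter: fixed-size windows that share chunk_overlap words with the previous one."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ["w%d" % i for i in range(1000)]
print(len(chunk_words(doc, 512, 64)))  # a few large chunks
print(len(chunk_words(doc, 128, 16)))  # many smaller, finer-grained chunks
```

Smaller chunks mean more nodes per document, so retrieval can pull in fewer total tokens per query even at the same `similarity_top_k`.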
4) Bad proxy / gateway configuration
If you route LlamaIndex through LiteLLM, OpenRouter, vLLM, Ollama, or an internal gateway, one bad base URL or auth header can produce a generic deployment failure.
```python
# Example: an OpenAI-compatible client pointed at the wrong base URL
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_base="http://localhost:8000/v1",  # wrong if nothing serves there
    api_key="sk-not-used",
)
```
Make sure the server actually exposes an OpenAI-compatible /v1/chat/completions endpoint.
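A quick way to verify the base URL before wiring it into LlamaIndex is to probe the server's `/models` route, which OpenAI-compatible servers expose alongside `/chat/completions`. This is a stdlib-only sketch; note that some gateways answer 401 without a key, which still proves something is listening:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def models_url(api_base: str) -> str:
    """Join the base URL with the standard OpenAI-style /models route."""
    return api_base.rstrip("/") + "/models"

def something_is_listening(api_base: str, timeout: float = 3.0) -> bool:
    """True if the host answered at all (even with 401/404), False on connection failure."""
    try:
        with urlopen(models_url(api_base), timeout=timeout):
            return True
    except HTTPError:
        return True   # the server responded; auth or routing is wrong, but it's alive
    except (URLError, OSError):
        return False  # nothing is serving at this base URL
```

If this returns False for the `api_base` you configured, no LlamaIndex setting will save you; fix the server or the URL first.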
How to Debug It
- Print the exact exception
  - Don't stop at "deployment crash".
  - Capture the full stack trace and look for provider-specific classes like `openai.BadRequestError`, `litellm.BadRequestError`, `httpx.ConnectError`, and `aiohttp.ClientResponseError`.
- Verify provider config outside LlamaIndex
  - Test the same endpoint with a raw SDK call.
  - If raw OpenAI/Azure/OpenRouter fails too, this is not a LlamaIndex bug.
- Check the effective runtime settings
  - Log the model/deployment name, API base/endpoint, API version, and auth headers / env vars at startup.
  - In production containers, assume env drift until proven otherwise.
- Reduce to one document and one query
  - Remove retrievers, agents, tools, and custom prompts.
  - Start with:

    ```python
    from llama_index.core import VectorStoreIndex

    index = VectorStoreIndex.from_documents([documents[0]])
    print(index.as_query_engine().query("Summarize this"))
    ```

  - If that works, scale back up until it breaks.
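To make the first debugging step routine, you can wrap the failing call so the fully qualified, provider-specific exception class is always printed before the stack trace. This is a small illustrative helper, not part of LlamaIndex:

```python
import traceback

def run_and_report(fn, *args, **kwargs):
    """Call fn; on failure, print the qualified exception class, then re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        print(f"{type(exc).__module__}.{type(exc).__name__}: {exc}")
        traceback.print_exc()
        raise

# Usage with your own query engine:
# response = run_and_report(query_engine.query, "What is this?")
```

Seeing `httpx.ConnectError` versus `openai.BadRequestError` immediately tells you whether to look at networking or at the request payload.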
Prevention
- Pin your provider config in one place.
  - Don't scatter `model`, `api_base`, and keys across modules.
  - Build one settings module and import it everywhere.
- Add startup validation.
  - Fail fast if required env vars are missing.
  - Check that your deployment name exists before serving traffic.
- Keep ingestion conservative.
  - Use sane chunk sizes.
  - Start with a low `similarity_top_k`.
  - Avoid dumping huge PDFs into a single prompt path.
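The first two prevention points combine naturally into one small settings module that every other file imports. A minimal sketch with illustrative names; because the values are read at import time, a misconfigured container dies at startup instead of on the first user request:

```python
# settings.py - the one place provider config is read (names are illustrative)
import os

def require(name: str) -> str:
    """Read a required env var, failing fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required env var: {name}")
    return value

# Evaluated once at import time, so every module that does
# `from settings import DEPLOYMENT_NAME` gets validated config:
# AZURE_ENDPOINT = require("AZURE_OPENAI_ENDPOINT")
# AZURE_API_KEY = require("AZURE_OPENAI_API_KEY")
# DEPLOYMENT_NAME = require("AZURE_OPENAI_DEPLOYMENT")
```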
If you’re still seeing deployment crash after checking these items, treat it as a provider integration issue first and a LlamaIndex issue second. The fastest fix is usually in your endpoint name, auth config, or prompt size—not in the retrieval code itself.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.