How to Fix 'timeout error' in LlamaIndex (Python)
What the timeout error means
In LlamaIndex, a timeout error usually means one of the network calls in your pipeline took longer than the configured limit. It shows up most often when you’re calling an LLM, embedding model, vector store, or external data loader that sits behind HTTP.
Typical symptoms look like this:
- httpx.ReadTimeout
- openai.APITimeoutError
- TimeoutError: Request timed out
- Calls deep in llama_index.core.indices.query.schema failing after a long wait
If you see it during indexing or querying, the fix is usually not “increase timeout and move on”. You need to identify which layer is timing out and whether the request is too large, too slow, or misconfigured.
The Most Common Cause
The #1 cause is sending too much work in a single request. In LlamaIndex, this usually happens when chunk sizes are too large, retrieval pulls too many nodes, or you’re using a model with a short default timeout.
Here’s the broken pattern:
| Broken | Fixed |
|---|---|
| Large chunks + default timeout | Smaller chunks + explicit timeout |
| Heavy query over many nodes | Limit top-k and refine response mode |
# BROKEN
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
docs = SimpleDirectoryReader("data").load_data()
llm = OpenAI(model="gpt-4o-mini") # default timeout may be too short for your workload
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=20,  # too many nodes for one slow call
)
response = query_engine.query("Summarize all policy exceptions in these documents.")
print(response)
# FIXED
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI
docs = SimpleDirectoryReader("data").load_data()
Settings.chunk_size = 512
Settings.chunk_overlap = 50
llm = OpenAI(
    model="gpt-4o-mini",
    timeout=120.0,  # explicit timeout in seconds
    max_retries=3,
)
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=5,
    response_mode="compact",
)
response = query_engine.query("Summarize all policy exceptions in these documents.")
print(response)
The main idea: reduce the amount of context going into each call. If you ask LlamaIndex to stuff 20 large chunks into a single completion, you’re inviting timeouts.
Other Possible Causes
1) Slow embedding generation during ingestion
If indexing hangs before queries even start, embeddings are often the bottleneck.
# Too many documents processed in one shot
index = VectorStoreIndex.from_documents(docs)
Fix it by batching ingestion or using a faster embedding model:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    timeout=90.0,
)
2) Network issues or proxy/firewall restrictions
A local environment may reach some APIs directly but fail intermittently when traffic has to pass through corporate proxy or firewall rules.
import os
# Route outbound HTTP(S) traffic through the corporate proxy
os.environ["HTTP_PROXY"] = "http://proxy.mycorp.local:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.mycorp.local:8080"
If your logs show httpx.ConnectTimeout or httpx.ReadTimeout, check DNS resolution, proxy config, and outbound allowlists.
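To confirm it is the network path rather than LlamaIndex, bypass the framework and hit the provider directly with a plain HTTP client. A minimal sketch against OpenAI's models endpoint (swap in whichever endpoint your stack actually calls):
import os
import httpx
# If this also times out, the problem is DNS, proxy, or firewall config,
# not anything LlamaIndex is doing.
resp = httpx.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    timeout=10.0,
)
print(resp.status_code)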
3) Using async code incorrectly
Mixing sync and async calls can create blocking behavior that looks like a timeout.
# Wrong: calling the async API from a sync context without awaiting it
response = query_engine.aquery("What is the claim status?")  # returns a coroutine, not a response
Correct pattern:
import asyncio

async def main():
    response = await query_engine.aquery("What is the claim status?")
    print(response)

asyncio.run(main())
If you’re inside FastAPI or another async framework, keep the whole path async.
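For example, a FastAPI endpoint should await aquery end to end. A minimal sketch, assuming query_engine was built once at startup:
from fastapi import FastAPI

app = FastAPI()

@app.get("/ask")
async def ask(q: str):
    # Awaiting keeps the event loop free; calling the sync .query() here
    # would block it and look exactly like a hang or timeout.
    response = await query_engine.aquery(q)
    return {"answer": str(response)}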
4) Tool/function calls taking too long
Agent workflows can time out while waiting on tools like SQL queries, web fetches, or internal services.
import requests

from llama_index.core.tools import FunctionTool

def slow_lookup(policy_id: str):
    # bad: unbounded external call; hangs as long as the server does
    return requests.get(f"https://internal-api/policies/{policy_id}").json()
Wrap external calls with explicit timeouts:
import requests

def slow_lookup(policy_id: str):
    r = requests.get(
        f"https://internal-api/policies/{policy_id}",
        timeout=10,  # seconds; fail fast instead of hanging the agent
    )
    return r.json()
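The bounded function can then be registered as a tool the usual way, which is where the FunctionTool import above comes in:
from llama_index.core.tools import FunctionTool

lookup_tool = FunctionTool.from_defaults(fn=slow_lookup)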
How to Debug It
1) Read the exact exception class (see the sketch below)
- httpx.ReadTimeout points to a request waiting too long.
- openai.APITimeoutError points to the model provider timing out.
- A generic TimeoutError may be coming from your own wrapper or orchestration layer.
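A minimal sketch that tells the layers apart, wrapped around whichever call is failing for you:
import httpx
import openai

try:
    response = query_engine.query("test question")
except openai.APITimeoutError:
    print("Provider layer: raise the LLM timeout or shrink the request")
except httpx.ReadTimeout:
    print("HTTP layer: check the slow dependency behind this call")
except TimeoutError:
    print("Generic timeout: check your own wrappers or orchestration")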
2) Isolate ingestion from querying
- Run indexing alone.
- Then run one simple query.
- If ingestion fails first, focus on embeddings and document loading.
- If only querying fails, focus on retrieval size and LLM settings.
3) Reduce the problem size
- Set similarity_top_k=1.
- Use a tiny document.
- Lower chunk size.
- If it works with smaller inputs, your issue is load-related rather than connectivity-related.
4) Turn on logging
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("llama_index").setLevel(logging.DEBUG)
Watch for:
- repeated retries
- long gaps before failure
- which endpoint is slow: embeddings, chat completions, or vector DB calls
Prevention
- Set explicit timeouts on every external dependency: LLMs, embeddings, HTTP tools, database clients.
- Keep chunks small enough for your target model and use conservative similarity_top_k values.
- Add retry logic with backoff for transient network failures instead of relying on defaults (see the sketch below).
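For the retry piece, a small decorator is enough. A minimal sketch using the tenacity library (pip install tenacity; the attempt count and wait bounds are illustrative):
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry transient failures with exponential backoff between attempts
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=2, max=30))
def ask_with_retry(question: str):
    return query_engine.query(question)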
If you build this into your LlamaIndex setup from day one, you’ll spend less time chasing mysterious timeouts and more time fixing the real bottleneck.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.