How to Fix 'timeout error' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: timeout-error · llamaindex · python

What the timeout error means

In LlamaIndex, a timeout error usually means one of the network calls in your pipeline took longer than the configured limit. It shows up most often when you’re calling an LLM, embedding model, vector store, or external data loader that sits behind HTTP.

Typical symptoms look like this:

  • httpx.ReadTimeout
  • openai.APITimeoutError
  • TimeoutError: Request timed out
  • query engine calls in llama_index failing only after a long wait

If you see it during indexing or querying, the fix is usually not “increase timeout and move on”. You need to identify which layer is timing out and whether the request is too large, too slow, or misconfigured.
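As a quick triage aid, a small hypothetical helper (not part of LlamaIndex) can map the exception class you caught to the layer it usually implicates; the class names match the symptom list above:

```python
# Hypothetical triage helper: map a timeout exception's class name
# to the layer it usually implicates.
LAYER_BY_EXCEPTION = {
    "ReadTimeout": "HTTP request waited too long (httpx)",
    "ConnectTimeout": "network, DNS, or proxy problem (httpx)",
    "APITimeoutError": "model provider timed out (openai client)",
    "TimeoutError": "your own wrapper or orchestration layer",
}

def classify_timeout(exc: BaseException) -> str:
    """Return a rough description of which layer a timeout came from."""
    return LAYER_BY_EXCEPTION.get(type(exc).__name__, "unknown layer; inspect the traceback")
```

Catch the exception at your pipeline's boundary and log `classify_timeout(exc)` alongside the traceback before deciding what to tune.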

The Most Common Cause

The #1 cause is sending too much work in a single request. In LlamaIndex, this usually happens when chunk sizes are too large, retrieval pulls too many nodes, or you’re using a model with a short default timeout.

Here’s the broken pattern:

Broken                           Fixed
Large chunks + default timeout   Smaller chunks + explicit timeout
Heavy query over many nodes      Limit top-k and refine response mode
# BROKEN
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

docs = SimpleDirectoryReader("data").load_data()

llm = OpenAI(model="gpt-4o-mini")  # default timeout may be too short for your workload
index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=20,  # too many nodes for one slow call
)

response = query_engine.query("Summarize all policy exceptions in these documents.")
print(response)
# FIXED
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI

docs = SimpleDirectoryReader("data").load_data()

Settings.chunk_size = 512
Settings.chunk_overlap = 50

llm = OpenAI(
    model="gpt-4o-mini",
    timeout=120.0,   # explicit timeout
    max_retries=3,
)

index = VectorStoreIndex.from_documents(docs)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=5,
    response_mode="compact",
)

response = query_engine.query("Summarize all policy exceptions in these documents.")
print(response)

The main idea: reduce the amount of context going into each call. If you ask LlamaIndex to stuff 20 large chunks into a single completion, you’re inviting timeouts.
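As a back-of-envelope check (the overhead figure is an assumption, not an official formula), you can estimate how large each synthesis prompt gets from chunk size and top-k:

```python
def approx_prompt_tokens(chunk_size_tokens: int, top_k: int, overhead_tokens: int = 500) -> int:
    """Rough upper bound on prompt size for one synthesis call:
    retrieved chunks plus an assumed fixed overhead for the system
    prompt and the question itself."""
    return chunk_size_tokens * top_k + overhead_tokens

# 20 chunks of 1024 tokens vs. 5 chunks of 512 tokens:
print(approx_prompt_tokens(1024, 20))  # 20980 tokens per call
print(approx_prompt_tokens(512, 5))    # 3060 tokens per call
```

A roughly 7x difference in prompt size is often the difference between a call that finishes comfortably and one that times out.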

Other Possible Causes

1) Slow embedding generation during ingestion

If indexing hangs before queries even start, embeddings are often the bottleneck.

# Too many documents processed in one shot
index = VectorStoreIndex.from_documents(docs)

Fix it by batching ingestion or using a faster embedding model:

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    timeout=90.0,
)
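The batching side of this fix can be sketched with a plain helper; the batch size of 50 is arbitrary, and the commented `index.insert` loop assumes you built the index from the first batch:

```python
def batched(items, size):
    """Yield successive batches of at most `size` items, so each
    embedding request stays small enough to finish before the timeout."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Sketch: build the index from the first batch, then insert the rest, e.g.
#   batches = batched(docs, 50)
#   index = VectorStoreIndex.from_documents(next(batches))
#   for batch in batches:
#       for doc in batch:
#           index.insert(doc)
```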

2) Network issues or proxy/firewall restrictions

Your local environment may reach some APIs directly but fail intermittently when traffic is routed through a corporate proxy.

import os
os.environ["HTTP_PROXY"] = "http://proxy.mycorp.local:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.mycorp.local:8080"

If your logs show httpx.ConnectTimeout or httpx.ReadTimeout, check DNS resolution, proxy config, and outbound allowlists.

3) Using async code incorrectly

Mixing sync and async calls can create blocking behavior that looks like a timeout.

# Wrong: aquery() returns a coroutine; without await it never runs
response = query_engine.aquery("What is the claim status?")

Correct pattern:

import asyncio

async def main():
    response = await query_engine.aquery("What is the claim status?")
    print(response)

asyncio.run(main())

If you’re inside FastAPI or another async framework, keep the whole path async.
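Inside an async app you can also put an explicit deadline around the awaited call with stdlib asyncio.wait_for; the fake_query coroutine below is just a stand-in for query_engine.aquery(...):

```python
import asyncio

async def query_with_deadline(coro, seconds: float):
    # Raises asyncio.TimeoutError if the call exceeds the deadline,
    # instead of leaving the request hanging indefinitely.
    return await asyncio.wait_for(coro, timeout=seconds)

async def fake_query():
    await asyncio.sleep(0.01)  # stand-in for query_engine.aquery(...)
    return "done"

print(asyncio.run(query_with_deadline(fake_query(), 5.0)))  # prints "done"
```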

4) Tool/function calls taking too long

Agent workflows can time out while waiting on tools like SQL queries, web fetches, or internal services.

import requests
from llama_index.core.tools import FunctionTool

def slow_lookup(policy_id: str):
    # bad: no timeout, so a hung internal API stalls the whole agent run
    return requests.get(f"https://internal-api/policies/{policy_id}").json()

Wrap external calls with explicit timeouts:

import requests
from llama_index.core.tools import FunctionTool

def slow_lookup(policy_id: str):
    r = requests.get(
        f"https://internal-api/policies/{policy_id}",
        timeout=10,  # seconds; fail fast instead of hanging the agent
    )
    r.raise_for_status()
    return r.json()

lookup_tool = FunctionTool.from_defaults(fn=slow_lookup)
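For libraries that expose no timeout parameter at all, a stdlib thread-pool wrapper can at least bound how long the agent waits; note the caveat in the docstring, and treat the names here as illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_deadline(fn, *args, seconds=10.0, **kwargs):
    """Bound how long we wait for fn's result. Caveat: a timed-out
    worker thread keeps running in the background, so prefer the
    library's native timeout (like requests' `timeout=`) when it has one."""
    return _pool.submit(fn, *args, **kwargs).result(timeout=seconds)
```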

How to Debug It

  1. Read the exact exception class

    • httpx.ReadTimeout points to a request waiting too long.
    • openai.APITimeoutError points to the model provider timing out.
    • A generic TimeoutError may be coming from your own wrapper or orchestration layer.
  2. Isolate ingestion from querying

    • Run indexing alone.
    • Then run one simple query.
    • If ingestion fails first, focus on embeddings and document loading.
    • If only querying fails, focus on retrieval size and LLM settings.
  3. Reduce the problem size

    • Set similarity_top_k=1.
    • Use a tiny document.
    • Lower chunk size.
    • If it works with smaller inputs, your issue is load-related rather than connectivity-related.
  4. Turn on logging

import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("llama_index").setLevel(logging.DEBUG)

Watch for:

  • repeated retries
  • long gaps before failure
  • which endpoint is slow: embeddings, chat completions, vector DB calls
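To see which endpoint is slow without combing through raw DEBUG output, a small timing decorator (a hypothetical helper, not part of LlamaIndex) works on any callable you suspect:

```python
import logging
import time
from functools import wraps

def timed(label):
    """Log the wall-clock duration of each call, to spot the slow layer."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                logging.info("%s took %.2fs", label, time.perf_counter() - start)
        return wrapper
    return deco

# Usage sketch: wrap the call you suspect, e.g.
#   query_engine.query = timed("query")(query_engine.query)
```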

Prevention

  • Set explicit timeouts on every external dependency: LLMs, embeddings, HTTP tools, database clients.
  • Keep chunks small enough for your target model and use conservative similarity_top_k values.
  • Add retry logic with backoff for transient network failures instead of relying on defaults.
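The last bullet can be sketched with a small stdlib retry wrapper; the delay values are placeholders to tune for your provider:

```python
import random
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Retry fn on timeouts with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```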

If you build this into your LlamaIndex setup from day one, you’ll spend less time chasing mysterious timeouts and more time fixing the real bottleneck.


By Cyprian Aarons, AI Consultant at Topiax.