How to Fix 'connection timeout during development' in LlamaIndex (Python)

By Cyprian Aarons · Updated 2026-04-21
Tags: connection-timeout-during-development · llamaindex · python

A connection timeout during development in LlamaIndex usually means your Python process tried to reach an external service and never got a response before the client timeout expired. In practice, this shows up when you’re calling an LLM provider, embedding API, vector DB, or local model server from a dev machine with bad network settings, wrong endpoints, or a service that isn’t actually running.

The key thing: this is rarely a “LlamaIndex bug”. It’s usually a transport problem between llama_index.core and whatever backend you configured.

The Most Common Cause

The #1 cause is pointing LlamaIndex at a host that is unreachable from your Python process. That happens a lot with local development setups: Docker containers using localhost, Ollama not running, wrong port mapping, or a cloud endpoint blocked by proxy/firewall rules.

Here’s the broken pattern versus the fixed one.

Broken                            | Fixed
Uses localhost from inside Docker | Uses host.docker.internal or the container service name
Assumes the backend is running    | Verifies the server is reachable first
No timeout handling               | Explicit timeout + fail fast
# BROKEN: local model server is not reachable from this runtime
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_base="http://localhost:11434/v1",  # wrong if running inside Docker
    api_key="ollama",                      # irrelevant if endpoint is wrong
)

response = llm.complete("Write a test plan")
print(response)

# FIXED: use a reachable host and verify the endpoint first
import requests

from llama_index.llms.openai import OpenAI

host = "http://host.docker.internal:11434"  # or your actual service name

# Fail fast: confirm the server answers before wiring it into LlamaIndex.
health = requests.get(f"{host}/api/tags", timeout=5)
health.raise_for_status()

llm = OpenAI(
    model="gpt-4o-mini",  # must be a model the endpoint actually serves
    api_base=f"{host}/v1",
    api_key="ollama",
    timeout=30,  # the llama-index OpenAI wrapper calls this `timeout`, not `request_timeout`
)

response = llm.complete("Write a test plan")
print(response)

If you’re using Ollama directly through LlamaIndex, the same rule applies:

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.1", base_url="http://localhost:11434")  # broken in some dev setups

Fix it by pointing at a host that is actually reachable from where the code runs:

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.1", base_url="http://127.0.0.1:11434", request_timeout=60.0)

Pinning 127.0.0.1 helps when localhost resolves to the IPv6 address ::1 but the server only listens on IPv4. If your code runs inside Docker, 127.0.0.1 means “inside the container”, not your laptop: use host.docker.internal (automatic on Docker Desktop; on Linux, pass --add-host=host.docker.internal:host-gateway) or the compose service name instead.

Other Possible Causes

1) The provider SDK timeout is too low

A slow cold start or large prompt can exceed the default request timeout.

import os

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=5,  # too aggressive for dev networks
)

Increase it:

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=60,
)

2) Wrong environment variables or missing credentials

This often surfaces as retries followed by timeouts when the client keeps trying bad auth or hitting the wrong region.

export OPENAI_API_BASE=https://api.openai.com/v1
export OPENAI_API_KEY=sk-...
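
A quick way to catch this class of problem is to print which variables are actually visible to your Python process. A minimal sketch; extend the list to whatever your stack reads:

import os

# Hypothetical checklist; add whatever variables your providers read.
for var in ("OPENAI_API_KEY", "OPENAI_API_BASE"):
    value = os.environ.get(var)
    print(f"{var}: {'set' if value else 'MISSING'}")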

For Azure OpenAI, make sure you’re using Azure-specific config, not vanilla OpenAI settings:

import os

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    engine="my-deployment",  # the Azure deployment name, not the model name
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

3) Your vector database endpoint is down or misconfigured

This often appears when building indexes with VectorStoreIndex and the store can’t be reached.

import os

from pinecone import Pinecone  # v3+ pinecone client
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

vector_store = PineconeVectorStore(
    pinecone_index=pc.Index("my-index")  # fails if the index name/region is wrong
)

Check connectivity and namespace config before creating the index.
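
One cheap probe, reusing the v3+ pinecone client from the snippet above (shown standalone here):

import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# A timeout or auth error here means the problem is upstream of LlamaIndex.
print(pc.list_indexes())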

4) Proxy, VPN, or corporate firewall interference

If your laptop works on home Wi‑Fi but times out on VPN, this is probably network policy blocking outbound traffic to the provider.

# Example proxy config for Python runtime
export HTTPS_PROXY=http://proxy.company.local:8080
export HTTP_PROXY=http://proxy.company.local:8080

Some SDKs also need explicit proxy support through httpx depending on how LlamaIndex wraps the client.
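
If the proxy variables aren't picked up, you can hand the wrapper a pre-configured httpx client. This is a sketch, assuming your llama-index version exposes an http_client parameter on the OpenAI wrapper and your httpx version accepts proxy= (older releases spell it proxies=):

import os

import httpx
from llama_index.llms.openai import OpenAI

# Route all traffic through the corporate proxy explicitly.
proxied_client = httpx.Client(
    proxy="http://proxy.company.local:8080",  # older httpx versions: proxies=...
    timeout=60.0,
)

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
    http_client=proxied_client,  # assumption: your llama-index version exposes this parameter
)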

How to Debug It

  1. Isolate the failing call

    • Comment out retrieval and indexing.
    • Call only llm.complete("ping").
    • If that fails, it’s transport/config, not your RAG pipeline.
  2. Test the backend outside LlamaIndex

    • Use curl, Postman, or plain requests.
    • If even this fails:
      curl http://localhost:11434/api/tags
      
      then the backend or network is the problem, not LlamaIndex.
  3. Turn on verbose logging (see the smoke-test sketch after this list)

    • Look for repeated retries, DNS failures, or connection refusal.
    • Typical underlying errors include:
      • httpx.ConnectTimeout
      • httpx.ReadTimeout
      • openai.APITimeoutError
      • ConnectionError: [Errno 111] Connection refused
  4. Check where your code is running

    • Local Python process?
    • Docker container?
    • WSL?
    • CI runner?

    A classic mistake is this:

    base_url = "http://localhost:11434"
    

    which works on your machine but fails inside CI or containers.
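
Putting steps 1–3 together, a minimal smoke test looks something like this (a sketch assuming an Ollama backend; swap the URL and LLM class for your stack):

import logging
import sys

import requests
from llama_index.llms.ollama import Ollama

# Step 3: verbose logging -- llama-index, httpx, and the provider SDKs use stdlib logging.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

BASE_URL = "http://127.0.0.1:11434"  # step 4: adjust for Docker/WSL/CI

# Step 2: test the backend outside LlamaIndex first.
requests.get(f"{BASE_URL}/api/tags", timeout=5).raise_for_status()

# Step 1: isolate the failing call -- no retrieval, no indexing, just one completion.
llm = Ollama(model="llama3.1", base_url=BASE_URL, request_timeout=30.0)
print(llm.complete("ping"))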

Prevention

  • Set explicit timeouts on every external client used by LlamaIndex.
  • Add a startup health check for every dependency:
    • LLM endpoint
    • embedding endpoint
    • vector DB
  • Use environment-specific config (a minimal sketch follows this list):
    • local dev endpoints for laptops
    • service DNS names for Docker/Kubernetes
    • cloud URLs for staging/prod

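A lightweight way to do that is to read endpoints from the environment with sensible local defaults (the variable name here is illustrative, not a LlamaIndex convention):

import os

from llama_index.llms.ollama import Ollama

# Hypothetical variable name; set LLM_BASE_URL differently per environment
# (laptop, docker-compose service name, Kubernetes DNS, cloud URL).
base_url = os.environ.get("LLM_BASE_URL", "http://127.0.0.1:11434")

llm = Ollama(model="llama3.1", base_url=base_url, request_timeout=60.0)
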
If you’re building agent workflows for banks or insurance systems, treat these dependencies like production services even in development. A five-second health check at startup, like the sketch below, saves hours of chasing connection timeouts later.
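
Here is one way to write that startup gate (a sketch; the endpoints are placeholders for whatever LLM, embedding, and vector DB services your app actually depends on):

import requests

# Placeholder endpoints -- substitute your real dependencies.
DEPENDENCIES = {
    "llm": "http://127.0.0.1:11434/api/tags",      # e.g. Ollama
    "vector_db": "http://127.0.0.1:6333/healthz",  # e.g. Qdrant
}

def assert_dependencies_up(timeout: float = 5.0) -> None:
    """Fail fast at startup instead of timing out mid-request."""
    for name, url in DEPENDENCIES.items():
        try:
            requests.get(url, timeout=timeout).raise_for_status()
        except requests.RequestException as exc:
            raise RuntimeError(f"Dependency '{name}' unreachable at {url}: {exc}") from exc

assert_dependencies_up()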

