How to Fix 'connection timeout' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: connection-timeout, llamaindex, typescript

A connection timeout in LlamaIndex TypeScript usually means your app tried to reach an upstream service — OpenAI, Azure OpenAI, Ollama, a vector DB, or an internal API — and the request never got a response before the socket timed out. In practice, this shows up during index.insert(), queryEngine.query(), embedding generation, or when initializing an LLM client.

The key thing: this is usually not a “LlamaIndex bug.” It’s almost always a network, client config, or runtime issue around OpenAI, OpenAIEmbedding, VectorStoreIndex, or whatever provider you plugged in.

The Most Common Cause

The #1 cause I see is using the wrong base URL or pointing the client at a service that is not actually reachable from the runtime where your Node process runs.

Typical error shape:

Error: request to http://localhost:11434/v1/chat/completions failed, reason: connect ECONNREFUSED 127.0.0.1:11434

Or:

Error: Request timed out.
    at OpenAI.chat.completions.create (...)
    at LLM.complete (...)
    at VectorStoreIndex.asQueryEngine (...)

Here’s the broken pattern versus the fixed one.

Broken | Fixed
Uses localhost from inside Docker or a remote server | Uses a reachable hostname or container service name
Assumes the Ollama/OpenAI-compatible server is running locally | Verifies the service is up before LlamaIndex starts
No timeout tuning or retry strategy | Explicit timeout + a sane endpoint config

// Broken
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // Wrong if your app runs in Docker/CI/remote VM and localhost is not that machine
  baseURL: "http://localhost:11434/v1",
});

const response = await Settings.llm.complete("Summarize this document.");
console.log(response);

// Fixed
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // Use the actual reachable endpoint
  baseURL: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
  timeout: 60_000,
});

const response = await Settings.llm.complete("Summarize this document.");
console.log(response);

If you’re using Ollama in Docker, don’t point your app container at localhost. Use the service name from your compose network:

baseURL: "http://ollama:11434/v1"

If you’re on Kubernetes, use the cluster DNS name of the service. If you’re on a laptop but testing through a VPN or proxy, confirm that route first.
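On Kubernetes that usually looks like the following (the namespace and service name here are placeholders — substitute your own):

baseURL: "http://ollama.my-namespace.svc.cluster.local:11434/v1"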

Other Possible Causes

1) The upstream model provider is slow or rate-limiting you

A slow first token can look like a timeout when your request window is too tight.

import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 10_000, // too aggressive for large prompts / cold starts
});

Fix it by increasing timeout and reducing prompt size:

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60_000,
});
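If the provider is rate-limiting you, retries help more than a longer timeout. A minimal sketch, assuming your installed LlamaIndex version exposes the maxRetries option (it is forwarded to the underlying OpenAI client, which backs off on 429s and transient errors):

import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60_000,
  // Retry transient failures instead of surfacing them as timeouts.
  maxRetries: 3,
});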

2) Your embedding call is timing out, not the chat call

This happens when VectorStoreIndex.fromDocuments() hangs because embeddings are slow or unreachable.

import { Settings, OpenAIEmbedding } from "llamaindex";

Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
});

If embedding requests are failing, test them directly before building an index:

const embedding = await Settings.embedModel.getTextEmbedding("test text");
console.log(embedding.length);
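Timing that call helps separate "slow" from "unreachable": a response that arrives after tens of seconds points at latency or batch size, while an instant failure points at networking. A minimal sketch:

const start = Date.now();
const embedding = await Settings.embedModel.getTextEmbedding("test text");
// Slow but successful => raise timeouts or shrink batches; instant failure => fix networking.
console.log(`embedding ok: ${embedding.length} dims in ${Date.now() - start} ms`);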

3) Your proxy or firewall blocks outbound traffic

In corporate networks, Node can often resolve DNS yet still fail during the TLS handshake or on blocked outbound ports.

curl -v https://api.openai.com/v1/models

If that fails from the same machine/container where Node runs, LlamaIndex will fail too. Set proxy variables if required:

export HTTPS_PROXY=http://proxy.company.local:8080
export HTTP_PROXY=http://proxy.company.local:8080
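Also confirm the Node process itself sees those variables — exporting them in your shell does not help if the app is launched by a service manager or container with a different environment. A quick check at startup:

// Print the proxy-related environment exactly as the Node process sees it.
console.log({
  HTTPS_PROXY: process.env.HTTPS_PROXY,
  HTTP_PROXY: process.env.HTTP_PROXY,
  NO_PROXY: process.env.NO_PROXY,
});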

4) You are creating too many concurrent requests

Parallel document ingestion can overwhelm local services like Ollama or small internal gateways.

// Risky if docs is large and your backend is weak
await Promise.all(
  docs.map((doc) => index.insert(doc))
);

Throttle it:

for (const doc of docs) {
  await index.insert(doc);
}

Or use a concurrency limiter if you need throughput.
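A minimal sketch with the p-limit package (an assumption — any concurrency limiter works), keeping at most four inserts in flight:

import pLimit from "p-limit";

// Cap concurrent inserts so a local model or small gateway is not flooded.
const limit = pLimit(4);

await Promise.all(docs.map((doc) => limit(() => index.insert(doc))));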

How to Debug It

  1. Test the upstream endpoint outside LlamaIndex.
    Use curl against the exact URL your app uses. If curl times out, fix networking first.

  2. Log the resolved client config.
    Print baseURL, model name, and any proxy env vars before creating OpenAI or OpenAIEmbedding (see the sketch after this list).

  3. Isolate which call fails.
    Check whether it breaks on:

    • Settings.llm.complete(...)
    • Settings.embedModel.getTextEmbedding(...)
    • VectorStoreIndex.fromDocuments(...)
    • queryEngine.query(...)
  4. Increase timeout and reduce payload size.
    If smaller prompts work but larger ones fail, you’re hitting latency limits rather than connectivity issues.
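For step 2, a startup log along these lines makes a misconfigured endpoint obvious (LLM_BASE_URL is an assumed env var name, matching the fixed example above):

const baseURL = process.env.LLM_BASE_URL ?? "https://api.openai.com/v1";
console.log("LLM client config", { baseURL, model: "gpt-4o-mini" });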

Example diagnostic split:

try {
  const res = await Settings.llm.complete("ping");
  console.log("LLM ok", res.text);
} catch (err) {
  console.error("LLM failed", err);
}

try {
  const emb = await Settings.embedModel.getTextEmbedding("ping");
  console.log("Embedding ok", emb.length);
} catch (err) {
  console.error("Embedding failed", err);
}

Prevention

  • Use environment-driven endpoints and validate them at startup.
  • Set explicit timeouts on every external client instead of relying on defaults.
  • Add a health check that pings your LLM/embedding provider before serving traffic (see the sketch below).
  • Limit ingestion concurrency for local models and self-hosted gateways.
  • Keep Docker/Kubernetes networking in mind; localhost only means “this container” when code runs inside a container.
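A minimal startup health check, assuming Settings.llm and Settings.embedModel are already configured (the function name is illustrative):

import { Settings } from "llamaindex";

// Fail fast at startup instead of timing out on the first user request.
async function assertProvidersReachable(): Promise<void> {
  await Settings.llm.complete("ping");
  await Settings.embedModel.getTextEmbedding("ping");
}

await assertProvidersReachable();

A thrown error here is far easier to diagnose than a timeout buried inside queryEngine.query().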

By Cyprian Aarons, AI Consultant at Topiax.