How to Fix 'connection timeout' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: connection-timeout, llamaindex, typescript

A connection timeout in LlamaIndex TypeScript usually means your app tried to reach an upstream service — OpenAI, Azure OpenAI, Ollama, a vector DB, or an internal API — and the request never got a response before the socket timed out. In practice, this shows up during index.insert(), queryEngine.query(), embedding generation, or when initializing an LLM client.

The key thing: this is usually not a “LlamaIndex bug.” It’s almost always a network, client config, or runtime issue around OpenAI, OpenAIEmbedding, VectorStoreIndex, or whatever provider you plugged in.

The Most Common Cause

The #1 cause I see is using the wrong base URL or pointing the client at a service that is not actually reachable from the runtime where your Node process runs.

Typical error shape:

Error: request to http://localhost:11434/v1/chat/completions failed, reason: connect ECONNREFUSED 127.0.0.1:11434

Or:

Error: Request timed out.
    at OpenAI.chat.completions.create (...)
    at LLM.complete (...)
    at VectorStoreIndex.asQueryEngine (...)

Here’s the broken pattern versus the fixed one.

Broken | Fixed
Uses localhost from inside Docker or a remote server | Uses a reachable hostname or container service name
Assumes the Ollama/OpenAI-compatible server is running locally | Verifies the service is up before LlamaIndex starts
No timeout tuning or retry strategy | Explicit timeout + a sane endpoint config

// Broken
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // Wrong if your app runs in Docker/CI/remote VM and localhost is not that machine
  baseURL: "http://localhost:11434/v1",
});

const response = await Settings.llm.complete("Summarize this document.");
console.log(response);

// Fixed
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // Use the actual reachable endpoint
  baseURL: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
  timeout: 60_000,
});

const response = await Settings.llm.complete("Summarize this document.");
console.log(response);

If you’re using Ollama in Docker, don’t point your app container at localhost. Use the service name from your compose network:

baseURL: "http://ollama:11434/v1"

If you’re on Kubernetes, use the cluster DNS name of the service. If you’re on a laptop but testing through a VPN or proxy, confirm that route first.
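On Kubernetes that usually looks like the following (the namespace and service name here are placeholders — substitute your own):

baseURL: "http://ollama.my-namespace.svc.cluster.local:11434/v1"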

Other Possible Causes

1) The upstream model provider is slow or rate-limiting you

A slow first token can look like a timeout when your request window is too tight.

import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 10_000, // too aggressive for large prompts / cold starts
});

Fix it by increasing timeout and reducing prompt size:

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60_000,
});
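If the provider is rate-limiting you, retries help more than a longer timeout. A minimal sketch, assuming your installed LlamaIndex version exposes the maxRetries option (it is forwarded to the underlying OpenAI client, which backs off on 429s and transient errors):

import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60_000,
  // Retry transient failures instead of surfacing them as timeouts.
  maxRetries: 3,
});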

2) Your embedding call is timing out, not the chat call

This happens when VectorStoreIndex.fromDocuments() hangs because embeddings are slow or unreachable.

import { Settings, OpenAIEmbedding } from "llamaindex";

Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
});

If embedding requests are failing, test them directly before building an index:

const embedding = await Settings.embedModel.getTextEmbedding("test text");
console.log(embedding.length);
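Timing that call helps separate "slow" from "unreachable": a response that arrives after tens of seconds points at latency or batch size, while an instant failure points at networking. A minimal sketch:

const start = Date.now();
const embedding = await Settings.embedModel.getTextEmbedding("test text");
// Slow but successful => raise timeouts or shrink batches; instant failure => fix networking.
console.log(`embedding ok: ${embedding.length} dims in ${Date.now() - start} ms`);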

3) Your proxy or firewall blocks outbound traffic

In corporate networks, Node can often resolve DNS yet still fail during the TLS handshake or on blocked outbound ports.

curl -v https://api.openai.com/v1/models

If that fails from the same machine/container where Node runs, LlamaIndex will fail too. Set proxy variables if required:

export HTTPS_PROXY=http://proxy.company.local:8080
export HTTP_PROXY=http://proxy.company.local:8080
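Also confirm the Node process itself sees those variables — exporting them in your shell does not help if the app is launched by a service manager or container with a different environment. A quick check at startup:

// Print the proxy-related environment exactly as the Node process sees it.
console.log({
  HTTPS_PROXY: process.env.HTTPS_PROXY,
  HTTP_PROXY: process.env.HTTP_PROXY,
  NO_PROXY: process.env.NO_PROXY,
});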

4) You are creating too many concurrent requests

Parallel document ingestion can overwhelm local services like Ollama or small internal gateways.

// Risky if docs is large and your backend is weak
await Promise.all(
  docs.map((doc) => index.insert(doc))
);

Throttle it:

for (const doc of docs) {
  await index.insert(doc);
}

Or use a concurrency limiter if you need throughput.
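A minimal sketch with the p-limit package (an assumption — any concurrency limiter works), keeping at most four inserts in flight:

import pLimit from "p-limit";

// Cap concurrent inserts so a local model or small gateway is not flooded.
const limit = pLimit(4);

await Promise.all(docs.map((doc) => limit(() => index.insert(doc))));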

How to Debug It

  1. Test the upstream endpoint outside LlamaIndex.
    Use curl against the exact URL your app uses. If curl times out, fix networking first.

  2. Log the resolved client config.
    Print baseURL, model name, and any proxy env vars before creating OpenAI or OpenAIEmbedding (see the sketch after this list).

  3. Isolate which call fails.
    Check whether it breaks on:

    • Settings.llm.complete(...)
    • Settings.embedModel.getTextEmbedding(...)
    • VectorStoreIndex.fromDocuments(...)
    • queryEngine.query(...)
  4. Increase timeout and reduce payload size.
    If smaller prompts work but larger ones fail, you’re hitting latency limits rather than connectivity issues.
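For step 2, a startup log along these lines makes a misconfigured endpoint obvious (LLM_BASE_URL is an assumed env var name, matching the fixed example above):

const baseURL = process.env.LLM_BASE_URL ?? "https://api.openai.com/v1";
console.log("LLM client config", { baseURL, model: "gpt-4o-mini" });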

Example diagnostic split:

try {
  const res = await Settings.llm.complete("ping");
  console.log("LLM ok", res.text);
} catch (err) {
  console.error("LLM failed", err);
}

try {
  const emb = await Settings.embedModel.getTextEmbedding("ping");
  console.log("Embedding ok", emb.length);
} catch (err) {
  console.error("Embedding failed", err);
}

Prevention

  • Use environment-driven endpoints and validate them at startup.
  • Set explicit timeouts on every external client instead of relying on defaults.
  • Add a health check that pings your LLM/embedding provider before serving traffic (see the sketch below).
  • Limit ingestion concurrency for local models and self-hosted gateways.
  • Keep Docker/Kubernetes networking in mind; localhost only means “this container” when code runs inside a container.
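A minimal startup health check, assuming Settings.llm and Settings.embedModel are already configured (the function name is illustrative):

import { Settings } from "llamaindex";

// Fail fast at startup instead of timing out on the first user request.
async function assertProvidersReachable(): Promise<void> {
  await Settings.llm.complete("ping");
  await Settings.embedModel.getTextEmbedding("ping");
}

await assertProvidersReachable();

A thrown error here is far easier to diagnose than a timeout buried inside queryEngine.query().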

By Cyprian Aarons, AI Consultant at Topiax.