# How to Fix 'connection timeout' in LlamaIndex (TypeScript)
A connection timeout in LlamaIndex TypeScript usually means your app tried to reach an upstream service — OpenAI, Azure OpenAI, Ollama, a vector DB, or an internal API — and the request never got a response before the socket timed out. In practice, this shows up during index.insert(), queryEngine.query(), embedding generation, or when initializing an LLM client.
The key thing: this is usually not a “LlamaIndex bug.” It’s almost always a network, client config, or runtime issue around OpenAI, OpenAIEmbedding, VectorStoreIndex, or whatever provider you plugged in.
## The Most Common Cause
The #1 cause I see is using the wrong base URL or pointing the client at a service that is not actually reachable from the runtime where your Node process runs.
Typical error shape:

```text
Error: request to http://localhost:11434/v1/chat/completions failed, reason: connect ECONNREFUSED 127.0.0.1:11434
```

Or:

```text
Error: Request timed out.
    at OpenAI.chat.completions.create (...)
    at LLM.complete (...)
    at VectorStoreIndex.asQueryEngine (...)
```
Here’s the broken pattern versus the fixed one.
| Broken | Fixed |
|---|---|
| Uses localhost from inside Docker or a remote server | Uses a reachable hostname or container service name |
| Assumes an Ollama/OpenAI-compatible server is running locally | Verifies the service is up before LlamaIndex starts |
| No timeout tuning or retry strategy | Explicit timeout + sane endpoint config |
```ts
// Broken
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // Wrong if your app runs in Docker/CI/a remote VM and localhost is not that machine
  baseURL: "http://localhost:11434/v1",
});

const response = await Settings.llm.complete({ prompt: "Summarize this document." });
console.log(response.text);
```
```ts
// Fixed
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // Use the actual reachable endpoint
  baseURL: process.env.LLM_BASE_URL ?? "https://api.openai.com/v1",
  timeout: 60_000, // give slow responses a chance before giving up
});

const response = await Settings.llm.complete({ prompt: "Summarize this document." });
console.log(response.text);
```
If you’re using Ollama in Docker, don’t point your app container at localhost. Use the service name from your Compose network:

```ts
baseURL: "http://ollama:11434/v1"
```
If you’re on Kubernetes, use the cluster DNS name of the service. If you’re on a laptop but testing through a VPN or proxy, confirm that route first.
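One way to fail fast instead of waiting on a socket timeout is to probe the endpoint before constructing any LlamaIndex client. A minimal sketch, assuming an OpenAI-compatible server that exposes GET /v1/models; assertLlmReachable is a hypothetical helper, not part of LlamaIndex:

```ts
// Fail fast: probe the endpoint before wiring up LlamaIndex.
// Assumes an OpenAI-compatible server exposing GET /v1/models;
// servers that don't require auth simply ignore the header.
async function assertLlmReachable(baseURL: string, timeoutMs = 5_000): Promise<void> {
  const base = baseURL.endsWith("/") ? baseURL : `${baseURL}/`;
  const res = await fetch(new URL("models", base), {
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? ""}` },
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) {
    throw new Error(`LLM endpoint responded with HTTP ${res.status}`);
  }
}

await assertLlmReachable(process.env.LLM_BASE_URL ?? "https://api.openai.com/v1");
```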
## Other Possible Causes

### 1) The upstream model provider is slow or rate-limiting you
A slow first token can look like a timeout when your request window is too tight.
```ts
import { OpenAI } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 10_000, // too aggressive for large prompts / cold starts
});
```
Fix it by increasing the timeout and, where possible, trimming the prompt:

```ts
const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60_000, // allows for slow first tokens and cold starts
});
```
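If you are being rate-limited rather than timing out, a bounded retry with exponential backoff often helps more than a longer timeout. A rough sketch; withRetry is a hypothetical helper, not a LlamaIndex API:

```ts
// Hypothetical helper: retry a flaky call with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 1s, 2s, 4s, ... before the next attempt.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

const response = await withRetry(() => llm.complete({ prompt: "Summarize this document." }));
```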
### 2) Your embedding call is timing out, not the chat call
This happens when VectorStoreIndex.fromDocuments() hangs because the embedding endpoint is slow or unreachable.
```ts
import { Settings, OpenAIEmbedding } from "llamaindex";

Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
});
```
If embedding requests are failing, test them directly before building an index:
```ts
const embedding = await Settings.embedModel.getTextEmbedding("test text");
console.log(embedding.length);
```
### 3) Your proxy or firewall blocks outbound traffic

In corporate networks, Node can resolve DNS but still fail on the TLS handshake or on blocked outbound ports.
```bash
curl -v https://api.openai.com/v1/models
```
If that fails from the same machine/container where Node runs, LlamaIndex will fail too. Set proxy variables if required:
```bash
export HTTPS_PROXY=http://proxy.company.local:8080
export HTTP_PROXY=http://proxy.company.local:8080
```
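One caveat: Node's built-in fetch does not read these variables on its own. If your client goes through Node's global fetch, you can route traffic through a proxy dispatcher from undici; a sketch, assuming the undici package is installed and your HTTP client honors the global dispatcher:

```ts
import { ProxyAgent, setGlobalDispatcher } from "undici";

// Send all global-fetch traffic through the corporate proxy.
const proxyUrl = process.env.HTTPS_PROXY;
if (proxyUrl) {
  setGlobalDispatcher(new ProxyAgent(proxyUrl));
}
```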
### 4) You are creating too many concurrent requests
Parallel document ingestion can overwhelm local services like Ollama or small internal gateways.
```ts
// Risky if docs is large and your backend is weak
await Promise.all(docs.map((doc) => index.insert(doc)));
```
Throttle it:

```ts
for (const doc of docs) {
  await index.insert(doc);
}
```
Or use a concurrency limiter if you need throughput.
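For example, here's a sketch using the p-limit package (an assumption; any semaphore-style limiter works). The concurrency of 4 is an arbitrary starting point to tune against your backend:

```ts
import pLimit from "p-limit";

// Allow at most 4 inserts in flight at a time.
const limit = pLimit(4);
await Promise.all(docs.map((doc) => limit(() => index.insert(doc))));
```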
## How to Debug It
- Test the upstream endpoint outside LlamaIndex. Use curl against the exact URL your app uses. If curl times out, fix networking first.
- Log the resolved client config. Print baseURL, model name, and any proxy env vars before creating OpenAI or OpenAIEmbedding.
- Isolate which call fails. Check whether it breaks on:
  - Settings.llm.complete(...)
  - Settings.embedModel.getTextEmbedding(...)
  - VectorStoreIndex.fromDocuments(...)
  - queryEngine.query(...)
- Increase the timeout and reduce payload size. If smaller prompts work but larger ones fail, you're hitting latency limits rather than connectivity issues.
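For the config-logging step, a one-liner before constructing any client is enough; the env var names here are illustrative:

```ts
// Log exactly what the clients will be built from.
console.log({
  baseURL: process.env.LLM_BASE_URL, // illustrative env var name
  model: "gpt-4o-mini",
  httpsProxy: process.env.HTTPS_PROXY,
});
```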
Example diagnostic split:
try {
const text = await Settings.llm.complete("ping");
console.log("LLM ok", text.toString());
} catch (err) {
console.error("LLM failed", err);
}
try {
const emb = await Settings.embedModel.getTextEmbedding("ping");
console.log("Embedding ok", emb.length);
} catch (err) {
console.error("Embedding failed", err);
}
## Prevention
- Use environment-driven endpoints and validate them at startup.
- Set explicit timeouts on every external client instead of relying on defaults.
- Add a health check that pings your LLM/embedding provider before serving traffic (see the sketch below).
- Limit ingestion concurrency for local models and self-hosted gateways.
- Keep Docker/Kubernetes networking in mind; localhost only means “this container” when code runs inside a container.
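A minimal sketch of that health check, assuming Settings.llm and Settings.embedModel are already configured; checkProviders is a hypothetical helper you would call from your readiness endpoint:

```ts
import { Settings } from "llamaindex";

// Readiness probe: only report healthy once both providers answer.
export async function checkProviders(): Promise<{ llm: boolean; embeddings: boolean }> {
  const result = { llm: false, embeddings: false };
  try {
    await Settings.llm.complete({ prompt: "ping" });
    result.llm = true;
  } catch {
    // leave false; the caller decides whether to serve traffic
  }
  try {
    await Settings.embedModel.getTextEmbedding("ping");
    result.embeddings = true;
  } catch {
    // leave false
  }
  return result;
}
```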
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.