How to Fix 'connection timeout in production' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see connection timeout in production with LlamaIndex TypeScript, it usually means your app is trying to reach an external service and the request never completes before the network or client timeout kicks in. In practice, this shows up when calling OpenAI, Azure OpenAI, Anthropic, a vector DB, or any custom LLM endpoint from a deployed Node.js app.

The key point: this is usually not a “LlamaIndex bug”. It’s almost always a network, runtime, or client configuration issue around OpenAI, OpenAIEmbedding, VectorStoreIndex, or your transport layer.

The Most Common Cause

The #1 cause is creating a client with the wrong timeout assumptions for production. In local dev, requests may succeed because latency is low and your machine has no proxy/NAT constraints. In production, the same call can fail with errors like:

  • Error: Request timed out
  • FetchError: request to https://api.openai.com/v1/chat/completions failed
  • TimeoutError: Connection timed out
  • LlamaIndexError: Failed to create index after an upstream timeout

The broken pattern is usually one of these:

  • no explicit timeout
  • too-short timeout
  • creating a new client per request
  • running long index construction inside an HTTP request handler

Broken vs fixed

Broken                                Fixed
Client created inside handler         Client reused across requests
Default/short timeout                 Explicit timeout tuned for prod
Heavy indexing during request path    Precompute or queue background work

// ❌ Broken: new client per request, no control over timeout
import { OpenAI, VectorStoreIndex } from "llamaindex";

export async function POST(req: Request) {
  const body = await req.json();

  const llm = new OpenAI({
    model: "gpt-4o-mini",
    apiKey: process.env.OPENAI_API_KEY,
  });

  // This can easily time out in production if the dataset is large
  const index = await VectorStoreIndex.fromDocuments(body.documents, {
    llm,
  });

  return Response.json({ ok: true });
}

// ✅ Fixed: reuse clients and keep indexing out of the hot path
import { OpenAI, VectorStoreIndex } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
  // If your environment supports it, set a longer transport timeout here
});

export async function POST(req: Request) {
  const body = await req.json();

  // Better: enqueue this job or run it in a worker
  const index = await VectorStoreIndex.fromDocuments(body.documents, {
    llm,
  });

  return Response.json({ ok: true });
}

If you’re building an API route, don’t build indexes synchronously unless the document set is tiny. Production timeouts often come from doing too much work inside one request.
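One way to keep that work off the hot path is a job queue: the handler enqueues the documents and returns immediately, and a worker builds the index. Below is a minimal in-memory sketch of the pattern; the names (`enqueueIngest`, `IngestJob`) are illustrative, and a real deployment should use a durable queue such as BullMQ or SQS so jobs survive restarts.

```typescript
// Minimal in-memory ingestion queue sketch. The heavy indexing work
// happens in drain(), off the request path; the handler only enqueues.
type IngestJob = { id: string; documents: { text: string }[] };

const queue: IngestJob[] = [];
let draining = false;

function enqueueIngest(job: IngestJob): void {
  queue.push(job);
  if (!draining) void drain();
}

async function drain(): Promise<void> {
  draining = true;
  while (queue.length > 0) {
    const job = queue.shift()!;
    // Heavy work happens here, outside any HTTP request:
    // await VectorStoreIndex.fromDocuments(...)
    console.log(`ingested job ${job.id} (${job.documents.length} docs)`);
  }
  draining = false;
}
```

The handler then calls `enqueueIngest(...)` and responds with 202 Accepted right away, so the client never waits on indexing.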

Other Possible Causes

1) Outbound network restrictions

Your server may not have egress access to the provider. This is common on VPCs, Kubernetes clusters, corporate networks, and serverless environments with strict NAT rules.

# Check basic connectivity from the runtime host
curl -I https://api.openai.com/v1/models

If that hangs or fails, the issue is below LlamaIndex. Fix security groups, NAT gateway rules, proxy settings, or DNS.
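Many serverless runtimes give you no shell, so it also helps to probe connectivity from inside the Node process itself. A small sketch, assuming Node 18+ (for global fetch and AbortSignal.timeout); the function name is illustrative:

```typescript
// Probe provider connectivity from inside the Node runtime, with a
// hard client-side timeout so the probe itself can never hang.
async function probeProvider(url: string, timeoutMs = 5_000): Promise<boolean> {
  try {
    const res = await fetch(url, {
      method: "HEAD",
      signal: AbortSignal.timeout(timeoutMs),
    });
    console.log(`reachable: ${url} -> HTTP ${res.status}`);
    return true;
  } catch (err) {
    console.error(`unreachable: ${url}`, err);
    return false;
  }
}
```

Call it once at startup (for example, `await probeProvider("https://api.openai.com/v1/models")`) and log the result before serving traffic.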

2) Missing proxy configuration

If your environment requires an HTTP proxy and Node isn’t using it, requests can stall until they time out.

process.env.HTTP_PROXY = "http://proxy.internal:8080";
process.env.HTTPS_PROXY = "http://proxy.internal:8080";

For some deployments, setting the variables is not enough: Node's built-in fetch (powered by undici) ignores HTTP_PROXY and HTTPS_PROXY by default, so the proxy has to be wired up explicitly in code.

3) Cold starts plus short serverless timeouts

In AWS Lambda, Vercel Functions, Cloud Run, or Azure Functions, cold starts can eat most of your budget. If LlamaIndex then needs to call embeddings plus retrieval plus generation, the request dies before completion.

export const maxDuration = 60; // platform-specific example

// Keep handler work small and push heavy jobs async.

If you see timeouts only on first request after idle periods, this is likely the problem.
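Beyond raising the platform timeout, the other half of the fix is caching expensive clients at module scope so warm invocations reuse them instead of paying setup cost every time. A generic lazy-singleton sketch; the `getLlm` factory here is a stand-in for your real OpenAI client construction:

```typescript
// Lazy singleton: the factory runs once per container, during the first
// (cold) invocation; every warm invocation reuses the cached instance.
function lazySingleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => (instance ??= factory());
}

// Illustrative stand-in; replace the factory body with e.g.
// () => new OpenAI({ model: "gpt-4o-mini", apiKey: ... })
const getLlm = lazySingleton(() => ({ createdAt: Date.now() }));
```

Because the closure lives at module scope, it survives across invocations for the lifetime of the container.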

4) Slow vector store or embedding pipeline

Sometimes the LLM call is fine and the real bottleneck is embeddings or storage. A slow Pinecone/Qdrant/Weaviate/MongoDB Atlas Vector Search call can surface as a generic timeout in your app logs.

import { OpenAIEmbedding } from "llamaindex";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
});

If embeddings are slow under load, batch them and reduce concurrency.
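The batching idea can be sketched with a generic chunking helper plus a sequential loop that keeps exactly one embedding request in flight at a time. `embedBatch` is a placeholder for whatever embedding call you actually use (for example, a wrapper around `embedModel`):

```typescript
// Split items into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Embed all texts in batches, sequentially, to cap concurrency
// against the embedding API under load.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 64,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

Tune `batchSize` against your provider's rate limits; raising concurrency beyond one in-flight batch is only worth it once the sequential version is measured and stable.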

How to Debug It

  1. Isolate the failing hop

    • Test provider connectivity first.
    • Then test embeddings.
    • Then test retrieval.
    • Then test full index creation.
  2. Add timing around each step

    const t0 = Date.now();
    await someLlamaIndexCall();
    console.log("step took", Date.now() - t0);
    

    If one step spikes to tens of seconds, that’s your culprit.

  3. Run outside the web request path

    • Move the same code into a script.
    • Run it on the same host/container.
    • If it works there but not in the API route, you likely hit serverless limits or request timeouts.
  4. Inspect raw network errors. Look for:

    • ETIMEDOUT
    • ECONNRESET
    • ENOTFOUND
    • fetch failed
    • upstream provider status incidents

If you only see LlamaIndexError without root cause detail, wrap calls and log the underlying exception object before rethrowing.

Prevention

  • Reuse singleton clients for OpenAI, embedding models, and vector store connections.
  • Keep index construction out of synchronous request handlers; use background jobs for ingestion.
  • Set explicit timeouts at both layers:
    • platform function timeout
    • HTTP/client timeout to external APIs
  • Add health checks that verify outbound access to your LLM provider before deploying traffic.

If you want one rule to remember: don’t build indexes inline in production unless you’ve measured it under real latency and load. That’s where “connection timeout” starts showing up.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
