How to Fix 'connection timeout when scaling' in LlamaIndex (TypeScript)

By Cyprian Aarons. Updated 2026-04-21.

When you see connection timeout when scaling in a LlamaIndex TypeScript app, it usually means your index is trying to hit an external service during a burst of work and the request never completes before the client or upstream proxy gives up. In practice, this shows up during ingestion, query fan-out, or when your app scales from one request to many and starts opening too many concurrent connections.

In most cases, the problem is not LlamaIndex itself. It’s your transport layer, concurrency settings, or an embedding/LLM provider that can’t keep up.

The Most Common Cause

The #1 cause is uncontrolled concurrency while building or querying an index.

A common pattern is firing off many Promise.all() calls against VectorStoreIndex, OpenAIEmbedding, or a remote vector store. That works in local testing, then starts timing out under load because every incoming request opens its own burst of connections at the same time.

Broken vs fixed

Broken pattern                    Fixed pattern
Unbounded parallel calls          Limit concurrency and batch work
Recreating clients per request    Reuse singleton clients
No timeout/retry policy           Explicit timeout and retry config
// BROKEN: unbounded parallelism
import { VectorStoreIndex } from "llamaindex";

const docs = await Promise.all(
  files.map(async (file) => {
    return await readAndChunkFile(file);
  })
);

const index = await VectorStoreIndex.fromDocuments(docs);

// Later: fan-out queries, each one building a fresh query engine
const answers = await Promise.all(
  questions.map((q) => index.asQueryEngine().query({ query: q }))
);

// FIXED: controlled batching + reusable index/query engine
import { VectorStoreIndex } from "llamaindex";
import pLimit from "p-limit";

const limit = pLimit(4); // at most 4 network calls in flight

// Read and chunk files with bounded concurrency instead of all at once.
const docs = await Promise.all(
  files.map((file) => limit(() => readAndChunkFile(file)))
);

const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine(); // built once, reused for every question

const answers = await Promise.all(
  questions.map((q) => limit(() => queryEngine.query({ query: q })))
);

If you’re using an embedding model or remote vector DB, the same rule applies. Don’t let every incoming request create its own burst of network calls.
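The bounded fan-out that p-limit provides can also be sketched with no dependency at all. The helper below is a minimal hand-rolled concurrency limiter; you could use it to cap embedding calls the same way, for example `mapWithConcurrency(chunks, 4, (c) => embedModel.getTextEmbedding(c))` (where `embedModel` is whatever embedding instance your app configures):

```typescript
// Minimal concurrency limiter (no dependencies) — the same idea p-limit
// implements. Runs at most `limit` tasks at a time and preserves input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index. Because JS is
  // single-threaded, the claim (next++) cannot race between workers.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

The index-based claim keeps results in input order even though tasks finish out of order, which matters when you zip answers back to questions.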

Other Possible Causes

1) HTTP client timeout is too low

If you’re using a provider behind a slow network path, the default timeout may be shorter than the actual response time.

import { OpenAI } from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 10_000, // too low for heavy loads
});

Fix it by increasing timeout and adding retries where supported.

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60_000,
  maxRetries: 3,
});

2) Recreating LlamaIndex components on every request

If you instantiate Settings, embeddings, or vector stores inside a request handler, you pay connection setup costs repeatedly.

// BAD
export async function handler(req: Request) {
  const index = await VectorStoreIndex.fromDocuments(await loadDocs());
  return await index.asQueryEngine().query({ query: await req.text() });
}

Move shared setup outside the hot path.

// GOOD
const indexPromise = loadDocs().then((docs) =>
  VectorStoreIndex.fromDocuments(docs)
);

export async function handler(req: Request) {
  const index = await indexPromise; // built once, shared by all requests
  const qe = index.asQueryEngine();
  const result = await qe.query({ query: await req.text() });
  return new Response(String(result));
}

3) Vector store connection pool exhaustion

This happens with Postgres/pgvector, Pinecone proxies, Redis-backed stores, or any backend with limited sockets.

// Example config issue: too many pooled connections (pg Pool options)
{
  max: 100,                // 100 sockets per instance can exhaust the server
  idleTimeoutMillis: 1000, // connections are dropped and reopened constantly
}

Reduce pool pressure and reuse clients.

{
  max: 10,                  // bounded pool per process
  idleTimeoutMillis: 30000, // keep idle connections alive for reuse
}

Also make sure you are not creating a new pool per document batch.
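A module-scoped lazy singleton is the usual fix for per-batch pools. In this sketch, `createVectorStoreClient` is a hypothetical stand-in for your real pg/Pinecone/Redis client construction; the shape is what matters: construct once, hand out the same instance everywhere:

```typescript
// Module-scoped lazy singleton: the first caller constructs the client,
// every later caller gets the same instance. createVectorStoreClient is a
// placeholder for real pool setup (e.g. a pg Pool with max: 10).
type VectorStoreClient = { query: (sql: string) => Promise<unknown> };

let client: VectorStoreClient | undefined;

function createVectorStoreClient(): VectorStoreClient {
  // Real code would build the pooled connection client here, exactly once.
  return { query: async (sql: string) => ({ sql, rows: [] }) };
}

export function getVectorStoreClient(): VectorStoreClient {
  if (!client) {
    client = createVectorStoreClient();
  }
  return client;
}
```

Every ingestion batch and query handler then calls `getVectorStoreClient()` instead of constructing its own pool, so the process holds one bounded pool total.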

4) Proxy / serverless limits

If this only happens in Vercel, Lambda, Cloud Run, or behind Nginx, the platform may be closing long-running requests.

Common symptoms include:

  • ETIMEDOUT
  • ECONNRESET
  • Request timed out after ... ms
  • upstream logs showing 502 or 504

Fix by moving ingestion to background jobs and keeping interactive queries short.

// BAD for serverless route handlers
await VectorStoreIndex.fromDocuments(bigDocumentSet);

Instead:

// GOOD: enqueue ingestion job
await queue.publish("ingest-documents", { batchId });
return new Response("Accepted", { status: 202 });

How to Debug It

  1. Check whether the timeout happens during ingestion or querying.
    If it fails in VectorStoreIndex.fromDocuments(), suspect embeddings or vector store writes. If it fails in .query(), suspect retrieval fan-out or provider latency.

  2. Log the exact error chain.
    Look for messages like:

    • Error: connection timeout when scaling
    • ETIMEDOUT
    • ECONNRESET
    • OpenAI API error
    • RetryError from your transport layer

    In TypeScript:

    try {
      await qe.query({ query });
    } catch (err) {
      console.error("query failed", err);
      throw err;
    }
    
  3. Disable parallelism temporarily.
    Run one document batch and one query at a time. If the error disappears, your issue is concurrency saturation, not bad data.

  4. Isolate each dependency.
    Test embedding generation alone, then vector store writes alone, then query retrieval alone.

    • If embeddings fail first, inspect provider timeout/rate limits.
    • If writes fail first, inspect DB pool size.
    • If queries fail first, inspect retriever top-k and downstream latency.
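The isolation steps above can be sketched as a small harness that runs each dependency alone, times it, and reports the first failure. The stage bodies you pass in are your real embedding, write, and query calls; the names used in the usage note (e.g. "embeddings", "writes", "queries") are just labels:

```typescript
// Runs each dependency in isolation, in order, and returns the name of the
// first one that fails (or null if all pass). Also logs how long each took,
// which points at the phase that is hitting the timeout.
type Stage = { name: string; run: () => Promise<void> };

async function isolateFailure(stages: Stage[]): Promise<string | null> {
  for (const { name, run } of stages) {
    const start = Date.now();
    try {
      await run();
      console.log(`${name}: ok in ${Date.now() - start}ms`);
    } catch (err) {
      console.error(`${name}: FAILED after ${Date.now() - start}ms`, err);
      return name; // first failing dependency
    }
  }
  return null; // everything passed in isolation
}
```

Run it once with one probe document and one probe query; the first stage that fails (or is dramatically slower than the others) is where to focus.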

Prevention

  • Reuse long-lived clients for embeddings, LLMs, and vector stores. Don’t create them inside every handler.
  • Put hard limits on concurrency with p-limit, queues, or worker pools.
  • Separate ingestion from online query paths so large document loads don’t compete with user traffic.

If you want one rule to remember: don’t let LlamaIndex scale requests faster than your provider can answer them. That’s where this error comes from in real systems.
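That rule can be enforced mechanically with a token-bucket rate limiter in front of provider calls. This is a minimal sketch (the rate and burst numbers are illustrative, not provider-recommended limits); you would call `await bucket.take()` before each embedding or query request:

```typescript
// Simple token bucket: allows ratePerSec calls per second on average, with
// short bursts up to `burst`. A sketch for bounding outbound request rate.
class TokenBucket {
  private tokens: number;
  private last = Date.now();

  constructor(private ratePerSec: number, private burst: number) {
    this.tokens = burst;
  }

  // Resolves once a token is available, then consumes it.
  async take(): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Accrue tokens for the time elapsed since the last check.
      this.tokens = Math.min(
        this.burst,
        this.tokens + ((now - this.last) / 1000) * this.ratePerSec
      );
      this.last = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Sleep roughly until the next token accrues, then re-check.
      await new Promise((r) => setTimeout(r, (1 / this.ratePerSec) * 1000));
    }
  }
}
```

Unlike a concurrency cap, this bounds requests per second, which is usually what provider rate limits are written in terms of; many systems want both.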

