How to Fix 'timeout error' in LlamaIndex (TypeScript)

By Cyprian Aarons · Updated 2026-04-21
Tags: timeout-error, llamaindex, typescript

When you see a timeout error in LlamaIndex (TypeScript), it usually means one of your async calls took longer than the configured timeout window and was aborted. In practice, this shows up during embedding, retrieval, LLM calls, or when you’re loading a large index from a slow data source.

The important part: this is rarely a “LlamaIndex bug”. It’s usually a timeout mismatch between your code, the model provider SDK, and the network path.

The Most Common Cause

The #1 cause is wrapping LlamaIndex calls in an overly aggressive Promise.race() / request timeout, or using a provider client with a shorter timeout than the actual LLM call needs.

A common failure looks like this:

import { OpenAI } from "@llamaindex/openai";
import { Settings, VectorStoreIndex } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  timeout: 5000, // too short for real workloads
});
Settings.llm = llm; // the query engine picks this up from global settings

// `documents` is assumed to be loaded earlier, e.g. via a reader
const index = await VectorStoreIndex.fromDocuments(documents);

const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "Summarize the policy exclusions",
});

And in logs you’ll typically see something like:

Error: Request timed out.
    at OpenAI.chat.completions.create (...)
    at QueryEngine.query (...)

Or from the runtime:

DOMException [AbortError]: The operation was aborted.

The fix is to set timeouts at the right layer and give long-running operations enough room.

Broken pattern → fixed pattern:

  • timeout: 5000 on the LLM client → use a realistic timeout, e.g. timeout: 60000
  • no retry policy → add retries for transient slow responses
  • wrapping every call in a hard 5s abort → reserve hard aborts for user-facing SLA boundaries

A corrected version of the setup above:

import { OpenAI } from "@llamaindex/openai";
import { Settings, VectorStoreIndex } from "llamaindex";

const llm = new OpenAI({
  model: "gpt-4o-mini",
  timeout: 60000, // generous window for real workloads
  maxRetries: 2,  // absorb transient slowness
});
Settings.llm = llm;

const index = await VectorStoreIndex.fromDocuments(documents);

const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "Summarize the policy exclusions",
});

If you need a user-facing timeout, keep it outside the LlamaIndex client and make it generous enough for your worst-case query path.
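
For example, you can enforce the user-facing budget at the call site instead of inside the client. A minimal sketch: withDeadline is a hypothetical helper, and the 90-second budget is an assumption about your worst-case query path, not a LlamaIndex default.

// Hypothetical helper: rejects if the wrapped promise misses the deadline
async function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    work,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error(`deadline of ${ms}ms exceeded`)), ms),
    ),
  ]);
}

// The LLM client keeps its own 60s timeout; this bounds the whole request
const response = await withDeadline(
  queryEngine.query({ query: "Summarize the policy exclusions" }),
  90_000,
);

This keeps the SLA concern in your application layer, so the LlamaIndex client never has to guess how long a user is willing to wait.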

Other Possible Causes

1) Embedding batch is too large

If you ingest too many documents in one shot, embedding requests can stall or exceed provider limits.

// Bad: huge batch
await index.insertNodes(largeNodeBatch);

Fix by chunking ingestion:

// Better: insert nodes in fixed-size batches (50 here)
for (let i = 0; i < largeNodeBatch.length; i += 50) {
  await index.insertNodes(largeNodeBatch.slice(i, i + 50));
}

2) Slow vector store connection

If your vector DB is remote and slow, VectorStoreIndex.fromDocuments() can look like an LLM timeout even though it’s actually storage latency.

import { storageContextFromDefaults } from "llamaindex";

const storageContext = await storageContextFromDefaults({
  vectorStore,
});

Check:

  • network latency to Pinecone/Qdrant/Weaviate
  • cold starts in serverless environments
  • DNS delays in private VPC setups
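
To confirm it’s storage latency and not the LLM, time the storage setup and the index build separately. A minimal sketch, reusing vectorStore and documents from the examples above:

console.time("vector-store-setup");
const storageContext = await storageContextFromDefaults({ vectorStore });
console.timeEnd("vector-store-setup");

console.time("index-build");
const index = await VectorStoreIndex.fromDocuments(documents, {
  storageContext,
});
console.timeEnd("index-build");

If index-build dominates, the bottleneck is the store (or the embedding calls it triggers), not the query-time LLM.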

3) Model provider rate limiting masquerading as timeout

Some providers return slow responses under load instead of cleanly failing fast.

Error: Request timed out after waiting for response headers.

Add retry/backoff and reduce concurrency:

const llm = new OpenAI({
  model: "gpt-4o-mini",
  timeout: 60000,
  maxRetries: 3,
});

Also avoid firing many parallel queries against the same provider key.
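
If you do need parallel queries, cap the concurrency explicitly rather than firing everything at once. A minimal sketch: mapWithConcurrency is a hypothetical helper, `queries` is an assumed string[] of user questions, and the limit of 3 is arbitrary.

// Hypothetical helper: a fixed-size worker pool over an array of inputs
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the queue drains
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}

// Run at most 3 queries at a time against one provider key
const answers = await mapWithConcurrency(queries, 3, (q) =>
  queryEngine.query({ query: q }),
);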

4) Node.js runtime aborts the request

If you wrap calls in your own deadline (an AbortController wired into your HTTP layer, or a Promise.race() like the pattern mentioned earlier), your own code may be cancelling the request before LlamaIndex finishes. For example, with an aggressive 3-second race:

// A hard 3-second deadline around the whole query
const response = await Promise.race([
  queryEngine.query({ query: "Explain claim denial reasons" }),
  new Promise<never>((_, reject) =>
    setTimeout(
      () => reject(new DOMException("The operation was aborted.", "AbortError")),
      3000,
    ),
  ),
]);

That’s fine for UI interactions, but not for indexing jobs or long retrieval chains. Increase the abort window or remove it from background jobs.

How to Debug It

  1. Find where the timeout is happening

    • Is it during fromDocuments(), embedding, retrieval, or final generation?
    • Add logs before each step so you know which phase stalls (see the sketch after this list).
  2. Print the exact error text

    • Look for:
      • Request timed out
      • AbortError
      • ETIMEDOUT
      • provider-specific messages like OpenAI API request timed out
    • The message tells you whether this is client-side aborting or upstream slowness.
  3. Increase only one timeout at a time

    • First increase the LLM client timeout.
    • Then check embedding timeouts.
    • Then check any app-level AbortController.
    • If the error disappears after one change, you found the layer.
  4. Test against smaller input

    • Run with:
      • fewer documents
      • smaller chunks
      • lower concurrency
    • If small inputs work and large ones fail, it’s almost always batching or provider latency.
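
A minimal sketch covering steps 1 and 2, assuming documents is loaded as in the earlier examples (the timed helper is illustrative, not a LlamaIndex API):

import { VectorStoreIndex } from "llamaindex";

// Hypothetical helper: times a phase and surfaces the exact error text
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    console.log(`${label}: ${Date.now() - start}ms`);
    return result;
  } catch (err) {
    // The error text tells you which layer aborted (step 2)
    console.error(`${label} failed after ${Date.now() - start}ms`, err);
    throw err;
  }
}

const index = await timed("fromDocuments", () =>
  VectorStoreIndex.fromDocuments(documents),
);
const response = await timed("query", () =>
  index.asQueryEngine().query({ query: "Summarize the policy exclusions" }),
);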

Prevention

  • Set explicit timeouts per layer:

    • LLM client timeout
    • embedding timeout if supported by your SDK (see the sketch after this list)
    • app-level request timeout only for user-facing endpoints
  • Keep ingestion batches small:

    • chunk documents before embedding
    • avoid parallelizing everything by default
  • Log timing around each stage:

    console.time("ingest");
    await VectorStoreIndex.fromDocuments(documents);
    console.timeEnd("ingest");
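
For the embedding layer, the OpenAI embedding class in @llamaindex/openai takes the same kind of client options. Whether timeout and maxRetries are forwarded to the underlying client depends on your installed version, so treat this as a sketch:

import { OpenAIEmbedding } from "@llamaindex/openai";
import { Settings } from "llamaindex";

Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
  timeout: 60000,  // assumption: forwarded to the underlying OpenAI client
  maxRetries: 2,
});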
    

If you’re building production agents in TypeScript with LlamaIndex, treat timeouts as an integration problem, not just an exception to catch. Fix the slow layer, then add retries and sane defaults so the problem doesn’t come back in production.

