LlamaIndex Tutorial (TypeScript): implementing retry logic for intermediate developers
This tutorial shows you how to add retry logic around LlamaIndex TypeScript calls so transient failures do not break your agent flow. You need this when working with flaky model APIs, rate limits, or downstream tool calls that fail intermittently and should be retried before surfacing an error.
What You'll Need
- Node.js 18+ and npm
- A TypeScript project with `ts-node` or a build step already set up
- `llamaindex` installed
- An OpenAI API key set in `OPENAI_API_KEY`
- Optional: a `.env` file if you prefer loading environment variables locally
Install the packages:

```bash
npm install llamaindex
npm install -D typescript ts-node @types/node
```
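If you rely on `OPENAI_API_KEY`, it helps to fail fast at startup when the variable is missing instead of on the first API call. A minimal sketch (the `requireEnv` helper is an illustrative name, not part of llamaindex):

```typescript
// Read a required environment variable, throwing early if it is absent.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at startup, before building any index:
// const apiKey = requireEnv("OPENAI_API_KEY");
```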
Step-by-Step
- Start with a basic LlamaIndex query engine. The retry wrapper should sit outside your indexing code so you can reuse it for queries, tool calls, and any other model-backed operation.
```ts
import { Document, Settings, VectorStoreIndex } from "llamaindex";

async function main() {
  const docs = [
    new Document({ text: "LlamaIndex helps build retrieval augmented generation apps." }),
    new Document({ text: "Retry logic is useful for transient API failures and rate limits." }),
  ];

  Settings.chunkSize = 128;

  const index = await VectorStoreIndex.fromDocuments(docs);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "Why do we need retry logic?",
  });

  console.log(response.toString());
}

main().catch(console.error);
```
- Add a generic retry helper with exponential backoff. Keep it small and dependency-free so you can use it in serverless functions and long-running services without adding another abstraction layer.
```ts
type RetryOptions = {
  retries: number;     // number of retries after the first attempt
  baseDelayMs: number; // delay before the first retry
};

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === options.retries) break;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delay = options.baseDelayMs * Math.pow(2, attempt);
      await sleep(delay);
    }
  }
  throw lastError;
}
```
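With `{ retries: 3, baseDelayMs: 250 }`, the helper sleeps 250 ms, 500 ms, then 1000 ms between attempts. A quick sketch of that schedule (`backoffSchedule` is a standalone helper written only for illustration):

```typescript
// Delay before retry k is baseDelayMs * 2^k, matching the loop in withRetry.
function backoffSchedule(retries: number, baseDelayMs: number): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseDelayMs * Math.pow(2, attempt));
}

console.log(backoffSchedule(3, 250)); // [ 250, 500, 1000 ]
```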
- Wrap the LlamaIndex query call with that helper. This keeps your index construction separate from failure handling and makes the retry policy easy to tune per request type.
```ts
import { Document, Settings, VectorStoreIndex } from "llamaindex";

type RetryOptions = {
  retries: number;
  baseDelayMs: number;
};

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === options.retries) break;
      await sleep(options.baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}

async function main() {
  Settings.chunkSize = 128;

  const docs = [
    new Document({ text: "Retries protect against transient failures." }),
    new Document({ text: "Backoff reduces pressure on failing services." }),
  ];

  const index = await VectorStoreIndex.fromDocuments(docs);
  const queryEngine = index.asQueryEngine();

  const response = await withRetry(
    () =>
      queryEngine.query({
        query: "Explain why backoff matters.",
      }),
    { retries: 3, baseDelayMs: 250 },
  );

  console.log(response.toString());
}

main().catch(console.error);
```
- Make the retry policy smarter by only retrying transient errors. In production you do not want to retry validation errors or bad prompts; you want to retry rate limits, timeouts, and temporary upstream failures.

```ts
type RetryOptions = { retries: number; baseDelayMs: number };

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Classify errors by message; only transient failures should be retried.
function isRetryableError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  const message = error.message.toLowerCase();
  return (
    message.includes("rate limit") ||
    message.includes("timeout") ||
    message.includes("temporarily unavailable") ||
    message.includes("503")
  );
}

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Fail fast on non-retryable errors; back off only on transient ones.
      if (!isRetryableError(error) || attempt === options.retries) break;
      await sleep(options.baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}
```
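A quick sanity check of the classifier (the error messages below are made up for the demonstration; real provider errors may expose status codes or typed error classes you can match more reliably than message text):

```typescript
// Same message-based classifier as above.
function isRetryableError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  const message = error.message.toLowerCase();
  return (
    message.includes("rate limit") ||
    message.includes("timeout") ||
    message.includes("temporarily unavailable") ||
    message.includes("503")
  );
}

console.log(isRetryableError(new Error("Rate limit exceeded, retry later"))); // true
console.log(isRetryableError(new Error("Request timeout after 30s")));        // true
console.log(isRetryableError(new Error("Invalid request: prompt too long"))); // false
console.log(isRetryableError("not an Error instance"));                       // false
```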
- Use the helper in a real service-style function. This is the pattern you want in an agent backend: one place for retries, one place for business logic, and no repeated try/catch blocks scattered across your codebase.

```ts
import { VectorStoreIndex } from "llamaindex";

type RetryOptions = { retries: number; baseDelayMs: number };

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

function isRetryableError(error: unknown): boolean {
  return error instanceof Error &&
    /rate limit|timeout|temporarily unavailable|503/i.test(error.message);
}

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (!isRetryableError(error) || attempt === options.retries) break;
      await sleep(options.baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}

// Service-style entry point: the retry policy lives here, and the
// business logic stays free of try/catch noise.
async function answerQuestion(index: VectorStoreIndex, question: string): Promise<string> {
  const queryEngine = index.asQueryEngine();
  const response = await withRetry(
    () => queryEngine.query({ query: question }),
    { retries: 3, baseDelayMs: 250 },
  );
  return response.toString();
}
```
Testing It
Run the script normally first and confirm you get a valid answer back from the query engine. Then simulate a failure by pointing your model provider at an invalid endpoint or using an invalid API key so you can see the retry loop kick in.
If you want a cleaner test, wrap a fake async function that fails twice before succeeding and pass that into withRetry. You should see the final success only after the configured backoff delays have elapsed.
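That fake-function test might look like this (the `flaky` helper and its failure count are invented for the demonstration):

```typescript
type RetryOptions = { retries: number; baseDelayMs: number };

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === options.retries) break;
      await sleep(options.baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}

// Fake async operation: fails twice, then succeeds on the third call.
let attempts = 0;
async function flaky(): Promise<string> {
  attempts++;
  if (attempts < 3) throw new Error("timeout");
  return "success";
}

withRetry(flaky, { retries: 3, baseDelayMs: 50 }).then((result) => {
  // Succeeds on the third call, after 50 ms + 100 ms of backoff.
  console.log(result, attempts); // success 3
});
```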
In production logs, verify two things:
- Retry attempts are visible with enough context to debug failures
- Non-retryable errors fail fast instead of burning time on useless attempts
Next Steps
- Add structured logging around each retry attempt with request IDs and latency
- Move the retry policy into a shared utility used by both LLM calls and tool execution
- Add jitter to your backoff so concurrent agents do not retry in lockstep
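For the jitter point above, one common variant is "full jitter": each delay is drawn uniformly from zero up to the exponential value. A sketch (the 30-second cap is an arbitrary choice, not a llamaindex default):

```typescript
// Full jitter: random delay in [0, min(cap, base * 2^attempt)), so concurrent
// agents spread their retries out instead of hammering the service in lockstep.
function jitteredDelay(attempt: number, baseDelayMs: number, capMs = 30_000): number {
  const exponential = Math.min(capMs, baseDelayMs * Math.pow(2, attempt));
  return Math.random() * exponential;
}
```

You can swap this in for the fixed `baseDelayMs * Math.pow(2, attempt)` computation inside `withRetry`.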
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.