LlamaIndex Tutorial (TypeScript): implementing retry logic for advanced developers

By Cyprian Aarons. Updated 2026-04-21.

This tutorial shows how to add deterministic retry logic around LlamaIndex TypeScript calls so transient failures don’t break your agent or RAG workflow. You need this when you’re calling flaky LLM endpoints, rate-limited APIs, or external tools that fail intermittently and you want controlled backoff instead of random app crashes.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • llamaindex (the main package, which exports Document, Settings, and VectorStoreIndex)
  • @llamaindex/openai
  • An OpenAI API key in OPENAI_API_KEY
  • Optional: dotenv if you want to load environment variables from a .env file

Install the packages:

npm install llamaindex @llamaindex/openai
npm install -D typescript ts-node @types/node

If you use a .env file:

npm install dotenv

Step-by-Step

  1. Start with a minimal LlamaIndex setup that uses a real OpenAI-backed query engine. The retry wrapper will sit outside LlamaIndex so you can reuse it for query engines, tool calls, and custom functions.
import "dotenv/config";
import { Document, Settings, VectorStoreIndex } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});
// VectorStoreIndex.fromDocuments needs an embedding model to build the index.
Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

const docs = [
  "LlamaIndex is used to build retrieval-augmented generation pipelines.",
  "Retry logic should handle transient failures like rate limits and timeouts.",
].map((text) => new Document({ text }));

const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What is LlamaIndex used for?",
});

console.log(response.toString());
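Note that the snippets in this tutorial use top-level await, which only works in an ES module context. If your compiler or runtime complains, a minimal tsconfig sketch that enables it (assuming Node.js 18+ and "type": "module" in package.json):

```json
{
  "compilerOptions": {
    "module": "node16",
    "moduleResolution": "node16",
    "target": "es2022",
    "strict": true
  }
}
```

With ts-node, you may also need to run it in ESM mode (for example via ts-node's ESM loader) for top-level await to work.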
  2. Add a reusable retry helper with exponential backoff and jitter. This keeps the policy explicit and makes it easy to tune for production traffic patterns.
type RetryOptions = {
  retries: number;
  baseDelayMs: number;
  maxDelayMs: number;
};

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /rate limit|timeout|temporarily unavailable|ECONNRESET|503|429/i.test(
    message,
  );
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === options.retries || !isRetryableError(error)) throw error;

      const delay = Math.min(
        options.maxDelayMs,
        options.baseDelayMs * 2 ** attempt,
      );
      const jitter = Math.floor(Math.random() * delay * 0.25);
      await sleep(delay + jitter);
    }
  }

  throw lastError;
}
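Before wiring this into a pipeline, it helps to sanity-check the backoff policy itself. This standalone sketch (re-declaring the RetryOptions shape from the helper) computes the capped delay schedule before jitter, so you can see exactly how long a fully exhausted retry budget would wait:

```typescript
type RetryOptions = {
  retries: number;
  baseDelayMs: number;
  maxDelayMs: number;
};

// Returns the base delay (before jitter) for each retry attempt,
// doubling per attempt and capped at maxDelayMs.
function delaySchedule(options: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < options.retries; attempt++) {
    delays.push(
      Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** attempt),
    );
  }
  return delays;
}

console.log(delaySchedule({ retries: 5, baseDelayMs: 250, maxDelayMs: 4000 }));
// base delays before jitter: 250, 500, 1000, 2000, 4000 (capped)
```

Summing the schedule gives the worst-case extra latency your callers will see, which is the number to tune against your upstream timeouts.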
  3. Wrap your LlamaIndex query call with the helper. This pattern works the same way for any async method returned by an index, retriever, or agent workflow.
import "dotenv/config";
import { Document, Settings, VectorStoreIndex } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});
Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

const docs = [
  "Retries should be bounded and only applied to transient failures.",
  "Do not retry validation errors or bad prompts.",
].map((text) => new Document({ text }));

const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine();

type RetryOptions = {
  retries: number;
  baseDelayMs: number;
  maxDelayMs: number;
};

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /rate limit|timeout|temporarily unavailable|ECONNRESET|503|429/i.test(
    message,
  );
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === options.retries || !isRetryableError(error)) throw error;
      const delay = Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** attempt);
      await sleep(delay + Math.floor(Math.random() * delay * 0.25));
    }
  }

  // Unreachable in practice, but keeps the return type a strict Promise<T>.
  throw lastError;
}

const result = await withRetry(
  () =>
    queryEngine.query({
      query: "What should not be retried?",
    }),
  { retries: 3, baseDelayMs: 250, maxDelayMs: 4000 },
);

console.log(result.toString());
  4. Apply the same wrapper to tool-like functions when your agent depends on external systems. This is where retry logic usually matters most because databases, internal HTTP services, and vendor APIs fail more often than the model call itself.
type CustomerLookup = {
  customerId: string;
};

async function fetchCustomerRiskScore(input: CustomerLookup): Promise<number> {
  const response = await fetch(
    `https://example.internal/risk/${input.customerId}`,
    {
      headers: { Accept: "application/json" },
    },
  );

  if (!response.ok) {
    throw new Error(`Risk service failed with ${response.status}`);
  }

  const data = (await response.json()) as { score: number };
  return data.score;
}

const riskScore = await withRetry(
  () => fetchCustomerRiskScore({ customerId: "cust_123" }),
  { retries: 5, baseDelayMs: 200, maxDelayMs: 5000 },
);

console.log({ riskScore });
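Because the whole policy hinges on the isRetryableError matcher, it is worth unit-testing it in isolation before relying on it in production paths. A quick standalone check (re-declaring the helper defined earlier):

```typescript
// Same message-based matcher used by the retry helper above.
function isRetryableError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /rate limit|timeout|temporarily unavailable|ECONNRESET|503|429/i.test(
    message,
  );
}

console.log(isRetryableError(new Error("429 Too Many Requests"))); // true
console.log(isRetryableError(new Error("Risk service failed with 503"))); // true
console.log(isRetryableError(new Error("Invalid prompt schema"))); // false
```

Matching on message text is a pragmatic default, but it is fragile: if your HTTP client exposes a structured status code or error code, prefer checking that field over the message string.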
  5. Add a failure policy so you do not retry everything forever. In production systems, bad prompts, schema violations, and auth errors should fail fast while only transient network issues get another chance.
function classifyError(error: unknown): "retryable" | "fatal" {
  const message = error instanceof Error ? error.message : String(error);
  // Transient network and rate-limit failures get another attempt.
  if (/rate limit|timeout|temporarily unavailable|ECONNRESET|503|429/i.test(message)) {
    return "retryable";
  }
  // Auth errors, validation failures, and bad prompts fail fast.
  return "fatal";
}

Gate the retry loop on this classification: retry only when classifyError(error) returns "retryable", and rethrow immediately otherwise. This keeps bounded backoff for transient faults while surfacing genuine bugs on the first attempt.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
