Haystack Tutorial (TypeScript): rate limiting API calls for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to add rate limiting around API calls in a Haystack TypeScript pipeline using a simple token-bucket style guard. You need this when your app can trigger too many LLM or search requests at once, and you want to avoid 429s, quota burn, or noisy retries.

What You'll Need

  • Node.js 18+
  • TypeScript 5+
  • A Haystack TypeScript project already set up
  • @haystack/core installed
  • An API key for the service you’re calling, if you plan to test against a real endpoint
  • ts-node or a build step via tsc

Install the package if you haven’t already:

npm install @haystack/core
npm install -D typescript ts-node @types/node

Step-by-Step

  1. Start with a small rate limiter that controls how often your code is allowed to call an upstream API. This version uses a fixed interval between requests, which is enough for beginners and easy to reason about.
class RateLimiter {
  private lastCallAt = 0;
  // Chain pending turns so concurrent callers line up one after another
  // instead of all measuring the same delay and firing together.
  private tail: Promise<void> = Promise.resolve();

  constructor(private readonly minIntervalMs: number) {}

  waitTurn(): Promise<void> {
    this.tail = this.tail.then(async () => {
      const elapsed = Date.now() - this.lastCallAt;
      const delay = Math.max(0, this.minIntervalMs - elapsed);

      if (delay > 0) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }

      this.lastCallAt = Date.now();
    });

    return this.tail;
  }
}
  2. Wrap your actual API call in a function that waits before each request. In Haystack apps, this is usually where you call an LLM component, a retriever, or any external service.
// Bring in your pipeline when you wire up the real call, e.g.:
// import { pipeline } from "@haystack/core";

const limiter = new RateLimiter(1000);

async function callApi(prompt: string): Promise<string> {
  await limiter.waitTurn();

  // Replace this with your real Haystack component call.
  // Example: pipeline.run(...) or client request logic.
  return `Processed: ${prompt}`;
}

async function main() {
  const result = await callApi("Check policy renewal status");
  console.log(result);
}

void main();
  3. If you are calling the same endpoint from multiple places, centralize the limiter so every request shares the same queue. That prevents one part of your app from bypassing the limit and causing bursts.
class ApiClient {
  constructor(private readonly limiter: RateLimiter) {}

  async fetchAnswer(input: string): Promise<string> {
    await this.limiter.waitTurn();

    // Swap this stub for your Haystack pipeline or remote API call.
    return `Answer for: ${input}`;
  }
}

const sharedLimiter = new RateLimiter(500);
const client = new ApiClient(sharedLimiter);

async function runBatch() {
  const inputs = ["A", "B", "C"];
  const outputs: string[] = [];

  for (const input of inputs) {
    outputs.push(await client.fetchAnswer(input));
  }

  console.log(outputs);
}

void runBatch();
  4. Add retry handling for transient failures like HTTP 429 or timeouts. Rate limiting reduces pressure, but production code still needs retries with backoff because providers can throttle you anyway.
async function retry<T>(
  fn: () => Promise<T>,
  attempts = 3,
): Promise<T> {
  let lastError: unknown;

  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i < attempts) {
        await new Promise((resolve) => setTimeout(resolve, i * 500));
      }
    }
  }

  throw lastError;
}
  5. Combine the limiter and retry wrapper around each upstream request. This gives you predictable pacing plus resilience when the provider rejects a request anyway.
async function guardedCall(input: string): Promise<string> {
  return retry(async () => {
    await limiter.waitTurn();

    // Real usage:
    // return await myPipeline.run({ query: input });

    // Simulated transient failure so you can watch the retry fire;
    // only the input "fail-once" ever triggers it.
    if (input === "fail-once" && Math.random() < 0.5) {
      throw new Error("429 Too Many Requests");
    }

    return `OK: ${input}`;
  });
}

async function demo() {
  const results = await Promise.all([
    guardedCall("first"),
    guardedCall("second"),
    guardedCall("third"),
  ]);

  console.log(results);
}

void demo();

Testing It

Run the script and watch the timestamps or log output between calls. If your limiter is set to a 1000 ms interval, each request should be spaced by about one second instead of firing all at once.

To verify retry behavior, temporarily force an error in the guarded function and confirm it retries before failing. In a real Haystack integration, swap the stubbed return values with your pipeline’s run() call and confirm you stop seeing bursty traffic and avoid provider throttling.

If you want stronger proof, log Date.now() before and after each waitTurn() call and check that concurrent callers still serialize through the shared limiter. That’s the main thing beginners miss: rate limiting only works if every path uses the same guard.
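That timing check can be written as a standalone snippet. This sketch declares its own minimal fixed-interval limiter so it runs on its own; the 200 ms interval and the names `TimingCheckLimiter` and `measureGap` are just for illustration:

```typescript
// Minimal fixed-interval limiter, repeated inline so this snippet
// runs standalone.
class TimingCheckLimiter {
  private lastCallAt = 0;

  constructor(private readonly minIntervalMs: number) {}

  async waitTurn(): Promise<void> {
    const delay = Math.max(
      0,
      this.minIntervalMs - (Date.now() - this.lastCallAt),
    );
    if (delay > 0) {
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
    this.lastCallAt = Date.now();
  }
}

// Measure how long the second turn is forced to wait.
async function measureGap(intervalMs: number): Promise<number> {
  const limiter = new TimingCheckLimiter(intervalMs);
  await limiter.waitTurn(); // first turn passes immediately
  const before = Date.now();
  await limiter.waitTurn(); // second turn should wait ~intervalMs
  return Date.now() - before;
}

measureGap(200).then((gap) => console.log(`second turn waited ~${gap} ms`));
```

If the measured gap is close to the configured interval, your pacing works; if it is near zero, some path is bypassing the limiter.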

Next Steps

  • Replace the fixed-delay limiter with a token bucket so you can allow short bursts without exceeding your quota.
  • Add per-provider limits if your app talks to multiple APIs with different quotas.
  • Move the limiter into a reusable service class and inject it into your Haystack pipeline wrappers instead of calling it directly from business logic.
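As a starting point for the first item, here is a hedged token-bucket sketch. It allows bursts of up to `capacity` requests and then refills steadily at `refillPerSecond`; the class name and numbers are placeholders you should match to your provider's documented quota:

```typescript
// Token bucket: permits short bursts up to `capacity`, then refills
// at a steady `refillPerSecond` tokens per second.
class TokenBucket {
  private tokens: number;
  private lastRefillAt = Date.now();

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
  ) {
    this.tokens = capacity; // start full so an initial burst is allowed
  }

  private refill(): void {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefillAt) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillAt = now;
  }

  // Take one token, sleeping until one is available if the bucket is empty.
  async waitTurn(): Promise<void> {
    for (;;) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Sleep roughly until one whole token has accumulated.
      const waitMs = ((1 - this.tokens) / this.refillPerSecond) * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Because it exposes the same `waitTurn()` shape as the fixed-interval limiter, you can swap it into the earlier `ApiClient` without touching the call sites.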

By Cyprian Aarons, AI Consultant at Topiax.
