Haystack Tutorial (TypeScript): rate limiting API calls for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to put a hard rate limit in front of Haystack API calls in TypeScript, so your agent stops hammering upstream services when traffic spikes or retries pile up. You need this when you’re calling models, search APIs, or internal banking services that enforce quotas, charge per request, or start failing under burst load.

What You'll Need

  • Node.js 18+ and npm
  • A TypeScript project with ts-node or a build step
  • haystack installed in your project
  • An API key for the upstream service you’re calling
  • A working Haystack pipeline or component graph already set up
  • Optional but recommended:
    • p-limit for concurrency control (see the sketch after this list)
    • bottleneck if you want distributed or token-bucket style throttling
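
If you only need a concurrency cap, with no spacing or queueing policy, p-limit is the lighter option. A minimal sketch, assuming an async callUpstream(prompt) function like the one defined in Step 1 below:

import pLimit from "p-limit";

// Allow at most 2 upstream calls in flight at any time.
const limit = pLimit(2);

async function fetchAllCapped(prompts: string[]): Promise<string[]> {
  return Promise.all(prompts.map((prompt) => limit(() => callUpstream(prompt))));
}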

Step-by-Step

  1. Start with a simple Haystack pipeline and a rate-limited wrapper around the actual network call. The key idea is to keep Haystack orchestration separate from the limiter so you can reuse it across tools, retrievers, and generators.
// Pipeline wiring is omitted here; the limiter lives alongside your
// Haystack setup and is shared by every component that calls out.
import Bottleneck from "bottleneck";

// At most 2 requests in flight, and at least 250ms between request
// starts (roughly 4 requests per second from this process).
const limiter = new Bottleneck({
  maxConcurrent: 2,
  minTime: 250,
});

async function callUpstream(prompt: string): Promise<string> {
  const response = await fetch("https://api.example.com/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({ prompt }),
  });

  if (!response.ok) {
    throw new Error(`Upstream failed: ${response.status}`);
  }

  const data = (await response.json()) as { text: string };
  return data.text;
}
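
If the quota is stated as requests per window rather than as spacing, Bottleneck's reservoir options model that directly. A minimal sketch, assuming a hypothetical quota of 100 requests per minute (substitute your provider's published limits):

// Token-bucket style: start with 100 permits, refill to 100 every 60s.
// Jobs that arrive after the reservoir empties wait for the next refresh.
const windowedLimiter = new Bottleneck({
  reservoir: 100,
  reservoirRefreshAmount: 100,
  reservoirRefreshInterval: 60 * 1000,
  maxConcurrent: 2,
});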
  2. Wrap every upstream invocation with the limiter. This is where you actually enforce the policy, not inside Haystack itself.
async function limitedCall(prompt: string): Promise<string> {
  return limiter.schedule(() => callUpstream(prompt));
}

async function main() {
  const prompts = [
    "Summarize policy A",
    "Summarize policy B",
    "Summarize policy C",
    "Summarize policy D",
  ];

  const results = await Promise.all(
    prompts.map((prompt) => limitedCall(prompt))
  );

  console.log(results);
}

void main();
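
Note that Promise.all here does not defeat the limiter: limiter.schedule() queues each job, so all four calls are created eagerly but dispatched at most two at a time, at least 250ms apart.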
  3. If you’re using Haystack components, place the limiter at the boundary where external calls happen. That gives you one choke point for retries, observability, and backoff.
import { Component } from "haystack";

class RateLimitedGenerator extends Component {
  async run(input: { prompt: string }): Promise<{ output: string }> {
    const output = await limiter.schedule(() => callUpstream(input.prompt));
    return { output };
  }
}

const generator = new RateLimitedGenerator();

async function runPipelineLikeFlow() {
  const a = await generator.run({ prompt: "Draft an email" });
  const b = await generator.run({ prompt: "Rewrite it for compliance" });

  console.log(a.output);
  console.log(b.output);
}

void runPipelineLikeFlow();
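
With a single choke point, queue pressure is also easy to observe. A small sketch using Bottleneck's counts() method (the status field names below are from Bottleneck v2; verify against the version you install):

// Periodically log how many jobs are waiting vs. running.
const pressureLog = setInterval(() => {
  const counts = limiter.counts();
  console.log(`limiter: queued=${counts.QUEUED} running=${counts.RUNNING}`);
}, 5000);

// Clear the interval on shutdown so it doesn't keep the process alive:
// clearInterval(pressureLog);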
  4. Add retry logic only after rate limiting is in place. Without a limiter, retries can multiply traffic and make throttling worse.
async function retryLimitedCall(
  prompt: string,
  attempts = 3
): Promise<string> {
  let lastError: unknown;

  for (let i = 0; i < attempts; i++) {
    try {
      return await limitedCall(prompt);
    } catch (error) {
      lastError = error;

      // Linear backoff between attempts; skip the wait after the final failure.
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, 200 * (i + 1)));
      }
    }
  }

  throw lastError;
}
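
If you would rather not hand-roll the loop, Bottleneck also ships a retry hook: returning a number of milliseconds from a "failed" listener reschedules the job after that delay. A sketch of roughly the same policy as above:

// Retry a failed job up to 2 more times, waiting 200ms, then 400ms.
limiter.on("failed", (error, jobInfo) => {
  if (jobInfo.retryCount < 2) {
    return 200 * (jobInfo.retryCount + 1);
  }
});

limiter.on("retry", (message, jobInfo) => {
  console.warn(`Retrying job (attempt ${jobInfo.retryCount + 1})`);
});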
  5. For batch workloads, queue requests explicitly and keep concurrency low. This is usually better than firing off hundreds of promises and hoping the limiter sorts it out later.
async function batchProcess(prompts: string[]): Promise<string[]> {
  const outputs: string[] = [];

  for (const prompt of prompts) {
    const result = await retryLimitedCall(prompt);
    outputs.push(result);
  }

  return outputs;
}

async function runBatch() {
  const prompts = ["Summarize doc-1", "Summarize doc-2", "Summarize doc-3"];
  const outputs = await batchProcess(prompts);

  console.log(outputs);
}

void runBatch();

Testing It

Run the script with a burst of requests and watch the spacing between upstream calls. If you set minTime to 250, you should never see more than four requests per second from that process. Also check that failures don’t trigger uncontrolled retries; each retry should still pass through the same limiter.
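
To check the spacing, timestamp each dispatch. A minimal sketch (instrumentation only; swap it in for callUpstream while testing):

let lastDispatch = 0;

async function instrumentedCall(prompt: string): Promise<string> {
  const now = Date.now();
  if (lastDispatch > 0) {
    // With minTime: 250, this gap should never drop much below 250ms.
    console.log(`gap since previous dispatch: ${now - lastDispatch}ms`);
  }
  lastDispatch = now;
  return callUpstream(prompt);
}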

If your upstream service returns 429 Too Many Requests, confirm that your app slows down instead of immediately reissuing all pending calls. In production, add logs around queue length, wait time, and retry count so you can prove the limiter is doing real work.
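
One way to get that slow-down is to honor the Retry-After header before surfacing the error, so the retry loop naturally waits. A sketch, assuming the upstream sends Retry-After in seconds:

async function callUpstreamRespecting429(prompt: string): Promise<string> {
  const response = await fetch("https://api.example.com/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({ prompt }),
  });

  if (response.status === 429) {
    // Pause for the server-requested interval, defaulting to 1s if absent.
    const retryAfter = Number(response.headers.get("retry-after") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
    throw new Error("Rate limited by upstream (429)");
  }

  if (!response.ok) {
    throw new Error(`Upstream failed: ${response.status}`);
  }

  const data = (await response.json()) as { text: string };
  return data.text;
}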

Next Steps

  • Add Redis-backed distributed rate limiting if you run multiple workers (a sketch follows this list)
  • Combine this with circuit breaking so repeated upstream failures stop new traffic early
  • Export metrics for wait time, queue depth, and upstream status codes
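
For the Redis-backed option, Bottleneck supports clustering so several workers share one quota. A minimal sketch, assuming a local Redis instance and the ioredis client (check the clustering options for the Bottleneck version you install):

import Bottleneck from "bottleneck";

// Every process that constructs a limiter with the same id shares
// the same limits through Redis.
const sharedLimiter = new Bottleneck({
  id: "upstream-generate",
  datastore: "ioredis",
  clientOptions: { host: "127.0.0.1", port: 6379 },
  maxConcurrent: 2,
  minTime: 250,
});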
