LangChain Tutorial (TypeScript): rate limiting API calls for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add rate limiting to LangChain API calls in TypeScript so your app stops hammering providers and starts behaving like production software. You need this when you’re hitting OpenAI, Anthropic, or any other LLM API that enforces request quotas, burst limits, or per-minute caps.

What You'll Need

  • Node.js 18+
  • TypeScript 5+
  • A LangChain TypeScript project
  • langchain installed
  • @langchain/openai installed
  • An API key for your model provider
  • A .env file for secrets
  • Basic familiarity with async/await and LangChain ChatOpenAI

Step-by-Step

  1. Install the packages and set up your environment. We’ll use a simple token-bucket style limiter in code, then wrap every model call through it.
npm install langchain @langchain/openai dotenv
npm install -D typescript tsx @types/node
  2. Add your API key to .env, then create a small rate limiter class. This version limits requests per minute and makes callers wait their turn instead of failing immediately.
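Your .env file only needs the provider key (placeholder shown here; substitute your real key):

OPENAI_API_KEY=your-api-key-here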
import "dotenv/config";

class RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerMinute: number
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  private refill() {
    const now = Date.now();
    const elapsedMinutes = (now - this.lastRefill) / 60000;
    const refillAmount = elapsedMinutes * this.refillPerMinute;
    this.tokens = Math.min(this.capacity, this.tokens + refillAmount);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    while (true) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      await new Promise((resolve) => setTimeout(resolve, 250));
    }
  }
}
  3. Create your LangChain chat model and a helper that always waits for the limiter before calling the provider. This keeps rate limiting outside the chain logic, which is where it belongs.
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

// Allow a burst of 5 requests, refilled at 5 requests per minute.
const limiter = new RateLimiter(5, 5);

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

// Route every model call through this helper so nothing bypasses the limiter.
async function limitedInvoke(prompt: string) {
  await limiter.acquire();
  const response = await model.invoke([new HumanMessage(prompt)]);
  return response.content;
}
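If you would rather keep the gate inside a chain, the same limiter can be wrapped in a runnable. A minimal sketch, assuming @langchain/core is available in your project (it ships as a dependency of @langchain/openai):

import { RunnableLambda } from "@langchain/core/runnables";
import { BaseMessage } from "@langchain/core/messages";

// A pass-through step that blocks on the limiter, then forwards its input,
// so piping it in front of the model rate-limits the whole chain.
const gate = RunnableLambda.from(async (messages: BaseMessage[]) => {
  await limiter.acquire();
  return messages;
});

const limitedChain = gate.pipe(model);
// await limitedChain.invoke([new HumanMessage("Hello")]);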
  4. Use the helper in a loop or inside your app workflow. If you send several requests together, they will queue behind the limiter instead of all firing at once.
async function main() {
  const prompts = [
    "Write one sentence about insurance fraud detection.",
    "Write one sentence about claims automation.",
    "Write one sentence about underwriting risk.",
    "Write one sentence about policy servicing.",
    "Write one sentence about customer retention.",
    "Write one sentence about document extraction."
  ];

  for (const prompt of prompts) {
    const start = Date.now();
    const output = await limitedInvoke(prompt);
    console.log(`[${Date.now() - start}ms] ${output}`);
  }
}

main().catch(console.error);
  5. If you want better control in production, add retry handling for provider throttling errors. Rate limiting on your side reduces failures, but APIs can still reject requests when shared limits are hit.
async function limitedInvokeWithRetry(prompt: string) {
  const maxAttempts = 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Respect the local limiter even on retries.
      await limiter.acquire();
      const response = await model.invoke([new HumanMessage(prompt)]);
      return response.content;
    } catch (error) {
      // Out of attempts: surface the error to the caller.
      if (attempt === maxAttempts) throw error;
      // Linear backoff: wait 1 s, then 2 s, before the next try.
      await new Promise((resolve) => setTimeout(resolve, attempt * 1000));
    }
  }
  // Unreachable, but keeps TypeScript's control-flow analysis satisfied.
  throw new Error("unreachable");
}
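If you only want to retry throttling and transient server errors, inspect the error before backing off. A minimal sketch, assuming the thrown error exposes a numeric status field the way the OpenAI SDK's errors do (other providers may differ):

// Hypothetical helper: treat HTTP 429 (rate limited) and 5xx (server trouble)
// as retryable; anything else, such as a bad key, should fail fast.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: number }).status;
  return status === 429 || (status !== undefined && status >= 500);
}

Inside the catch block, add if (!isRetryable(error)) throw error; before the backoff.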

Testing It

Run the script with npx tsx index.ts and watch the timestamps between responses. With the limiter set to 5 requests per minute, the first five calls go through immediately on a full bucket; after that, one token refills every 12 seconds (60 / 5), so each later call should pause roughly that long.

To test burst behavior, change the capacity to 2 and send all six prompts in parallel with Promise.all. Two requests should go out immediately and the rest should wait instead of all hitting the API at once.
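A minimal sketch of that burst test, assuming the prompts array from step 4 is in scope and the limiter was constructed with new RateLimiter(2, 5):

// Fire all six prompts at once; Promise.all starts them concurrently, but
// only two proceed immediately — the rest poll until tokens refill.
async function burstTest() {
  const outputs = await Promise.all(prompts.map((p) => limitedInvoke(p)));
  outputs.forEach((o) => console.log(o));
}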

If you get authentication errors, check that OPENAI_API_KEY is loaded from .env. If you still see rate-limit errors from the provider, lower your local request rate or increase backoff delays.

Next Steps

  • Add token-based limiting instead of request-based limiting (see the sketch after this list)
  • Move the limiter into a shared service so multiple workers respect the same quota
  • Combine this with LangChain retry middleware and circuit breaking
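For the first item, the same bucket works if acquire takes a cost. A minimal sketch of the weighted variant, replacing the acquire method from step 2 (how you estimate cost per request is up to you):

// Weighted acquire: a request expected to consume roughly `cost` model tokens
// drains that many tokens from the bucket instead of one.
async acquire(cost = 1): Promise<void> {
  while (true) {
    this.refill();
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 250));
  }
}

With this change, capacity and refillPerMinute are denominated in model tokens rather than requests.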

By Cyprian Aarons, AI Consultant at Topiax.
