LangChain Tutorial (TypeScript): rate limiting API calls for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to rate limit LangChain API calls in TypeScript so your app stays under provider quotas and avoids burst failures. You’d use this when multiple users, background jobs, or retries can push you past OpenAI, Anthropic, or internal gateway limits.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • langchain
  • @langchain/openai
  • p-limit
  • An API key for your LLM provider
  • A .env file or environment variable setup

Install the packages:

npm install langchain @langchain/openai p-limit dotenv
npm install -D typescript ts-node @types/node

Step-by-Step

  1. Start by setting up a basic chat model and a limiter. The limiter controls how many requests can run at once, which is the simplest way to prevent spikes.
import "dotenv/config";
import pLimit from "p-limit";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini",
});

const limit = pLimit(2);
  2. Wrap every model call in the limiter. This keeps concurrency under control even if your app tries to fire off ten requests at once.
async function ask(question: string): Promise<string> {
  return limit(async () => {
    const result = await model.invoke(question);
    // content is typed as string | content parts; plain text prompts return a string
    return result.content as string;
  });
}
  3. If you need to process a batch of prompts, map them through the same wrapper. Promise.all is fine here because the limiter enforces the actual cap.
async function main() {
  const prompts = [
    "Write one sentence about risk management.",
    "Write one sentence about claims automation.",
    "Write one sentence about underwriting data.",
    "Write one sentence about fraud detection.",
  ];

  const results = await Promise.all(prompts.map((p) => ask(p)));
  console.log(results);
}

main().catch(console.error);
  4. Add retry handling for transient rate-limit errors. You still want backoff because some providers enforce per-minute quotas, not just concurrency caps.
async function askWithRetry(question: string, attempts = 3): Promise<string> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await ask(question);
    } catch (err) {
      if (i === attempts - 1) throw err;
      const delay = 500 * Math.pow(2, i);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}
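
The loop above retries on every error, but some failures (bad prompts, auth errors) will never succeed on retry. In practice you may want to back off only on genuine rate-limit responses. Below is a minimal sketch; the `status`, `code`, and `message` fields are heuristic assumptions about common provider error shapes, so check what your SDK actually attaches:

```typescript
// Heuristic check for rate-limit errors. Providers differ: some SDKs
// attach a numeric `status`, others a `code` string or a message with "429".
function isRateLimitError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: number; code?: string; message?: string };
  return (
    e.status === 429 ||
    e.code === "rate_limit_exceeded" ||
    (e.message ?? "").includes("429")
  );
}

// Exponential backoff schedule matching the retry loop: 500ms, 1s, 2s, ...
function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * Math.pow(2, attempt);
}
```

In askWithRetry, you would rethrow immediately when isRateLimitError(err) is false instead of burning retries on a permanent failure.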
  5. Put it together in a reusable utility so every route, worker, or agent tool uses the same policy. In production, this is where you centralize limits instead of scattering them across handlers.
import "dotenv/config";
import pLimit from "p-limit";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini",
});

const limit = pLimit(2);

export async function rateLimitedInvoke(prompt: string): Promise<string> {
  return limit(async () => {
    const response = await model.invoke(prompt);
    // plain text prompts return string content
    return response.content as string;
  });
}

Testing It

Run the script with several prompts at once and watch the output order and timing. You should see only two requests executing concurrently if you kept pLimit(2).

To confirm it’s actually protecting you, send a larger batch, or temporarily remove the limiter and compare: without it, the whole burst hits the provider at once and 429s appear much sooner.

If your provider returns HTTP 429s, keep the retry wrapper in place and log the delays between attempts. That tells you whether you’re dealing with concurrency pressure or true per-minute throttling.

Next Steps

  • Add a token bucket limiter if you need requests-per-minute control instead of just concurrency control.
  • Move the limiter into a shared service layer so all LangChain tools and agents use the same policy.
  • Combine rate limiting with request queuing and observability so you can trace slowdowns before they hit production.
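
The token bucket from the first bullet is only a few lines. Here is a generic sketch (the class and method names are ours, not from any library): tokens refill continuously, so it enforces a sustained requests-per-second rate while still allowing short bursts up to `capacity`. The injectable `now` clock is there so you can test it without real waiting.

```typescript
// Minimal token bucket: starts full with `capacity` tokens and refills
// continuously at `refillPerSec`. Each request consumes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    private now: () => number = Date.now
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and consumes a token if one is available; false means
  // the caller should wait and try again.
  tryRemove(): boolean {
    const elapsedSec = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec
    );
    this.lastRefill = this.now();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Before each model call, check tryRemove() and sleep briefly when it returns false; combining this with pLimit gives you both a rate cap and a concurrency cap.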


By Cyprian Aarons, AI Consultant at Topiax.
