LangChain Tutorial (TypeScript): rate limiting API calls for advanced developers
This tutorial shows you how to put hard rate limits around LangChain API calls in TypeScript, so your app stops blowing through provider quotas and starts behaving predictably under load. You need this when multiple requests, retries, or parallel chains can trigger burst traffic that gets you throttled, billed unexpectedly, or blocked by upstream APIs.
What You'll Need
- Node.js 18+
- TypeScript 5+
- An OpenAI API key in OPENAI_API_KEY
- Packages: langchain, @langchain/openai, p-limit, dotenv
- A project configured for ES modules or TypeScript compilation
- Basic familiarity with LangChain chat models and async/await
Step-by-Step
- Install the dependencies and set up environment variables.
The key piece here is p-limit, which gives you concurrency control. That does not replace provider-side rate limits, but it stops your own code from creating traffic spikes.
npm install langchain @langchain/openai p-limit dotenv
npm install -D typescript tsx @types/node
Then put your key in a .env file at the project root:

OPENAI_API_KEY=your_openai_key_here
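If your project is not already set up for ES modules, one minimal configuration that works with the tsx runner looks roughly like this. Treat it as a sketch, not the only valid setup; adjust to your existing build config.

package.json (excerpt):

{
  "type": "module"
}

tsconfig.json (excerpt):

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true
  }
}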
- Create a small rate limiter wrapper around your model calls.
This example limits both concurrency and request spacing. In practice, this is the safest pattern because it protects you from bursts even if several chains fire at once.
import "dotenv/config";
import pLimit from "p-limit";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

// Allow at most two requests in flight at any time.
const limit = pLimit(2);

// Earliest timestamp at which the next request may start.
let nextSlotAt = 0;
const minIntervalMs = 1000;

async function waitForSlot() {
  const now = Date.now();
  const waitMs = Math.max(0, nextSlotAt - now);
  // Reserve the slot before awaiting so two concurrent callers
  // cannot claim the same window and fire at the same time.
  nextSlotAt = Math.max(now, nextSlotAt) + minIntervalMs;
  if (waitMs > 0) {
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
- Wrap every LangChain call with the limiter.
This is where the protection actually happens. Any code path that calls the provider should go through the same wrapper; otherwise one forgotten call site will bypass your controls.
async function limitedInvoke(prompt: string) {
  return limit(async () => {
    await waitForSlot();
    return model.invoke([new HumanMessage(prompt)]);
  });
}

async function main() {
  const results = await Promise.all([
    limitedInvoke("Write one sentence about banking risk."),
    limitedInvoke("Write one sentence about insurance claims."),
    limitedInvoke("Write one sentence about fraud detection."),
    limitedInvoke("Write one sentence about KYC."),
  ]);

  for (const result of results) {
    console.log(result.content);
  }
}

main().catch(console.error);
- Add retry handling for real-world throttling responses.
Rate limiting on your side reduces pressure, but providers still return 429 when their quota or burst rules are hit. Use bounded retries with backoff so your app recovers instead of failing immediately.
async function retry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let delayMs = 500;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.status ?? err?.response?.status;
      // Only retry throttling errors, and give up after the final attempt.
      if (status !== 429 || i === attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2; // exponential backoff: 500ms, 1s, 2s, ...
    }
  }
  throw new Error("Unreachable");
}
- Combine retry logic with the limiter in production code.
This keeps your application stable under load and gives you a single place to tune throughput. If you later move to Redis-backed distributed throttling, this wrapper remains the right integration point.
async function safeInvoke(prompt: string) {
  return retry(() =>
    limit(async () => {
      await waitForSlot();
      return model.invoke([new HumanMessage(prompt)]);
    })
  );
}

async function runBatch() {
  const prompts = Array.from({ length: 6 }, (_, i) => `Summarize policy ${i + 1} in one line.`);
  const outputs = await Promise.all(prompts.map((p) => safeInvoke(p)));

  outputs.forEach((msg, idx) => {
    console.log(`${idx + 1}: ${msg.content}`);
  });
}

runBatch().catch(console.error);
Testing It
Run the script with npx tsx your-file.ts and watch the timestamps between requests stay spaced out instead of firing all at once. If you temporarily lower minIntervalMs to something aggressive like 50, you should see faster throughput and a higher chance of provider throttling. If you increase concurrency from pLimit(2) to pLimit(10), requests bunch up again, which is exactly why concurrency control matters even when each request is individually retried. In production, log request start/end times and HTTP status codes so you can confirm the limiter is doing real work rather than just adding latency; a minimal sketch follows.
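As a rough starting point for that logging, you can wrap the safeInvoke function from the last step with timing and status output. The loggedInvoke name and log format here are illustrative, not part of the tutorial's core code:

async function loggedInvoke(prompt: string) {
  const startedAt = Date.now();
  try {
    const result = await safeInvoke(prompt);
    console.log(`ok in ${Date.now() - startedAt}ms`);
    return result;
  } catch (err: any) {
    // Same error-shape guess as the retry helper; adjust to your SDK's error objects.
    const status = err?.status ?? err?.response?.status;
    console.log(`failed (status ${status ?? "unknown"}) in ${Date.now() - startedAt}ms`);
    throw err;
  }
}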
Next Steps
- Move the limiter into a shared service layer so every chain, tool, and agent route uses the same policy.
- Replace in-memory throttling with Redis if you run multiple Node instances.
- Add token-based budgeting on top of request-based limiting for stricter cost control (a rough sketch follows below).
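To make the token-budgeting idea concrete, here is a minimal sketch layered on safeInvoke. It assumes the response exposes usage_metadata with a total_tokens count, which depends on your model and LangChain version, and the budget number is just an example:

// Hedged sketch: stop scheduling new calls once a token budget is spent.
// Assumes result.usage_metadata?.total_tokens is populated; verify for your setup.
let tokensSpent = 0;
const tokenBudget = 50_000; // example value, tune to your own cost limits

async function budgetedInvoke(prompt: string) {
  if (tokensSpent >= tokenBudget) {
    throw new Error("Token budget exhausted for this run");
  }
  const result = await safeInvoke(prompt);
  tokensSpent += result.usage_metadata?.total_tokens ?? 0;
  return result;
}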
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.