LangGraph Tutorial (TypeScript): rate limiting API calls for advanced developers
This tutorial shows how to build a LangGraph workflow in TypeScript that rate limits outbound API calls before they hit your provider. You need this when your agent fans out across multiple tools, you want to avoid 429s, and you need predictable throughput under load.
What You'll Need
- Node.js 20+
- TypeScript 5+
- `@langchain/langgraph`
- `@langchain/core`
- `zod`
- An API key for any provider you want to call from the graph
- A project configured for ES modules
- Basic familiarity with LangGraph state graphs and async nodes
Install the packages:
```bash
npm install @langchain/langgraph @langchain/core zod
npm install -D typescript tsx @types/node
```
Step-by-Step
1. Start with a shared rate limiter that controls request frequency across all graph nodes. A simple token bucket is enough for most agent workloads, and it works well because it limits burst traffic without requiring external infrastructure.
```ts
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Blocks until a token is available, polling every 100 ms.
  async acquire(): Promise<void> {
    while (true) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      await new Promise((r) => setTimeout(r, 100));
    }
  }

  // Adds tokens based on elapsed time, capped at the bucket capacity.
  private refill() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    const refill = elapsedSeconds * this.refillPerSecond;
    if (refill >= 1) {
      this.tokens = Math.min(this.capacity, this.tokens + refill);
      this.lastRefill = now;
    }
  }
}
```
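Before wiring the bucket into a graph, it can help to sanity-check it on its own. A minimal sketch, assuming the `TokenBucket` class above; the capacity and refill numbers are arbitrary:

```ts
// Quick standalone check: with capacity 2 and a refill of 1 token/sec,
// the third acquire() should wait roughly a second.
async function demoBucket() {
  const bucket = new TokenBucket(2, 1);
  for (let i = 1; i <= 3; i++) {
    const before = Date.now();
    await bucket.acquire();
    console.log(`acquire #${i} waited ${Date.now() - before} ms`);
  }
}

demoBucket();
```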
2. Define your graph state and create a shared limiter instance outside the graph. Keeping the limiter in module scope means every node execution shares the same budget, which is what you want when multiple branches can call the same upstream API.
```ts
import { z } from "zod";
import { StateGraph, START, END } from "@langchain/langgraph";

const limiter = new TokenBucket(3, 1);

const GraphStateSchema = z.object({
  prompt: z.string(),
  result: z.string().optional(),
});

type GraphState = z.infer<typeof GraphStateSchema>;
```
3. Add a node that waits for permission before making the API call. This example uses the built-in `fetch`, so it runs as written on Node 20+, and the limiter sits directly in front of the outbound request.
```ts
async function callApiNode(state: GraphState): Promise<Partial<GraphState>> {
  await limiter.acquire();

  const response = await fetch("https://api.example.com/v1/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.EXAMPLE_API_KEY ?? ""}`,
    },
    body: JSON.stringify({ prompt: state.prompt }),
  });

  if (!response.ok) {
    throw new Error(`API failed with ${response.status}`);
  }

  const data = (await response.json()) as { output?: string };
  return { result: data.output ?? "" };
}
```
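If the provider still returns the occasional 429 under real traffic, you can retry without stepping around the limiter. A minimal sketch, assuming the `limiter` from above; `fetchWithRateLimit` is an illustrative helper name, and you would call it from `callApiNode` in place of the bare `fetch`:

```ts
// Illustrative helper: retry on 429, taking a fresh token for each attempt
// so retries stay inside the same rate budget as first tries.
async function fetchWithRateLimit(
  url: string,
  init: RequestInit,
  maxAttempts = 3
): Promise<Response> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await limiter.acquire();
    const response = await fetch(url, init);
    if (response.status !== 429 || attempt === maxAttempts) {
      return response;
    }
    // Simple exponential backoff between attempts: 1s, 2s, 4s, ...
    await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1)));
  }
  throw new Error("unreachable");
}
```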
4. Wire the node into a LangGraph workflow. For a single-call pipeline, one node is enough, but the structure matters because you can extend it later with branching, retries, or tool execution.
```ts
const graph = new StateGraph(GraphStateSchema)
  .addNode("callApi", callApiNode)
  .addEdge(START, "callApi")
  .addEdge("callApi", END);

const app = graph.compile();
```
5. Run multiple invocations to see the limiter in action. With a capacity of `3` and a refill rate of `1` token/sec, the first three requests should go through immediately, and later ones should queue instead of hammering your provider.
```ts
async function main() {
  const inputs: GraphState[] = [
    { prompt: "A" },
    { prompt: "B" },
    { prompt: "C" },
    { prompt: "D" },
    { prompt: "E" },
  ];

  const results = await Promise.all(
    inputs.map((input) => app.invoke(input))
  );

  console.log(results);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
Testing It
Run the script locally and watch the timing between requests. The first few calls should complete quickly, then later calls should slow down as the bucket empties and refills.
To make verification easier, add timestamps around `limiter.acquire()` and compare them across concurrent invocations. If calls beyond the bucket's capacity wait their turn instead of all bursting through at once, the limiter is doing its job.
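A minimal sketch of that instrumentation inside `callApiNode`, replacing the bare `await limiter.acquire()`; the log format is just an example:

```ts
// Log how long the limiter held this invocation before the request went out.
const queuedAt = Date.now();
await limiter.acquire();
console.log(`[${state.prompt}] waited ${Date.now() - queuedAt} ms for a token`);
```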
Also test failure paths by forcing a non-200 response from your API endpoint. The graph should throw cleanly without bypassing the limiter logic or corrupting shared state.
Next Steps
- Add per-user or per-tenant buckets instead of one global bucket (see the sketch after this list).
- Replace the in-memory limiter with Redis if you need rate limiting across multiple app instances.
- Combine this pattern with LangGraph retries so transient `429` responses back off before re-entering the node.
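As a starting point for per-tenant buckets, here is a minimal sketch that assumes the `TokenBucket` class from step 1; `limiterFor` and the `tenantId` field are illustrative, and you would need to add a tenant identifier to the graph state yourself:

```ts
// One bucket per tenant, created lazily. The capacity and refill rate
// shown here are examples, not recommendations.
const tenantBuckets = new Map<string, TokenBucket>();

function limiterFor(tenantId: string): TokenBucket {
  let bucket = tenantBuckets.get(tenantId);
  if (!bucket) {
    bucket = new TokenBucket(3, 1);
    tenantBuckets.set(tenantId, bucket);
  }
  return bucket;
}

// Inside a node, assuming the state carries a tenantId field:
// await limiterFor(state.tenantId).acquire();
```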
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.