LangGraph Tutorial (TypeScript): rate limiting API calls for intermediate developers
This tutorial shows how to build a LangGraph workflow in TypeScript that rate-limits outbound API calls with a token-bucket-style limiter. You need this when your agent fans out to multiple tools or model calls and you want to avoid 429s, protect third-party quotas, and keep your app predictable under load.
What You'll Need
- Node.js 18+ installed
- A TypeScript project with `ts-node` or `tsx`
- Packages:
  - `@langchain/langgraph`
  - `@langchain/core`
  - `p-limit`
  - `zod`
- An API key for the service you want to call
- Basic familiarity with LangGraph nodes, edges, and state
Step-by-Step
- Start by defining a small graph state that tracks pending work, completed results, and any rate-limit metadata you want to carry through the graph. Keep the state explicit so each node stays deterministic.

```ts
import { z } from "zod";

// The graph state: a queue of pending items and an accumulating
// list of completed results.
export const GraphStateSchema = z.object({
  items: z.array(z.string()),
  results: z.array(z.string()),
});

export type GraphState = z.infer<typeof GraphStateSchema>;
```
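If graph inputs arrive from outside your process, the schema doubles as a cheap validation gate. A minimal sketch, assuming the module above lives in `state.ts` (the payload here is illustrative):

```ts
import { GraphStateSchema, type GraphState } from "./state";

// parse() throws a ZodError on malformed input, so bad payloads
// never reach the graph.
const input: GraphState = GraphStateSchema.parse({
  items: ["account summary", "fraud check"],
  results: [],
});
```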
- Add a rate-limited API helper using `p-limit`. This keeps concurrency under control even if your graph schedules several tool calls at once.

```ts
import pLimit from "p-limit";

// At most two outbound requests run at the same time; extra calls queue.
const limit = pLimit(2);

export async function rateLimitedFetch(prompt: string): Promise<string> {
  return limit(async () => {
    const response = await fetch("https://api.example.com/v1/process", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.API_KEY}`,
      },
      body: JSON.stringify({ prompt }),
    });

    if (!response.ok) {
      throw new Error(`API request failed: ${response.status} ${response.statusText}`);
    }

    const data = (await response.json()) as { result: string };
    return data.result;
  });
}
```
- Create a node that processes items one at a time and appends results back into state. This is the important part: the graph can still loop over many inputs, but only the wrapped call executes under the limiter.

```ts
import { Annotation, END, StateGraph } from "@langchain/langgraph";
import { rateLimitedFetch } from "./rateLimitedFetch";

const State = Annotation.Root({
  // Work queue: each update replaces the previous list.
  items: Annotation<string[]>({
    default: () => [],
    reducer: (_prev, next) => next,
  }),
  // Completed results: each update appends to the previous list.
  results: Annotation<string[]>({
    default: () => [],
    reducer: (prev, next) => prev.concat(next),
  }),
});

async function processNext(state: typeof State.State) {
  const [current, ...rest] = state.items;
  if (!current) return { items: [], results: [] };

  const result = await rateLimitedFetch(current);
  return {
    items: rest,
    results: [result],
  };
}
```
- Wire the graph so it keeps calling the processor until there are no items left. Use a conditional edge to stop cleanly when the queue is empty.

```ts
// Loop back into "process" while work remains; otherwise end the run.
function shouldContinue(state: typeof State.State) {
  return state.items.length > 0 ? "process" : END;
}

const graph = new StateGraph(State)
  .addNode("process", processNext)
  .addEdge("__start__", "process")
  .addConditionalEdges("process", shouldContinue)
  .compile();
```
- Run the graph with a batch of inputs and inspect the final output. In production, this is where you would add retries around transient failures and log any rate-limit exceptions separately.

```ts
async function main() {
  const initialState = {
    items: ["account summary", "fraud check", "policy lookup", "claim status"],
    results: [],
  };

  const finalState = await graph.invoke(initialState);
  console.log(finalState.results);
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
Testing It
Run the script against a sandbox or mock API first so you can see how many requests are in flight at once. If you set `pLimit(2)`, you should never observe more than two concurrent outbound calls from this graph path.
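One way to watch that ceiling is a throwaway local mock that counts concurrent requests. A minimal sketch, assuming you point `rateLimitedFetch` at `http://localhost:8080` while testing (the 500 ms delay is arbitrary, just enough to make overlap visible):

```ts
import http from "node:http";

let inFlight = 0;

http
  .createServer(async (req, res) => {
    inFlight += 1;
    console.log(`in-flight: ${inFlight}`); // should never print more than 2
    await new Promise((resolve) => setTimeout(resolve, 500)); // simulate a slow upstream
    inFlight -= 1;
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify({ result: "ok" }));
  })
  .listen(8080, () => console.log("mock API on http://localhost:8080"));
```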
To verify correctness, log timestamps before and after `rateLimitedFetch` and confirm that no more requests overlap than your concurrency setting allows. Also test failure cases by forcing one request to return a non-200 response; the graph should surface the error instead of silently continuing.
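Instead of hand-rolled timestamps, `p-limit` also exposes counters you can poll; this sketch assumes you export the `limit` instance from the helper module:

```ts
import { limit } from "./rateLimitedFetch"; // assumes `limit` is exported

// activeCount = callbacks currently running; pendingCount = callbacks queued.
// With pLimit(2), activeCount should never exceed 2.
const probe = setInterval(() => {
  console.log(
    `[${new Date().toISOString()}] active=${limit.activeCount} pending=${limit.pendingCount}`
  );
}, 250);
probe.unref(); // don't keep the process alive just for the probe
```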
If you need stricter protection than concurrency limiting, swap `p-limit` for a token-bucket implementation with per-minute quotas. That gives you better control when an upstream provider enforces request-rate windows rather than just parallelism.
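Here is a minimal in-process sketch of that swap, assuming a fixed per-minute quota (the class and numbers are illustrative, not a production limiter):

```ts
// Tokens refill continuously; take() resolves once a token is available.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerMs: number) {
    this.tokens = capacity;
  }

  private refill(): void {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.lastRefill) * this.refillPerMs
    );
    this.lastRefill = now;
  }

  async take(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      // Sleep roughly until the next token accrues, then re-check.
      const waitMs = Math.ceil((1 - this.tokens) / this.refillPerMs);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      this.refill();
    }
    this.tokens -= 1;
  }
}

// 60 requests per minute: capacity 60, refilling 60 tokens per 60,000 ms.
const bucket = new TokenBucket(60, 60 / 60_000);

// Usage: inside rateLimitedFetch, call `await bucket.take();` before the
// fetch instead of wrapping the call in `limit(...)`.
```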
Next Steps
- Add retry logic with exponential backoff for `429` and `503` responses (see the sketch after this list).
- Move from concurrency limiting to true quota-based throttling with a shared Redis-backed limiter.
- Split your workflow into separate nodes for fetch, validate, and persist so only the outbound node is rate limited.
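A sketch of that retry layer around the earlier helper; the status check assumes the error message format from `rateLimitedFetch`, and the delays are arbitrary starting points:

```ts
import { rateLimitedFetch } from "./rateLimitedFetch";

async function fetchWithRetry(prompt: string, maxAttempts = 4): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await rateLimitedFetch(prompt);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      // Retry only rate-limit (429) and service-unavailable (503) failures.
      const retryable = /\b(429|503)\b/.test(message);
      if (!retryable || attempt >= maxAttempts) throw error;

      // Exponential backoff with jitter: ~500 ms, 1 s, 2 s, ...
      const delayMs = 500 * 2 ** (attempt - 1) + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```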
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit