How to Fix 'rate limit exceeded' in LangGraph (TypeScript)
A 'rate limit exceeded' error in LangGraph usually means one of the underlying model calls hit the provider’s quota, not that LangGraph itself is broken. In TypeScript projects, it shows up most often when a graph node fires too many LLM calls in a loop, during retries, or when multiple requests share the same API key and burst past per-minute limits.
You’ll usually see this with OpenAI, Anthropic, or another chat model wrapped inside a ChatModel node. The stack trace often points at RunnableSequence, ChatOpenAI, ChatAnthropic, or a LangGraph node function that keeps re-entering.
The Most Common Cause
The #1 cause is an accidental retry loop or graph cycle that keeps calling the model without a stop condition. In LangGraph, this happens when your conditional edge keeps routing back to the same node, or your state never changes in a way that ends execution.
Here’s the broken pattern:
```typescript
import { StateGraph, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

type State = {
  messages: any[];
  done?: boolean;
};

const graph = new StateGraph<State>({
  channels: {
    messages: { value: (x: any[], y: any[]) => x.concat(y), default: () => [] },
    done: { value: (_x, y) => y, default: () => false },
  },
});

graph.addNode("agent", async (state) => {
  const response = await llm.invoke(state.messages);
  return {
    messages: [{ role: "assistant", content: response.content }],
    done: false,
  };
});

graph.addConditionalEdges("agent", (state) => {
  // Broken: always loops back to agent
  return "agent";
});

graph.setEntryPoint("agent");
const app = graph.compile();
```
And here’s the fixed version side by side:
| Broken | Fixed |
|---|---|
| Always routes back to "agent" | Routes to END when work is done |
| Never updates state in a meaningful way | Updates state with a completion flag |
| Can spin until provider rate limits you | Stops after one successful pass |
```typescript
import { StateGraph, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

type State = {
  messages: any[];
  done?: boolean;
};

const graph = new StateGraph<State>({
  channels: {
    messages: { value: (x: any[], y: any[]) => x.concat(y), default: () => [] },
    done: { value: (_x, y) => y, default: () => false },
  },
});

graph.addNode("agent", async (state) => {
  const response = await llm.invoke(state.messages);
  return {
    messages: [{ role: "assistant", content: response.content }],
    // Flip the completion flag once the model signals it is finished
    done:
      typeof response.content === "string" &&
      response.content.includes("FINAL"),
  };
});

graph.addConditionalEdges("agent", (state) => {
  // Fixed: terminate once the state says the work is done
  return state.done ? END : "agent";
});

graph.setEntryPoint("agent");
const app = graph.compile();
```
The important part is not the exact FINAL check. It’s that your graph must have a real termination path, and your node must update state in a way that makes termination reachable.
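As an extra safety net, you can also cap how many steps a single run may take. Assuming your installed LangGraph version supports the recursionLimit config option (it throws once the limit is hit, instead of looping until the provider cuts you off), a minimal sketch:

```typescript
// Cap total graph steps so a bad edge can't spin until you get rate limited.
// recursionLimit makes LangGraph throw instead of burning quota.
const result = await app.invoke(
  { messages: [{ role: "user", content: "Draft the report, end with FINAL" }] },
  { recursionLimit: 10 }
);
```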
Other Possible Causes
1. Multiple concurrent requests sharing one API key
If you run several graph executions at once, you can exceed per-minute request limits fast.
```typescript
// Problematic under load: fires every invocation at once
await Promise.all(
  inputs.map((input) => app.invoke({ messages: input }))
);
```
Use bounded concurrency instead:
```typescript
import pLimit from "p-limit";

// At most 2 graph executions in flight at any time
const limit = pLimit(2);

await Promise.all(
  inputs.map((input) =>
    limit(() => app.invoke({ messages: input }))
  )
);
```
2. A retry policy that retries rate limits too aggressively
LangChain wrappers may retry failed calls automatically. If your provider returns 429 Too Many Requests, repeated retries can make things worse.
```typescript
// Retries failed calls up to 5 times, multiplying traffic during a rate-limit event
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 5,
});
```
For debugging, lower retries:
```typescript
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 1,
});
```
Then add your own backoff around the graph call if needed.
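For example, here is a minimal retry-with-exponential-backoff wrapper you could put around the graph call. The withBackoff helper is illustrative, not part of LangGraph, and the err.status check assumes an OpenAI-SDK-style error object; adjust it for your provider:

```typescript
// Illustrative helper: retry a call with exponential backoff on 429s
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429;
      if (!isRateLimit || attempt >= maxAttempts - 1) throw err;
      // Wait 1s, 2s, 4s, ... before the next attempt
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}

const result = await withBackoff(() =>
  app.invoke({ messages: [{ role: "user", content: "hello" }] })
);
```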
3. A node that calls the model more than once per execution
A single LangGraph node can still burn through quota if it does extra summarization, classification, and final generation in sequence.
graph.addNode("agent", async (state) => {
const intent = await llm.invoke([{ role: "user", content: "Classify intent" }]);
const summary = await llm.invoke([{ role: "user", content: "Summarize" }]);
const answer = await llm.invoke(state.messages);
return { messages: [{ role: "assistant", content: answer.content }] };
});
Split those into separate nodes only if they’re truly needed, and cache intermediate outputs where possible.
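One way to do that caching: if a sub-call is deterministic at temperature 0, memoize it so repeated runs don’t pay for the same classification twice. A minimal sketch, where the intentCache map and classifyIntent helper are illustrative rather than LangGraph APIs:

```typescript
// Illustrative in-memory cache for a repeated, deterministic sub-call
const intentCache = new Map<string, string>();

async function classifyIntent(text: string): Promise<string> {
  const cached = intentCache.get(text);
  if (cached !== undefined) return cached;

  const response = await llm.invoke([
    { role: "user", content: `Classify the intent of this message: ${text}` },
  ]);
  const intent = String(response.content);
  intentCache.set(text, intent);
  return intent;
}
```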
4. Provider-side quotas or project-level limits
Sometimes the code is fine and your OpenAI/Anthropic project has hit its quota.
```text
# Example symptoms in logs:
429 rate_limit_exceeded
insufficient_quota
You exceeded your current quota
```
Check:
- monthly spend caps
- per-minute request limits
- project-scoped keys vs org-scoped keys
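To tell a transient burst from a hard quota problem at runtime, inspect the error your provider SDK throws. A rough sketch: the status and code fields below follow the OpenAI Node SDK’s error shape, and other providers differ:

```typescript
try {
  await app.invoke({ messages: [{ role: "user", content: "hello" }] });
} catch (err: any) {
  // status/code fields assume an OpenAI-SDK-style error object
  if (err?.status === 429 && err?.code === "insufficient_quota") {
    console.error("Hard quota or billing cap hit - fix this in the provider dashboard");
  } else if (err?.status === 429) {
    console.error("Transient rate limit - back off and retry");
  } else {
    throw err;
  }
}
```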
How to Debug It
- Log every node entry and exit. Add explicit logs around each node so you can see whether the same node is being called repeatedly:

  ```typescript
  graph.addNode("agent", async (state) => {
    console.log("agent:start", state.messages.length);
    const result = await llm.invoke(state.messages);
    console.log("agent:end");
    return { messages: [{ role: "assistant", content: result.content }] };
  });
  ```

- Inspect the exact error payload. Look for provider-specific details like 429, rate_limit_exceeded, or insufficient_quota. Common error shapes include ErrorResponse, APIError, RateLimitError, and TooManyRequestsError.
- Temporarily disable retries. If the error disappears or changes shape with fewer retries, you’re probably amplifying traffic through automatic retry behavior.
- Run one request at a time. Replace parallel invocations with a single sequential call (see the sketch after this list). If one request succeeds but batch traffic fails, you have a concurrency problem rather than a graph logic bug.
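A minimal sequential version of the earlier batch call, assuming the same inputs array:

```typescript
// One invocation at a time: if this works but the Promise.all version fails,
// the problem is concurrency, not graph logic
for (const input of inputs) {
  const result = await app.invoke({ messages: input });
  console.log("completed run, message count:", result.messages.length);
}
```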
Prevention
- Put hard stop conditions on every cycle in your LangGraph state machine.
- Keep model calls inside nodes to one primary call unless there’s a strong reason otherwise.
- Add concurrency limits and exponential backoff before deploying batch workloads.
- Monitor provider usage separately from application logs so you can tell quota issues from graph bugs quickly.
If you’re seeing 'rate limit exceeded' in a LangGraph TypeScript project, check for loops first. In practice, that’s where most of these failures come from.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.