How to Fix 'rate limit exceeded in production' in LangGraph (TypeScript)

By Cyprian Aarons · Updated 2026-04-21

When you see rate limit exceeded in production in a LangGraph TypeScript app, it usually means your graph is calling an upstream model or API too aggressively. The failure often shows up under load, when multiple graph runs fan out at once, or when a loop/retry path keeps re-invoking the same node.

In practice, this is rarely a LangGraph bug. It’s usually a concurrency, retry, or fan-out problem in your graph design.

The Most Common Cause

The #1 cause is uncontrolled parallelism inside the graph. In LangGraph, it’s easy to create a node that fans out to multiple branches or gets re-run in a loop, and each branch hits the LLM at the same time.

Here’s the broken pattern:

  • Broken: Calls the model from every branch with no throttling → Fixed: Serializes or limits concurrency
  • Broken: Retries at the node level without backoff → Fixed: Uses bounded retries with delay
  • Broken: Lets loops re-enter the same expensive node → Fixed: Caches or gates repeated calls

// BROKEN: every branch can hit the model at once
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

// Graph state: `output` accumulates results across branches
const State = Annotation.Root({
  input: Annotation<string>(),
  output: Annotation<string[]>({
    reducer: (a, b) => a.concat(b),
    default: () => [],
  }),
});

async function analyzeBranch(state: typeof State.State) {
  // No throttling: every concurrent branch hits the model immediately
  const res = await llm.invoke(`Analyze: ${state.input}`);
  return { output: [res.content as string] };
}

const graph = new StateGraph(State)
  .addNode("analyze", analyzeBranch)
  .addEdge(START, "analyze")
  .addEdge("analyze", END)
  .compile();

// FIXED: gate model calls and avoid unbounded fan-out
import pLimit from "p-limit";
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  apiKey: process.env.OPENAI_API_KEY,
});

// At most 2 model calls in flight at once, across all branches
const limit = pLimit(2);

const State = Annotation.Root({
  input: Annotation<string>(),
  output: Annotation<string[]>({
    reducer: (a, b) => a.concat(b),
    default: () => [],
  }),
});

async function analyzeBranch(state: typeof State.State) {
  // Every model call goes through the limiter, so parallel branches queue
  return limit(async () => {
    const res = await llm.invoke(`Analyze: ${state.input}`);
    return { output: [res.content as string] };
  });
}

const graph = new StateGraph(State)
  .addNode("analyze", analyzeBranch)
  .addEdge(START, "analyze")
  .addEdge("analyze", END)
  .compile();

If you’re seeing errors like:

  • Error: Rate limit exceeded
  • 429 Too Many Requests
  • openai.RateLimitError
  • RetryError: Failed after X attempts

then your graph is likely producing more requests than your provider quota allows.
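
You can confirm which one it is in code by inspecting the status attached to the error. A minimal sketch, assuming the thrown error surfaces the OpenAI SDK's numeric status field:

// Sketch: separate provider throttling (429) from other failures
try {
  await graph.invoke({ input: "..." });
} catch (err) {
  const status = (err as { status?: number }).status;
  if (status === 429) {
    console.error("Provider rate limit hit: reduce concurrency or back off");
  } else {
    throw err; // not throttling, surface it
  }
}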

Other Possible Causes

1. Retry settings are too aggressive

LangChain/LangGraph workflows often inherit retry behavior from the model client or surrounding code. If you retry immediately on every failure, you amplify traffic during an outage.

// Too aggressive
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 10,
});

Use fewer retries and exponential backoff if your client supports it.

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 3,
});
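
If your client doesn't back off between attempts, a small wrapper can add it. A minimal sketch; invokeWithBackoff and its delay values are illustrative, not a LangChain API:

// Illustrative exponential-backoff wrapper (not a LangChain API)
async function invokeWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Wait 500ms, 1000ms, 2000ms, ... between attempts
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage:
// const res = await invokeWithBackoff(() => llm.invoke(prompt));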

2. Multiple graph runs are sharing one hot path

This happens when your API endpoint starts a full graph execution per request and traffic spikes. Even if each run is valid, total throughput can exceed provider limits.

// Example endpoint pattern that can overload the model
app.post("/chat", async (req, res) => {
  const result = await appStateGraph.invoke({ input: req.body.message });
  res.json(result);
});

Fix it with request queueing or a worker pool.

import PQueue from "p-queue";

const queue = new PQueue({ concurrency: 2 });

app.post("/chat", async (req, res) => {
  const result = await queue.add(() =>
    appStateGraph.invoke({ input: req.body.message })
  );
  res.json(result);
});
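
If traffic can outrun the queue, consider shedding load instead of letting the backlog grow without bound. A sketch using p-queue's size property; the 50-job threshold is arbitrary:

app.post("/chat", async (req, res) => {
  // Reject early when the backlog is too deep (threshold is illustrative)
  if (queue.size > 50) {
    res.status(503).json({ error: "Busy, try again shortly" });
    return;
  }
  const result = await queue.add(() =>
    appStateGraph.invoke({ input: req.body.message })
  );
  res.json(result);
});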

3. A loop node keeps calling the same tool/model repeatedly

In LangGraph, conditional edges can accidentally create tight loops. If your stop condition is wrong, one user request becomes dozens of model calls.

// Bad stop condition can keep looping forever
builder.addConditionalEdges("research", (state) =>
  state.needsMore ? "research" : END
);

Add explicit iteration caps.

// Cap the loop: bail out after a fixed number of passes
builder.addConditionalEdges("research", (state) => {
  if ((state.iterations ?? 0) >= 3) return END;
  return state.needsMore ? "research" : END;
});
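
For the cap to work, the node itself has to increment the counter. A minimal sketch, assuming your state carries iterations and needsMore fields and that stillNeedsResearch is your own (hypothetical) stop heuristic:

// Sketch: the looping node bumps the counter on every pass
async function research(state: State) {
  const res = await llm.invoke(`Research: ${state.input}`);
  return {
    iterations: (state.iterations ?? 0) + 1,
    needsMore: stillNeedsResearch(res), // hypothetical stop heuristic
  };
}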

4. Batch jobs are not rate-limited per provider key

If you run background jobs in parallel using the same API key, all workers share the same quota. That’s common in production cron jobs and queue consumers.

// Dangerous in workers if many jobs run together
await Promise.all(jobs.map((job) => appStateGraph.invoke(job)));

Prefer bounded concurrency:

import pLimit from "p-limit";

const limit = pLimit(3);
await Promise.all(jobs.map((job) => limit(() => appStateGraph.invoke(job))));
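
Promise.all also rejects the whole batch on the first failure, which can waste the work that already finished. Promise.allSettled lets throttled jobs fail individually so you can retry only those:

const results = await Promise.allSettled(
  jobs.map((job) => limit(() => appStateGraph.invoke(job)))
);
// Re-enqueue or log only the jobs that were throttled
const failed = results.filter((r) => r.status === "rejected");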

How to Debug It

  1. Check whether the error is from the provider or your orchestration

    • Look for 429, RateLimitError, or provider-specific messages.
    • If you see openai.RateLimitError, this is upstream throttling, not a LangGraph runtime failure.
  2. Log node-level call counts

    • Add counters around each LLM/tool node.
    • If one user request triggers more calls than expected, you have fan-out or loop amplification (see the counting sketch after this list).
  3. Inspect concurrency at the process level

    • Count how many .invoke() calls are active at once.
    • If spikes line up with traffic bursts, introduce p-limit, queues, or worker throttling.
  4. Review retries and conditional edges

    • Search for maxRetries, custom retry wrappers, and looping edges.
    • A bad retry policy plus a loop is how you turn one failed request into ten failed requests.
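
For step 2, a minimal counting sketch; the withCallCount wrapper is illustrative, not a LangGraph API:

// Illustrative: wrap node functions to count how often each one runs
const callCounts = new Map<string, number>();

function withCallCount<S, R>(
  name: string,
  node: (state: S) => Promise<R>
): (state: S) => Promise<R> {
  return async (state) => {
    const n = (callCounts.get(name) ?? 0) + 1;
    callCounts.set(name, n);
    console.log(`[node:${name}] call #${n}`);
    return node(state);
  };
}

// Usage: .addNode("analyze", withCallCount("analyze", analyzeBranch))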

Prevention

  • Put every expensive node behind bounded concurrency.
  • Add hard iteration caps on loops and agent reflection steps.
  • Centralize rate limiting per provider key, not per request handler (see the sketch after this list).
  • Treat retries as load multipliers; keep them small and back off properly.
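
One way to centralize: a single module owns a limiter per provider key, and every call site goes through it. A minimal sketch; the module and function names are illustrative:

// rateLimiters.ts (illustrative): one shared limiter per provider key
import pLimit from "p-limit";

const limiters = new Map<string, ReturnType<typeof pLimit>>();

export function limiterFor(providerKey: string, concurrency = 2) {
  let limiter = limiters.get(providerKey);
  if (!limiter) {
    limiter = pLimit(concurrency);
    limiters.set(providerKey, limiter);
  }
  return limiter;
}

// Usage at any call site:
// const limit = limiterFor("openai:prod");
// const res = await limit(() => llm.invoke(prompt));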

If you want a stable production setup, design LangGraph like a distributed system component, not like a local script. The error message is just telling you that your graph is sending more traffic than your upstream can absorb.


By Cyprian Aarons, AI Consultant at Topiax.

