LangGraph Tutorial (TypeScript): implementing retry logic for advanced developers

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows how to add retry logic to a LangGraph workflow in TypeScript without turning your graph into a mess of nested try/catch blocks. You need this when external calls fail intermittently and you want controlled retries with backoff, failure tracking, and a clean path to a fallback response.

What You'll Need

  • Node.js 18+
  • TypeScript 5+
  • @langchain/langgraph
  • @langchain/openai
  • zod
  • An OpenAI API key in OPENAI_API_KEY
  • A project configured for ESM or TypeScript compilation

Install the packages:

npm install @langchain/langgraph @langchain/openai zod
npm install -D typescript tsx @types/node

Step-by-Step

  1. Start with a state that tracks retries explicitly.
    The important part is not the model call itself, but the metadata around it: attempt count, last error, and whether the workflow should continue.
import { Annotation, END, START, StateGraph } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const State = Annotation.Root({
  input: Annotation<string>(),
  output: Annotation<string | null>(),
  attempt: Annotation<number>({ default: () => 0 }),
  error: Annotation<string | null>({ default: () => null }),
});

type GraphState = typeof State.State;

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});
  2. Add a node that fails sometimes and records the failure.
    In production, this would be your API call, tool execution, or structured output parser. Here we simulate a flaky step so you can see the retry behavior clearly.
async function flakyNode(state: GraphState): Promise<Partial<GraphState>> {
  const nextAttempt = state.attempt + 1;

  if (nextAttempt < 3) {
    throw new Error(`Transient failure on attempt ${nextAttempt}`);
  }

  const response = await llm.invoke([
    { role: "system", content: "Answer concisely." },
    { role: "user", content: state.input },
  ]);

  return {
    output: String(response.content),
    attempt: nextAttempt,
    error: null,
  };
}
  3. Wrap the risky node with retry handling in a separate node.
    This keeps the graph logic deterministic. The retry node catches errors, increments attempt count, and decides whether to try again or stop.
async function retryWrapper(state: GraphState): Promise<Partial<GraphState>> {
  const maxRetries = 3;

  try {
    return await flakyNode(state);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const nextAttempt = state.attempt + 1;

    if (nextAttempt >= maxRetries) {
      return {
        attempt: nextAttempt,
        error: message,
        output:
          "Request failed after retries. Please try again later or contact support.",
      };
    }

    return {
      attempt: nextAttempt,
      error: message,
      output: null,
    };
  }
}
  4. Route the graph based on whether another retry is needed.
    This is where LangGraph earns its keep. Instead of looping in application code, you let the graph decide whether to terminate or re-enter the same work node.
function shouldRetry(state: GraphState): "retry" | "end" {
  if (state.output !== null) return "end";
  if (state.attempt >= 3) return "end";
  return "retry";
}

const graph = new StateGraph(State)
  .addNode("work", retryWrapper)
  .addConditionalEdges("work", shouldRetry, {
    retry: "work",
    end: END,
  })
  .addEdge(START, "work")
  .compile();
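The routing predicate is pure, so it can be sanity-checked in isolation before running the full graph. A minimal sketch, duplicating shouldRetry over a narrowed state type so it runs without a compiled graph or an API key:

```typescript
// Standalone copy of the routing predicate over just the fields it reads,
// so the branching can be exercised without constructing a StateGraph.
type RouteState = { output: string | null; attempt: number };

function shouldRetryStandalone(state: RouteState): "retry" | "end" {
  if (state.output !== null) return "end"; // success or fallback: stop
  if (state.attempt >= 3) return "end"; // retry budget exhausted: stop
  return "retry"; // no output yet and budget remains: loop back to "work"
}

console.log(shouldRetryStandalone({ output: null, attempt: 1 })); // "retry"
console.log(shouldRetryStandalone({ output: null, attempt: 3 })); // "end"
console.log(shouldRetryStandalone({ output: "answer", attempt: 1 })); // "end"
```

Because the predicate only ever reads state, you can cover all three branches in a few lines before trusting it to drive the graph's control flow.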
  5. Run the graph and inspect the final state.
    The output tells you whether the workflow succeeded on a later attempt or exhausted its retry budget. That makes it easy to plug into logs, metrics, or downstream fallback handling.
async function main() {
  const result = await graph.invoke({
    input: "Explain why retry logic matters in distributed systems.",
    output: null,
    attempt: 0,
    error: null,
  });

  console.log("Final state:", result);
}

main().catch(console.error);
  6. If you want production-grade behavior, add exponential backoff inside the wrapper.
    A fixed immediate retry is fine for demos, but real systems should wait between attempts to avoid hammering a failing dependency.
function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function retryWrapperWithBackoff(
  state: GraphState
): Promise<Partial<GraphState>> {
  const maxRetries = 3;
  const delayMs = Math.pow(2, state.attempt) * 250;

  try {
    // Back off only before retries; the first attempt should run immediately.
    if (state.attempt > 0) await sleep(delayMs);
    return await flakyNode(state);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const nextAttempt = state.attempt + 1;

    return nextAttempt >= maxRetries
      ? { attempt: nextAttempt, error: message, output: "Fallback response." }
      : { attempt: nextAttempt, error: message, output: null };
  }
}
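A common refinement not shown above is jitter: randomizing each delay so many clients recovering from the same outage don't retry in lockstep. A minimal full-jitter sketch (the helper name is illustrative):

```typescript
// Full-jitter backoff: draw a uniform random delay in [0, 2^attempt * base).
// The cap still grows exponentially with each attempt, but concurrent
// retries are spread across the window instead of landing at the same instant.
function jitteredDelayMs(attempt: number, baseMs = 250): number {
  const capMs = Math.pow(2, attempt) * baseMs;
  return Math.random() * capMs;
}
```

Swapping this in for the fixed delayMs calculation keeps the same worst-case wait while smoothing out retry bursts against the failing dependency.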

Testing It

Run the file with tsx and confirm that the first two attempts fail before the third succeeds. In the final state, attempt should read 3 and error should be back to null once the call succeeds.

Then swap flakyNode for a real tool call or OpenAI request that occasionally fails due to rate limits or network issues. If your graph returns a fallback after three failures, your control flow is working correctly.
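As a sketch of that swap, assuming an HTTP dependency (the endpoint and response shape here are hypothetical; adapt to your actual service):

```typescript
// Hypothetical stand-in for flakyNode's risky section: an HTTP call that
// turns rate limiting and server errors into thrown errors, so the retry
// wrapper above can catch and count them. fetchImpl is injectable so the
// error paths can be exercised without a live endpoint.
async function callUpstream(
  input: string,
  fetchImpl: typeof fetch = fetch
): Promise<string> {
  const res = await fetchImpl("https://api.example.com/answer", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input }),
  });
  if (res.status === 429) throw new Error("Rate limited (HTTP 429)");
  if (!res.ok) throw new Error(`Upstream failure (HTTP ${res.status})`);
  const data = (await res.json()) as { answer: string };
  return data.answer;
}
```

Because 429s and 5xx responses become thrown errors, they flow through the same catch path as the simulated transient failures.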

To verify backoff timing, log timestamps before each attempt and confirm that delays increase between retries. That matters when you are protecting upstream services from bursty failure loops.
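One way to do that logging, assuming the same 2^attempt * 250ms schedule (the harness below is a standalone sketch, not part of the graph):

```typescript
function sleepMs(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Record the elapsed time at the start of each attempt so growing gaps
// between retries are visible in the log output.
async function timedAttempts(attempts: number, baseMs = 250): Promise<number[]> {
  const start = Date.now();
  const stamps: number[] = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    if (attempt > 0) await sleepMs(Math.pow(2, attempt) * baseMs);
    const elapsed = Date.now() - start;
    stamps.push(elapsed);
    console.log(`attempt ${attempt + 1} at +${elapsed}ms`);
  }
  return stamps;
}
```

Each gap between consecutive stamps should roughly double, which is the signal that the backoff schedule is actually being honored.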

Next Steps

  • Add per-error-class handling so rate limits retry longer than validation errors
  • Move retry policy into config so different nodes can have different budgets
  • Combine this with structured outputs and schema validation failures for stricter agent workflows
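The first two ideas can be sketched as a small policy table keyed by error class (the class names, regex patterns, and budgets below are illustrative, not a LangGraph API):

```typescript
type ErrorClass = "rate_limit" | "network" | "validation";

interface RetryPolicy {
  maxRetries: number;
  baseDelayMs: number;
}

// Illustrative budgets: rate limits get a longer, slower budget; network
// blips retry quickly; validation errors won't improve on retry at all.
const policies: Record<ErrorClass, RetryPolicy> = {
  rate_limit: { maxRetries: 5, baseDelayMs: 1000 },
  network: { maxRetries: 3, baseDelayMs: 250 },
  validation: { maxRetries: 0, baseDelayMs: 0 },
};

// Classify an error by message. In production you would prefer typed error
// classes or status codes over string matching.
function classify(err: Error): ErrorClass {
  if (/\b429\b|rate limit/i.test(err.message)) return "rate_limit";
  if (/econnreset|etimedout|fetch failed/i.test(err.message)) return "network";
  return "validation";
}
```

Inside the wrapper, classify the caught error and read maxRetries and baseDelayMs from the matching policy instead of the hard-coded constants.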

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.
