LangChain Tutorial (Python): implementing retry logic for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to add retry logic around LangChain calls in Python so transient failures do not break your pipeline. You need this when calling LLMs, tools, or retrievers that can fail due to rate limits, timeouts, or flaky upstream APIs.

What You'll Need

  • Python 3.10+
  • langchain
  • langchain-openai
  • openai API key
  • Optional: python-dotenv if you want to load secrets from a .env file
  • A basic LangChain setup with an OpenAI-compatible chat model

Install the packages:

pip install langchain langchain-openai openai python-dotenv

Set your API key in the environment:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start with a plain LangChain chain. This gives you a baseline before adding retries, and it keeps the example close to what you already have in production.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Explain retry logic in one sentence.")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = prompt | llm

response = chain.invoke({})
print(response.content)
  2. Wrap the model call with retry behavior using with_retry(). LangChain exposes this directly on runnable objects, which is the cleanest way to retry only the failing step instead of writing your own loop.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Give me a short answer about retries.")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retrying_llm = llm.with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
)

chain = prompt | retrying_llm

result = chain.invoke({})
print(result.content)
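With wait_exponential_jitter=True, LangChain sleeps between attempts using exponential backoff plus random jitter. A framework-free sketch of the resulting wait schedule (the base, cap, and jitter range here are illustrative assumptions, not LangChain's exact constants):

```python
import random

def backoff_waits(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Illustrative wait schedule: the delay doubles each attempt,
    capped at `cap`, with up to one second of random jitter added."""
    waits = []
    for attempt in range(attempts):
        delay = min(base * (2 ** attempt), cap)
        # Jitter de-synchronizes clients that all failed at the same moment.
        waits.append(delay + random.uniform(0, 1))
    return waits

print(backoff_waits(3))  # three increasing delays: roughly 1s, 2s, 4s plus jitter
```

The jitter matters under rate limiting: without it, every client that hit the limit together retries together and hits it again.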
  3. Retry the full chain when prompt formatting or downstream steps can also fail. This is useful if your chain includes parsing, tool calls, or custom runnables that may throw exceptions outside the model client.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant."),
    ("user", "Summarize why retries matter for APIs.")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = (prompt | llm).with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
)

response = chain.invoke({})
print(response.content)
  4. Add selective retries so you do not keep retrying on errors that will never succeed. In production, you usually want retries for timeouts and rate limits, but not for bad prompts or validation errors.
from openai import RateLimitError, APITimeoutError
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant."),
    ("user", "Return one sentence about failure handling.")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retrying_chain = (prompt | llm).with_retry(
    stop_after_attempt=4,
    wait_exponential_jitter=True,
    retry_if_exception_type=(RateLimitError, APITimeoutError),
)

output = retrying_chain.invoke({})
print(output.content)
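The effect of retry_if_exception_type is that anything outside the tuple fails fast on the first attempt. A framework-free sketch of that behavior (TransientError, FatalError, and the helper are illustrative stand-ins, with backoff sleeps omitted for brevity):

```python
class TransientError(Exception):
    """Stand-in for a retryable failure (rate limit, timeout)."""

class FatalError(Exception):
    """Stand-in for a non-retryable failure (bad request, validation)."""

def call_with_selective_retry(fn, max_attempts=4, retry_on=(TransientError,)):
    """Retry only exceptions listed in `retry_on`; everything else
    propagates immediately, unretried."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # transient error persisted past the attempt budget

attempts = []

def bad_request():
    attempts.append(1)
    raise FatalError("malformed prompt")

try:
    call_with_selective_retry(bad_request)
except FatalError:
    pass

print(len(attempts))  # 1: a fatal error is never retried
```

This is the property to preserve however you configure retries: a validation error retried four times is still a validation error, just four times slower.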
  5. Add logging around failures so you can see when retries happen and when they finally give up. If you are running this in a service, pair this with structured logs and metrics so repeated failures show up quickly.
import logging
from openai import RateLimitError, APITimeoutError
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("retry-demo")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant."),
    ("user", "Explain what exponential backoff is.")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = (prompt | llm).with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
    retry_if_exception_type=(RateLimitError, APITimeoutError),
)

try:
    result = chain.invoke({})
    logger.info("Success: %s", result.content)
except Exception as e:
    logger.exception("Final failure after retries: %s", e)

Testing It

Run the script once with a valid API key and confirm you get a normal response. Then simulate a failure by temporarily using an invalid model name or disconnecting network access; you should see the call fail after the configured number of attempts rather than immediately on the first error.

If you want to test rate-limit behavior specifically, lower your provider quota or use a mock that raises RateLimitError. In production tests, assert both outcomes: successful completion after transient errors and final failure after max attempts.

A good sanity check is to compare logs before and after adding with_retry(). Without retries, failures should surface once; with retries enabled, you should see delayed failure and multiple attempts for eligible exceptions.
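The mock-based assertions described above can be sketched without any network or API key. Here FakeRateLimitError and retry_call are dependency-free stand-ins for openai.RateLimitError and your retrying chain; in a real suite you would mock the client behind the chain the same way:

```python
from unittest import mock

class FakeRateLimitError(Exception):
    """Stand-in for openai.RateLimitError so the test needs no network."""

def retry_call(fn, max_attempts=3, retry_on=(FakeRateLimitError,)):
    """Minimal retrying wrapper mirroring the configured chain."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise

# Outcome 1: success after transient errors.
ok = mock.Mock(side_effect=[FakeRateLimitError(), FakeRateLimitError(), "answer"])
assert retry_call(ok) == "answer"
assert ok.call_count == 3  # two failures, then the successful attempt

# Outcome 2: final failure after max attempts.
bad = mock.Mock(side_effect=FakeRateLimitError())
try:
    retry_call(bad)
except FakeRateLimitError:
    pass
assert bad.call_count == 3  # gave up only after exhausting the budget
```

Mock's side_effect list makes each call raise or return the next item in sequence, which is exactly the "fail twice, then succeed" shape you want to assert on.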

Next Steps

  • Add circuit breaker logic so repeated upstream failures stop hammering the API.
  • Combine retries with timeout controls and request-level observability.
  • Learn how to apply retries to tools and retrievers inside agent workflows, not just chat models.
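As a starting point for the first bullet, a minimal circuit-breaker sketch (the threshold, timeout, and CircuitOpenError are illustrative, and production breakers usually add per-endpoint state and metrics):

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are short-circuited."""

class CircuitBreaker:
    """Open after `failure_threshold` consecutive failures; refuse calls
    until `reset_after` seconds pass, then allow one trial call."""

    def __init__(self, failure_threshold=5, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream marked unhealthy; skipping call")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success resets the failure streak
        return result
```

Where retries protect a single request, the breaker protects the upstream service: once failures pile up, callers stop paying the retry-and-timeout cost on every request.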


By Cyprian Aarons, AI Consultant at Topiax.
