LangChain Tutorial (Python): Implementing Retry Logic for Intermediate Developers
This tutorial shows how to add retry logic around LangChain calls in Python so transient failures do not break your pipeline. You need this when calling LLMs, tools, or retrievers that can fail due to rate limits, timeouts, or flaky upstream APIs.
What You'll Need
- Python 3.10+
- `langchain`
- `langchain-openai`
- An OpenAI API key
- Optional: `python-dotenv` if you want to load secrets from a `.env` file
- A basic LangChain setup with an OpenAI-compatible chat model
Install the packages:
```shell
pip install langchain langchain-openai openai python-dotenv
```
Set your API key in the environment:
```shell
export OPENAI_API_KEY="your-key-here"
```
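The examples below assume the key is present. A small stdlib check like this one (a sketch, not part of LangChain) fails fast with a clear message instead of surfacing a confusing auth error deep inside a retry loop:

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key
```

Call `require_api_key()` once at startup, before building any chains.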
Step-by-Step
- Start with a plain LangChain chain. This gives you a baseline before adding retries, and it keeps the example close to what you already have in production.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Explain retry logic in one sentence."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

response = chain.invoke({})
print(response.content)
```
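To appreciate what LangChain's built-in retry support saves you, here is the hand-rolled loop you would otherwise write (a stdlib sketch around a generic callable, not LangChain code):

```python
import random
import time

def retry_call(fn, *, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying the listed exception types with exponential backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # attempts exhausted: surface the last error
            # Backoff doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay))
```

`with_retry()` in the next step gives you the same behavior without maintaining this loop yourself.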
- Wrap the model call with retry behavior using `with_retry()`. LangChain exposes this directly on runnable objects, which is the cleanest way to retry only the failing step instead of rebuilding your own loop.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Give me a short answer about retries."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Retry up to 3 attempts, waiting with exponential backoff plus jitter in between.
retrying_llm = llm.with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
)

chain = prompt | retrying_llm
result = chain.invoke({})
print(result.content)
```
- Retry the full chain when prompt formatting or downstream steps can also fail. This is useful if your chain includes parsing, tool calls, or custom runnables that may throw exceptions outside the model client.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant."),
    ("user", "Summarize why retries matter for APIs."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Wrapping the composed chain retries every step, not just the model call.
chain = (prompt | llm).with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
)

response = chain.invoke({})
print(response.content)
```
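Both snippets enable `wait_exponential_jitter=True`. The resulting wait pattern between attempts looks roughly like this (an illustrative stdlib sketch of exponential backoff with jitter; the numbers are not LangChain's exact internals):

```python
import random

def backoff_waits(attempts: int, initial: float = 1.0, jitter: float = 1.0) -> list[float]:
    """Illustrative wait times between attempts: a doubling base plus random jitter."""
    waits = []
    for n in range(attempts - 1):  # no wait after the final attempt
        waits.append(initial * 2 ** n + random.uniform(0, jitter))
    return waits

print(backoff_waits(4))  # three waits, roughly 1s, 2s, 4s plus jitter
```

The jitter matters in production: without it, many clients that failed at the same moment would all retry at the same moment, too.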
- Add selective retries so you do not keep retrying on errors that will never succeed. In production, you usually want retries for timeouts and rate limits, but not for bad prompts or validation errors.

```python
from openai import RateLimitError, APITimeoutError
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant."),
    ("user", "Return one sentence about failure handling."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Only the listed exception types trigger a retry; anything else fails immediately.
retrying_chain = (prompt | llm).with_retry(
    stop_after_attempt=4,
    wait_exponential_jitter=True,
    retry_if_exception_type=(RateLimitError, APITimeoutError),
)

output = retrying_chain.invoke({})
print(output.content)
```
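The same split between retryable and fatal errors can be written as a predicate if you ever need it outside LangChain. The exception classes below are local stand-ins defined for illustration, not the real `openai` ones:

```python
class RateLimitError(Exception):
    """Stand-in for openai.RateLimitError."""

class APITimeoutError(Exception):
    """Stand-in for openai.APITimeoutError."""

RETRYABLE = (RateLimitError, APITimeoutError)

def should_retry(exc: Exception) -> bool:
    # Transient, provider-side failures are worth retrying; bad input is not.
    return isinstance(exc, RETRYABLE)

print(should_retry(RateLimitError()))          # True
print(should_retry(ValueError("bad prompt")))  # False
```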
- Add logging around failures so you can see when retries happen and when they finally give up. If you are running this in a service, pair this with structured logs and metrics so repeated failures show up quickly.

```python
import logging

from openai import RateLimitError, APITimeoutError
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("retry-demo")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant."),
    ("user", "Explain what exponential backoff is."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = (prompt | llm).with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
    retry_if_exception_type=(RateLimitError, APITimeoutError),
)

try:
    result = chain.invoke({})
    logger.info("Success: %s", result.content)
except Exception as e:
    logger.exception("Final failure after retries: %s", e)
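Note that `with_retry()` does not emit a log line per attempt on its own; the try/except above only sees the final outcome. If you want per-attempt visibility, one option is a thin wrapper like this stdlib sketch (not a LangChain API):

```python
import logging

logger = logging.getLogger("retry-demo")

def logged_call(fn, *, attempts=3, retry_on=(Exception,)):
    """Run fn(), logging every failed attempt; re-raise once attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
```

You would call it as `logged_call(lambda: chain.invoke({}))`, dropping `with_retry()` from the chain so the wrapper owns the retry policy.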
Testing It
Run the script once with a valid API key and confirm you get a normal response. Then simulate a failure by temporarily using an invalid model name or disconnecting network access; you should see the call fail after the configured number of attempts rather than immediately on the first error.
If you want to test rate-limit behavior specifically, lower your provider quota or use a mock that raises RateLimitError. In production tests, assert both outcomes: successful completion after transient errors and final failure after max attempts.
A good sanity check is to compare logs before and after adding with_retry(). Without retries, failures should surface once; with retries enabled, you should see delayed failure and multiple attempts for eligible exceptions.
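As a sketch of that production test, here is the shape of both assertions against a stub instead of the real chain (stdlib only; `make_flaky` is a hypothetical helper, and the small retry loop stands in for `with_retry()`):

```python
def make_flaky(failures, exc=TimeoutError):
    """Return a callable that raises exc for its first `failures` calls, then succeeds."""
    state = {"calls": 0}
    def fn():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise exc("transient")
        return "ok"
    return fn

def call_with_retries(fn, attempts=3):
    """Minimal stand-in for a retrying chain: retry TimeoutError up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise

# Outcome 1: success after transient errors.
assert call_with_retries(make_flaky(2), attempts=3) == "ok"

# Outcome 2: final failure once attempts are exhausted.
try:
    call_with_retries(make_flaky(5), attempts=3)
    raise AssertionError("expected TimeoutError")
except TimeoutError:
    pass
```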
Next Steps
- Add circuit-breaker logic so repeated upstream failures stop hammering the API.
- Combine retries with timeout controls and request-level observability.
- Learn how to apply retries to tools and retrievers inside agent workflows, not just chat models.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.