LangChain Tutorial (Python): implementing retry logic for beginners
This tutorial shows you how to add retry logic to a LangChain Python app so transient failures from an LLM call don’t break your workflow. You need this when you’re calling APIs that occasionally fail with rate limits, timeouts, or 5xx errors and you want your app to recover automatically.
What You'll Need
- Python 3.10+
- langchain
- langchain-openai
- An OpenAI API key
- A .env file or another way to set environment variables
- Basic familiarity with LangChain PromptTemplate and LLMChain
Install the packages first:
pip install langchain langchain-openai python-dotenv
Set your API key:
export OPENAI_API_KEY="your-api-key-here"
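If you prefer the .env file route, python-dotenv (installed above) can load the key for you at startup. A minimal sketch, assuming a .env file sits next to your script:

from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from a local .env file into the environment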
Step-by-Step
- Start with a minimal chain that can fail.
The point here is not to build the final solution yet. We want a normal LangChain chain first so you can see exactly where retry logic fits.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain  # note: LLMChain is deprecated in recent LangChain releases; later steps use the prompt | llm style instead

prompt = PromptTemplate.from_template(
    "Write a one-sentence summary of this text:\n\n{text}"
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = LLMChain(llm=llm, prompt=prompt)

result = chain.invoke({"text": "LangChain helps developers build LLM applications."})
print(result["text"])  # LLMChain returns a dict; the model output is under the "text" key
- Wrap the model call with built-in retry settings.
LangChain’s OpenAI chat model supports retries through the underlying client configuration. This is the simplest production-friendly option because it retries failed requests without changing your chain logic.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Write a one-sentence summary of this text:\n\n{text}"
)

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_retries=3,  # retry failed requests up to 3 times before raising
)

chain = prompt | llm
response = chain.invoke({"text": "LangChain helps developers build LLM applications."})
print(response.content)
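One related knob worth pairing with this: ChatOpenAI also accepts a timeout, so a hung request fails fast and then gets retried instead of blocking forever. A minimal variant of the model above (the 30-second value is an arbitrary example, not a recommendation):

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_retries=3,
    timeout=30,  # seconds to wait on a single request before treating it as failed
)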
- Add retry logic around the whole chain for broader failures.
Sometimes the failure is not just the model call. Your prompt formatting, parsing, or downstream logic may also fail, so wrapping the full invocation gives you a wider safety net.
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(fn: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call fn(), retrying up to `attempts` times with a growing fixed delay."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as e:
            last_error = e
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait 1s, then 2s, then 3s, ...
    raise last_error
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Summarize this in one sentence:\n\n{text}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

result = retry(lambda: chain.invoke({"text": "Retry logic prevents temporary failures."}))
print(result.content)
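If you'd rather not maintain your own wrapper, LangChain runnables also ship a built-in .with_retry() helper that covers the same ground with exponential backoff and jitter. A minimal sketch, reusing the chain defined above:

retrying_chain = chain.with_retry(
    retry_if_exception_type=(Exception,),  # which exception types trigger a retry
    wait_exponential_jitter=True,          # exponential backoff with random jitter
    stop_after_attempt=3,                  # total attempts, including the first
)
print(retrying_chain.invoke({"text": "Retry logic prevents temporary failures."}).content)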
- Use exponential backoff instead of fixed delays.
Fixed delays are fine for demos, but exponential backoff is better when you want to reduce pressure on a failing API. This pattern waits longer after each failed attempt: with base_delay=1.0 and four attempts, the waits between attempts are 1s, 2s, and 4s.
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(fn: Callable[[], T], attempts: int = 4, base_delay: float = 1.0) -> T:
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            if attempt < attempts - 1:
                sleep_for = base_delay * (2 ** attempt)  # doubles each time: 1s, 2s, 4s, ...
                time.sleep(sleep_for)
    raise last_error
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Give me a short answer: {question}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

answer = retry_with_backoff(
    lambda: chain.invoke({"question": "What does retry logic do?"})
)
print(answer.content)
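A common refinement, not shown above, is adding jitter so that many clients retrying at once don't all hit the API at the same moment. A sketch of a "full jitter" variant of the same loop (retry_with_jitter is a hypothetical name for this tutorial):

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_jitter(fn: Callable[[], T], attempts: int = 4, base_delay: float = 1.0) -> T:
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            if attempt < attempts - 1:
                cap = base_delay * (2 ** attempt)
                time.sleep(random.uniform(0, cap))  # full jitter: random delay up to the exponential cap
    raise last_error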
- Retry only on transient errors, not every exception.
In production, you should not retry validation bugs or bad inputs forever. Catch the exceptions you expect from API calls and let everything else fail fast.
import time
from typing import Callable, TypeVar

from openai import RateLimitError, APITimeoutError  # add openai.InternalServerError here to also retry 5xx responses

T = TypeVar("T")

def retry_transient(fn: Callable[[], T], attempts: int = 3) -> T:
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except (RateLimitError, APITimeoutError):
            if attempt == attempts:
                raise  # out of attempts: let the transient error surface
            time.sleep(attempt)  # simple linear backoff: 1s, then 2s
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Answer briefly: {topic}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

output = retry_transient(lambda: chain.invoke({"topic": "retry logic"}))
print(output.content)
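If you'd rather declare this pattern than maintain it, the third-party tenacity library (pip install tenacity) implements the same idea. A sketch, reusing the chain defined above; note that the tenacity retry import would shadow the custom retry() helper from step 3 if both live in one file:

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from openai import RateLimitError, APITimeoutError

@retry(
    retry=retry_if_exception_type((RateLimitError, APITimeoutError)),  # transient errors only
    wait=wait_exponential(multiplier=1, max=30),  # exponential backoff, capped at 30s
    stop=stop_after_attempt(3),
)
def ask(topic: str) -> str:
    return chain.invoke({"topic": topic}).content

print(ask("retry logic"))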
Testing It
Run the script normally first and confirm you get a valid response from the model. Then simulate a failure, for example by temporarily disconnecting your network or, if your account lets you configure rate limits, by lowering them temporarily.
If you use the custom retry wrappers, raise a deliberate exception on the first attempt to confirm the second attempt succeeds, as in the sketch below. Also check that non-transient errors still bubble up instead of being retried forever.
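A hypothetical test harness for that check, reusing the chain and the retry() helper from step 3 (flaky_invoke and calls are names invented for this sketch):

calls = {"count": 0}

def flaky_invoke():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated transient failure")  # fail the first attempt only
    return chain.invoke({"text": "Retry logic prevents temporary failures."})

result = retry(flaky_invoke)  # the retry() helper from step 3
print(calls["count"], result.content)  # expect 2 attempts and a valid summary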
For real validation, log each attempt count and delay so you can confirm backoff behavior under load. That matters more than just seeing one successful response at the end.
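A minimal sketch of that instrumentation, reusing the backoff loop from step 4 with a logging call added (retry_logged is a name invented for this tutorial):

import logging
import time
from typing import Callable, TypeVar

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("retry")
T = TypeVar("T")

def retry_logged(fn: Callable[[], T], attempts: int = 4, base_delay: float = 1.0) -> T:
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            if attempt < attempts - 1:
                sleep_for = base_delay * (2 ** attempt)
                # Record the attempt number, error type, and planned delay
                log.info("attempt %d/%d failed with %s; retrying in %.1fs",
                         attempt + 1, attempts, type(e).__name__, sleep_for)
                time.sleep(sleep_for)
    raise last_error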
Next Steps
- Add structured logging around each retry attempt and error type.
- Move from simple retries to circuit breakers for repeated provider outages.
- Combine retries with fallback models so your app can switch providers when one fails.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.