LangChain Tutorial (Python): Implementing Retry Logic for Advanced Developers
This tutorial shows you how to add retry logic around LangChain calls in Python so transient failures do not break your pipeline. You need this when you are calling flaky LLM APIs, rate-limited endpoints, or tools that fail intermittently and you want controlled retries instead of random request failures.
What You'll Need
- Python 3.10+
- `langchain`
- `langchain-openai`
- `tenacity`
- An OpenAI API key set as `OPENAI_API_KEY`
- Basic familiarity with LangChain `Runnable` objects and prompt templates
Install the packages:
pip install langchain langchain-openai tenacity
Step-by-Step
- Start with a normal LangChain chain.
You want a clean baseline before adding retries. This example uses a prompt template, an OpenAI chat model, and an output parser.
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# ChatOpenAI reads OPENAI_API_KEY from the environment; fail fast if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY before running this example.")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Summarize this in one sentence: {text}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = StrOutputParser()
chain = prompt | llm | parser
result = chain.invoke({"text": "LangChain lets developers compose LLM workflows from reusable components."})
print(result)
- Add a retry wrapper with Tenacity.
LangChain has built-in retry helpers on some runnables, but Tenacity gives you full control over what gets retried, how many times, and how long to wait. Wrap the chain invocation instead of wrapping the model internals unless you have a very specific reason.
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from openai import RateLimitError, APIConnectionError
@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=8),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError)),
    reraise=True,
)
def invoke_with_retry(inputs: dict) -> str:
    return chain.invoke(inputs)
text = "Retry logic should only handle transient failures."
print(invoke_with_retry({"text": text}))
- Make the retry policy more selective.
Do not blindly retry everything. Validation errors, bad prompts, and schema issues should fail immediately because retries will just burn tokens and time.
from tenacity import retry_if_exception
def is_retryable(exc: Exception) -> bool:
    message = str(exc).lower()
    transient_markers = [
        "rate limit",
        "timeout",
        "temporarily unavailable",
        "connection",
        "server error",
    ]
    return any(marker in message for marker in transient_markers)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0.5, min=1, max=6),
    retry=retry_if_exception(is_retryable),
    reraise=True,
)
def safe_invoke(inputs: dict) -> str:
    return chain.invoke(inputs)
print(safe_invoke({"text": "Only retry transient failures."}))
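Because the classifier is just string matching, it is easy to sanity-check against synthetic exceptions before trusting it in a retry loop. The messages below are made up for illustration; no API calls are involved:

```python
# Standalone copy of the string-based classifier from this step,
# exercised against synthetic exceptions.
def is_retryable(exc: Exception) -> bool:
    message = str(exc).lower()
    transient_markers = [
        "rate limit",
        "timeout",
        "temporarily unavailable",
        "connection",
        "server error",
    ]
    return any(marker in message for marker in transient_markers)

print(is_retryable(TimeoutError("Read timeout after 30s")))        # True
print(is_retryable(ValueError("Missing prompt variable 'text'")))  # False
print(is_retryable(RuntimeError("429: rate limit exceeded")))      # True
```

Checking against exception type (as in Step 2) is more robust than message matching; the string approach is a fallback for providers whose SDKs wrap everything in a generic exception class.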
- Use retries with structured outputs too.
The same pattern works when your chain returns parsed data. This is common in production when you need JSON-like output for downstream systems.
from pydantic import BaseModel, Field
class Summary(BaseModel):
    summary: str = Field(description="One-sentence summary of the input text")

structured_llm = llm.with_structured_output(Summary)
structured_chain = prompt | structured_llm

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception(is_retryable),  # reuse the selective policy; don't retry every Exception
    reraise=True,
)
def invoke_structured(inputs: dict) -> Summary:
    return structured_chain.invoke(inputs)
output = invoke_structured({"text": "Structured outputs help downstream systems."})
print(output.model_dump())
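If you want to confirm the shape downstream systems will receive without spending tokens, the model can be exercised on its own. This is a pydantic-only sketch; the description text is an illustrative addition:

```python
from pydantic import BaseModel, Field

class Summary(BaseModel):
    summary: str = Field(description="One-sentence summary of the input text")

s = Summary(summary="LangChain composes LLM workflows from reusable parts.")
print(s.model_dump())  # {'summary': 'LangChain composes LLM workflows from reusable parts.'}
```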
- Add observability around retries.
In production you need to know whether retries are hiding a real outage or just smoothing over occasional failures. Log the attempt count and exception type so you can tune your policy later.
import logging
from tenacity import before_sleep_log
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("retry")
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=5),
    retry=retry_if_exception(is_retryable),
    before_sleep=before_sleep_log(logger, logging.INFO),
    reraise=True,
)
def monitored_invoke(inputs: dict) -> str:
    return chain.invoke(inputs)
print(monitored_invoke({"text": "Log retries before sleeping."}))
Testing It
Run the script against a normal prompt first to confirm the happy path works. Then temporarily force a failure by disconnecting network access or using an invalid model name to see whether your code retries only when it should.
If you want to test the retry behavior without depending on real API failures, raise a fake exception inside a small wrapper function and confirm Tenacity performs multiple attempts before failing. Watch your logs for backoff timing and make sure non-retryable errors still surface immediately.
A good production check is to compare token usage and latency before and after adding retries. If latency spikes too much under load, reduce the max attempts or narrow the exception filter.
Next Steps
- Add circuit breaker logic so repeated failures stop hammering the provider
- Combine retries with async chains using `ainvoke` and async-safe backoff
- Add idempotency keys for tool calls that mutate external systems
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.