LangChain Tutorial (Python): mocking LLM calls in tests for intermediate developers
This tutorial shows you how to write deterministic tests for LangChain code by mocking LLM calls in Python. You need this when your chain or agent logic is correct, but hitting a real model makes tests slow, flaky, expensive, or dependent on network access.
What You'll Need
- Python 3.10+
- `langchain`
- `langchain-openai`
- `pytest`
- `responses` or `unittest.mock` if you want to mock lower-level HTTP calls
- An OpenAI API key, but only if you plan to run the real chain outside tests
- A basic LangChain setup with `ChatOpenAI` and a simple chain
Install the packages:
```bash
pip install langchain langchain-openai pytest
```
Step-by-Step
- Start with a small chain that you can test in isolation. Keep the prompt and output parser simple so the test focuses on your application logic, not prompt engineering.
```python
# app.py
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a concise assistant."),
        ("human", "Summarize this text: {text}"),
    ]
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm


def summarize(text: str) -> str:
    response = chain.invoke({"text": text})
    return response.content
```
- In tests, patch the model call at the boundary where LangChain asks the LLM for output. For most application code, stubbing `invoke()` is enough, because your goal is to verify your business logic, not OpenAI's behavior. One wrinkle: LangChain runnables are Pydantic models and reject new instance attributes, so rather than patching `app.chain.invoke` directly, replace the chain object in `app`'s namespace and configure its `invoke()`.
```python
# test_app.py
from unittest.mock import MagicMock, patch

from langchain_core.messages import AIMessage

from app import summarize


def test_summarize_returns_mocked_answer():
    mock_chain = MagicMock()
    mock_chain.invoke.return_value = AIMessage(content="Mock summary")
    with patch("app.chain", mock_chain):
        result = summarize("LangChain helps build LLM apps.")
    assert result == "Mock summary"
    mock_chain.invoke.assert_called_once()
```
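The same boundary also covers failure paths. A minimal sketch, assuming you simply want the error to propagate to the caller: give the mock a `side_effect` so the fake chain raises instead of returning.

```python
# test_app_errors.py: a sketch simulating an upstream failure with side_effect.
import pytest
from unittest.mock import MagicMock, patch

from app import summarize


def test_summarize_propagates_llm_errors():
    mock_chain = MagicMock()
    # Any exception type works for the sketch; a real call might raise an
    # OpenAI rate-limit or timeout error instead.
    mock_chain.invoke.side_effect = RuntimeError("simulated API failure")
    with patch("app.chain", mock_chain):
        with pytest.raises(RuntimeError):
            summarize("This should fail.")
```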
- If your code constructs the model inside the function, patch `ChatOpenAI.invoke` instead of a local chain object. Patching the method on the class works because every instance shares the one class object. This pattern is useful when you want to keep production code clean and avoid injecting test doubles everywhere.
```python
# app2.py
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


def summarize_inline(text: str) -> str:
    prompt = ChatPromptTemplate.from_messages(
        [("human", "Summarize this text: {text}")]
    )
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    chain = prompt | llm
    response = chain.invoke({"text": text})
    return response.content
```
```python
# test_app2.py
from unittest.mock import patch

from langchain_core.messages import AIMessage

from app2 import summarize_inline


def test_summarize_inline_mocks_llm():
    with patch("langchain_openai.ChatOpenAI.invoke") as mock_invoke:
        mock_invoke.return_value = AIMessage(content="Inline mock")
        result = summarize_inline("Test input")
    assert result == "Inline mock"
```
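If you prefer pytest-native tooling, `monkeypatch` does the same job and restores the original method automatically. A minimal sketch, equivalent to the test above:

```python
# test_app2_monkeypatch.py: a sketch of the same stub via pytest's monkeypatch.
from langchain_core.messages import AIMessage
from langchain_openai import ChatOpenAI

from app2 import summarize_inline


def test_summarize_inline_with_monkeypatch(monkeypatch):
    # Replace the method on the class; the instance created inside
    # summarize_inline picks it up at call time.
    monkeypatch.setattr(
        ChatOpenAI, "invoke", lambda *args, **kwargs: AIMessage(content="Inline mock")
    )
    assert summarize_inline("Test input") == "Inline mock"
```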
- If you want stronger coverage, assert that your prompt inputs are correct before the mocked LLM returns anything. This catches bugs where your code passes the wrong variables into the chain and still gets a "successful" mocked response.
```python
# test_prompt_inputs.py
from unittest.mock import MagicMock, patch

from langchain_core.messages import AIMessage

from app import summarize


def test_summarize_passes_expected_input():
    mock_chain = MagicMock()
    mock_chain.invoke.return_value = AIMessage(content="OK")
    with patch("app.chain", mock_chain):
        summarize("Hello world")
    mock_chain.invoke.assert_called_once_with({"text": "Hello world"})
```
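To go one level deeper, patch the model method instead of the chain: the prompt template then really runs, and you can inspect the rendered messages. A sketch, assuming the app.py layout above; note that a plain class-level patch is not bound to the instance, so the rendered prompt arrives as the first positional argument:

```python
# test_rendered_prompt.py: a sketch that patches the model so the prompt still renders.
from unittest.mock import patch

from langchain_core.messages import AIMessage

from app import summarize


def test_prompt_renders_with_user_text():
    with patch("langchain_openai.ChatOpenAI.invoke") as mock_invoke:
        mock_invoke.return_value = AIMessage(content="OK")
        summarize("Hello world")
    # The chain hands the model a ChatPromptValue as the first argument.
    prompt_value = mock_invoke.call_args.args[0]
    messages = prompt_value.to_messages()
    assert messages[-1].content == "Summarize this text: Hello world"
```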
- For more realistic integration-style tests without calling the network, use a fake chat model from LangChain instead of a manual mock. This is useful when you want predictable outputs while still exercising more of LangChain's plumbing.
```python
# fake_model_test.py
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([("human", "{topic}")])
llm = FakeListChatModel(responses=["First fake answer"])
chain = prompt | llm


def test_fake_chat_model():
    result = chain.invoke({"topic": "insurance claims"})
    assert result.content == "First fake answer"
```
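`FakeListChatModel` returns its responses in order across successive calls, which makes multi-step flows easy to script. A minimal sketch:

```python
# fake_model_sequence_test.py: a sketch of scripted responses across calls.
from langchain_core.language_models.fake_chat_models import FakeListChatModel


def test_fake_model_returns_responses_in_order():
    llm = FakeListChatModel(responses=["first answer", "second answer"])
    assert llm.invoke("question one").content == "first answer"
    assert llm.invoke("question two").content == "second answer"
```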
Testing It
Run your tests with `pytest -q`. You should see deterministic results every time, with no API traffic and no dependency on model latency.
If a test fails, check whether you patched the correct symbol. In Python, you patch where the object is used, not where it was originally defined.
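Concretely: patching a method (as in test_app2.py) works from any import path because there is only one class object, but patching a name only rebinds it in one module. A small sketch of the rule, reusing `app2` from above:

```python
# test_patch_target.py: a sketch of patching the name in the module that uses it.
from unittest.mock import patch

import app2


def test_patch_the_name_where_it_is_used():
    # app2 did `from langchain_openai import ChatOpenAI`, so its copy of the
    # name lives in app2's namespace. Patching "langchain_openai.ChatOpenAI"
    # would rebind the name only inside langchain_openai and never reach app2.
    with patch("app2.ChatOpenAI") as mock_cls:
        assert app2.ChatOpenAI is mock_cls
```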
A good sanity check is to temporarily change the mocked return value and confirm your assertions fail for the right reason. That tells you your test is actually exercising your code path instead of just passing by accident.
Next Steps
- Add tests for structured outputs using Pydantic models and LangChain output parsers.
- Learn how to use dependency injection so chains and models can be swapped cleanly in tests (see the sketch after this list).
- Mock tool calls separately from LLM calls if you're testing agents or retrieval pipelines.
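As a starting point for the dependency-injection idea, here is a minimal sketch; `summarize_with` and its signature are illustrative, not part of the tutorial code above:

```python
# di_example.py: a sketch that accepts the model as a parameter so tests can swap it.
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from langchain_core.prompts import ChatPromptTemplate


def summarize_with(llm: BaseChatModel, text: str) -> str:
    prompt = ChatPromptTemplate.from_messages(
        [("human", "Summarize this text: {text}")]
    )
    chain = prompt | llm
    return chain.invoke({"text": text}).content


def test_summarize_with_injected_fake():
    fake = FakeListChatModel(responses=["Injected summary"])
    assert summarize_with(fake, "Some text") == "Injected summary"
```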
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit