# LangChain Tutorial (Python): mocking LLM calls in tests for advanced developers
This tutorial shows you how to make LangChain tests deterministic by mocking LLM calls in Python. You need this when your chain or agent logic is solid, but live model calls make tests slow, flaky, expensive, and hard to run in CI.
## What You'll Need

- Python 3.10+
- `langchain`
- `langchain-openai`
- `pytest`
- `pytest-mock`
- Optional: `openai`, if you want to run a real model outside tests
- An OpenAI API key, only if you plan to execute live calls
- A test project with a standard layout like `app/` and `tests/`

Install the packages:

```bash
pip install langchain langchain-openai pytest pytest-mock
```
## Step-by-Step
- Start with a small LangChain function that uses a chat model. Keep the production code clean; the test will replace the network call without changing this file.

```python
# app/summarizer.py
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

def summarize_ticket(ticket_text: str) -> str:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    response = llm.invoke(
        [HumanMessage(content=f"Summarize this support ticket in one sentence:\n{ticket_text}")]
    )
    return response.content
```
- Test the function by mocking the model's `invoke()` method. This is the most direct approach when your code constructs the LLM inside the function and you want to keep the test focused on behavior.

```python
# tests/test_summarizer.py
from unittest.mock import patch
from langchain_core.messages import AIMessage
from app.summarizer import summarize_ticket

def test_summarize_ticket_returns_llm_output():
    fake_response = AIMessage(content="Customer cannot log in because MFA is failing.")
    with patch("app.summarizer.ChatOpenAI.invoke", return_value=fake_response) as mock_invoke:
        result = summarize_ticket("User reports login failure after password reset.")
    assert result == "Customer cannot log in because MFA is failing."
    mock_invoke.assert_called_once()
```
- If your code uses an injected model, mock at the boundary instead of patching internals. This pattern scales better for larger systems because you can pass a fake LLM into services, workflows, or agents.

```python
# app/analyzer.py
from langchain_core.messages import HumanMessage

def classify_issue(llm, issue_text: str) -> str:
    response = llm.invoke([HumanMessage(content=f"Classify this issue: {issue_text}")])
    return response.content
```

```python
# tests/test_analyzer.py
from unittest.mock import Mock
from langchain_core.messages import AIMessage
from app.analyzer import classify_issue

def test_classify_issue_with_mock_llm():
    llm = Mock()
    llm.invoke.return_value = AIMessage(content="billing")
    result = classify_issue(llm, "The invoice was charged twice.")
    assert result == "billing"
    llm.invoke.assert_called_once()
```
- For more advanced chains, mock the runnable itself and assert on structured outputs. This is useful when your code composes prompts, parsers, and models using LangChain primitives. Note that a bare `Mock` will not work here: the `|` operator coerces any plain callable into a `RunnableLambda`, so the chain would call `llm(...)` directly and your `invoke.return_value` would never be used. Use LangChain's built-in `FakeListChatModel`, a real chat model that replays canned responses.

```python
# app/pipeline.py
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a strict JSON generator."),
    ("human", "Extract the severity from: {text}")
])

def build_chain(llm):
    return prompt | llm

def extract_severity(llm, text: str) -> str:
    chain = build_chain(llm)
    response = chain.invoke({"text": text})
    return response.content
```

```python
# tests/test_pipeline.py
from langchain_core.language_models import FakeListChatModel
from app.pipeline import extract_severity

def test_extract_severity_with_runnable_chain():
    # FakeListChatModel composes with prompt | llm exactly like a real model.
    llm = FakeListChatModel(responses=['{"severity":"high"}'])
    result = extract_severity(llm, "Payments are failing for all users.")
    assert result == '{"severity":"high"}'
```
- If you want to avoid patching `invoke()` repeatedly, create a reusable fake LLM for your test suite. This keeps fixtures readable and gives you predictable responses across multiple tests.

```python
# tests/conftest.py
import pytest
from unittest.mock import Mock
from langchain_core.messages import AIMessage

@pytest.fixture
def fake_llm():
    llm = Mock()

    def _invoke(messages):
        last_message = messages[-1].content.lower()
        if "billing" in last_message:
            return AIMessage(content="billing")
        if "login" in last_message:
            return AIMessage(content="auth")
        return AIMessage(content="general")

    llm.invoke.side_effect = _invoke
    return llm
```
## Testing It

Run your suite with `pytest -q`. The important check is that no test hits the network and that all outputs stay stable across repeated runs.
If a test starts failing only when OpenAI latency changes or credentials expire, you are not mocking correctly. In that case, move the boundary higher up: patch `ChatOpenAI.invoke`, mock your service-level dependency, or switch to a fixture-backed fake.
A good sanity check is to temporarily disconnect from the network and rerun the tests. If they still pass, your mocks are doing their job.
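That sanity check can be automated with an autouse fixture that fails any test attempting to open a socket. This guard is my own suggestion, not part of the tutorial's code, and the names (`deny_network`, `block_network`) are arbitrary:

```python
# tests/conftest.py -- optional guard: any network attempt fails the test
import socket
import pytest

def deny_network(monkeypatch):
    """Patch the low-level socket connect so any HTTP client errors out loudly."""
    def _deny(self, *args, **kwargs):
        raise RuntimeError("Network access attempted during a test")
    monkeypatch.setattr(socket.socket, "connect", _deny)

@pytest.fixture(autouse=True)
def block_network(monkeypatch):
    # autouse=True applies the guard to every test without opting in.
    deny_network(monkeypatch)
```

Tests that legitimately need the network can override or disable the fixture locally, but by default nothing in the suite can reach OpenAI.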
## Next Steps

- Add snapshot tests for prompt templates so accidental prompt drift gets caught early.
- Learn how to mock tool-calling agents by faking tool outputs and message history.
- Use dependency injection for every model-boundary service so you can swap real and fake LLMs cleanly in production and tests.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.