LangChain Tutorial (Python): mocking LLM calls in tests for advanced developers

By Cyprian Aarons · Updated 2026-04-21
Tags: langchain, mocking-llm-calls-in-tests-for-advanced-developers, python

This tutorial shows you how to make LangChain tests deterministic by mocking LLM calls in Python. You need this when your chain or agent logic is solid, but live model calls make tests slow, flaky, expensive, and hard to run in CI.

What You'll Need

  • Python 3.10+
  • langchain
  • langchain-openai
  • pytest
  • pytest-mock
  • Optional: openai if you want to run a real model outside tests
  • An OpenAI API key only if you plan to execute live calls
  • A test project with a standard layout like app/ and tests/

Install the packages:

pip install langchain langchain-openai pytest pytest-mock
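
The examples assume a project layout roughly like this; the app/__init__.py file is only there so the app package imports cleanly under pytest:

your-project/
  app/
    __init__.py
    summarizer.py
    analyzer.py
    pipeline.py
  tests/
    conftest.py
    test_summarizer.py
    test_analyzer.py
    test_pipeline.py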

Step-by-Step

  1. Start with a small LangChain function that uses a chat model. Keep the production code clean; the test will replace the network call without changing this file.
# app/summarizer.py
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

def summarize_ticket(ticket_text: str) -> str:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    response = llm.invoke(
        [HumanMessage(content=f"Summarize this support ticket in one sentence:\n{ticket_text}")]
    )
    return response.content
  2. Test the function by mocking the model’s invoke() method. This is the most direct approach when your code constructs the LLM inside the function and you want to keep the test focused on behavior.
# tests/test_summarizer.py
from unittest.mock import patch
from langchain_core.messages import AIMessage
from app.summarizer import summarize_ticket

def test_summarize_ticket_returns_llm_output():
    fake_response = AIMessage(content="Customer cannot log in because MFA is failing.")

    with patch("app.summarizer.ChatOpenAI.invoke", return_value=fake_response) as mock_invoke:
        result = summarize_ticket("User reports login failure after password reset.")

    assert result == "Customer cannot log in because MFA is failing."
    mock_invoke.assert_called_once()
  3. If your code uses an injected model, mock at the boundary instead of patching internals. This pattern scales better for larger systems because you can pass a fake LLM into services, workflows, or agents.
# app/analyzer.py
from langchain_core.messages import HumanMessage

def classify_issue(llm, issue_text: str) -> str:
    response = llm.invoke([HumanMessage(content=f"Classify this issue: {issue_text}")])
    return response.content
# tests/test_analyzer.py
from unittest.mock import Mock
from langchain_core.messages import AIMessage
from app.analyzer import classify_issue

def test_classify_issue_with_mock_llm():
    llm = Mock()
    llm.invoke.return_value = AIMessage(content="billing")

    result = classify_issue(llm, "The invoice was charged twice.")

    assert result == "billing"
    llm.invoke.assert_called_once()
  4. For more advanced chains, mock the model the chain is built from and assert on structured outputs. This is useful when your code composes prompts, parsers, and models using LangChain primitives. Keep in mind that LCEL coerces a plain callable, including a bare Mock, into a runnable and calls it directly, so configure the mock's return value on the call itself rather than on invoke().
# app/pipeline.py
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a strict JSON generator."),
    ("human", "Extract the severity from: {text}")
])

def build_chain(llm):
    return prompt | llm

def extract_severity(llm, text: str) -> str:
    chain = build_chain(llm)
    response = chain.invoke({"text": text})
    return response.content
# tests/test_pipeline.py
from unittest.mock import Mock
from langchain_core.messages import AIMessage
from app.pipeline import extract_severity

def test_extract_severity_with_runnable_chain():
    # LCEL coerces the Mock into a runnable and calls it directly, so the
    # canned response goes on the call itself, not on invoke().
    llm = Mock(return_value=AIMessage(content='{"severity":"high"}'))

    result = extract_severity(llm, "Payments are failing for all users.")

    assert result == '{"severity":"high"}'
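
If you prefer a stand-in that behaves like a real chat model inside LCEL pipes, you can skip Mock entirely: recent langchain-core releases ship fake chat models for exactly this purpose. A minimal sketch, assuming FakeListChatModel is available in your installed version:

# tests/test_pipeline.py (continued)
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from app.pipeline import extract_severity

def test_extract_severity_with_fake_chat_model():
    # FakeListChatModel replays canned responses in order and is a real
    # Runnable, so it composes with prompt | llm just like ChatOpenAI would.
    llm = FakeListChatModel(responses=['{"severity":"high"}'])

    result = extract_severity(llm, "Payments are failing for all users.")

    assert result == '{"severity":"high"}'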
  5. If you want to avoid patching invoke() repeatedly, create a reusable fake LLM for your test suite. This keeps fixtures readable and gives you predictable responses across multiple tests.
# tests/conftest.py
import pytest
from unittest.mock import Mock
from langchain_core.messages import AIMessage

@pytest.fixture
def fake_llm():
    llm = Mock()
    
    def _invoke(messages):
        last_message = messages[-1].content.lower()
        if "billing" in last_message:
            return AIMessage(content="billing")
        if "login" in last_message:
            return AIMessage(content="auth")
        return AIMessage(content="general")
    
    llm.invoke.side_effect = _invoke
    return llm
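
With the fixture in place, any test can request fake_llm and get keyword-routed responses. A short usage sketch against classify_issue from step 3:

# tests/test_analyzer.py (continued)
from app.analyzer import classify_issue

def test_classify_issue_routes_by_keyword(fake_llm):
    # The expected labels follow the keyword routing defined in the fixture above.
    assert classify_issue(fake_llm, "Our billing statement looks wrong.") == "billing"
    assert classify_issue(fake_llm, "Login fails after password reset.") == "auth"
    assert classify_issue(fake_llm, "Please add a dark mode.") == "general"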

Testing It

Run your suite with pytest -q. The important check is that no test hits the network and all outputs stay stable across repeated runs.

If a test starts failing only when OpenAI latency changes or credentials expire, you are not mocking at the right boundary. In that case, move the boundary higher up: patch ChatOpenAI.invoke, mock the service-level dependency, or inject a fixture-backed fake.
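
Since pytest-mock is already in the requirements, the same boundary patch can be written with its mocker fixture, which undoes the patch automatically when the test ends. A minimal sketch; the test name is illustrative:

# tests/test_summarizer.py (continued)
from langchain_core.messages import AIMessage
from app.summarizer import summarize_ticket

def test_summarize_ticket_with_mocker(mocker):
    mock_invoke = mocker.patch(
        "app.summarizer.ChatOpenAI.invoke",
        return_value=AIMessage(content="Customer cannot log in because MFA is failing."),
    )

    result = summarize_ticket("User reports login failure after password reset.")

    assert result == "Customer cannot log in because MFA is failing."
    mock_invoke.assert_called_once()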

A good sanity check is to temporarily disconnect from the network and rerun the tests. If they still pass, your mocks are doing their job.
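
If pulling the network cable is impractical, for example in CI, a rough software equivalent is an autouse fixture that fails any test that tries to open a socket. A minimal sketch, assuming you would rather not add a dedicated plugin such as pytest-socket:

# tests/conftest.py (addition)
import socket

import pytest

@pytest.fixture(autouse=True)
def block_network(monkeypatch):
    def guard(*args, **kwargs):
        raise RuntimeError("Network access attempted during a test")

    # Any outbound connection attempt now raises instead of reaching OpenAI.
    monkeypatch.setattr(socket.socket, "connect", guard)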

Next Steps

  • Add snapshot tests for prompt templates so accidental prompt drift gets caught early (see the sketch after this list).
  • Learn how to mock tool-calling agents by faking tool outputs and message history.
  • Use dependency injection for every model-boundary service so you can swap real and fake LLMs cleanly in production and tests.
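
The first bullet can start small: render the prompt and assert on the exact messages so any wording change fails loudly. A sketch against the pipeline prompt from step 4, with no snapshot library assumed:

# tests/test_pipeline.py (continued)
from app.pipeline import prompt

def test_severity_prompt_renders_expected_messages():
    rendered = prompt.format_messages(text="Payments are failing.")

    assert rendered[0].content == "You are a strict JSON generator."
    assert rendered[1].content == "Extract the severity from: Payments are failing."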

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
