LangChain Tutorial (Python): mocking LLM calls in tests for beginners

By Cyprian Aarons · Updated 2026-04-21
langchain · mocking-llm-calls-in-tests-for-beginners · python

This tutorial shows you how to write Python tests for LangChain code without calling a real LLM. You need this when you want fast, deterministic tests that do not burn API credits, fail on network issues, or change behavior because the model output drifted.

What You'll Need

  • Python 3.10+
  • langchain
  • langchain-openai
  • pytest
  • An OpenAI API key only if you want to run the real chain outside tests
  • A basic LangChain chain that accepts a prompt and returns text

Install the packages:

pip install langchain langchain-openai pytest

Step-by-Step

  1. Start with a small LangChain function that uses an LLM. Keep it simple so the test focuses on mocking the model call, not on framework noise.
# app.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Summarize this text in one sentence: {text}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

def summarize_text(text: str) -> str:
    response = chain.invoke({"text": text})
    return response.content
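If you want to sanity-check the chain before writing tests, a quick manual run looks like this. Note that it makes a real API call and assumes OPENAI_API_KEY is set in your shell:

# run_app.py -- manual check only; this hits the real API
from app import summarize_text

if __name__ == "__main__":
    print(summarize_text("LangChain is a framework for building LLM applications."))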
  2. Write your first test by patching the module-level chain object and controlling what its invoke method returns. This is the cleanest beginner-friendly approach: composed runnables are pydantic models under the hood and typically reject direct attribute assignment (so patching app.chain.invoke in place can raise a ValueError), while replacing app.chain wholesale avoids touching internals and keeps the test focused on your function's behavior.
# test_app.py
from unittest.mock import patch
from langchain_core.messages import AIMessage
from app import summarize_text

def test_summarize_text_returns_mocked_content():
    fake_response = AIMessage(content="This is a short summary.")

    with patch("app.chain") as mock_chain:
        mock_chain.invoke.return_value = fake_response
        result = summarize_text("LangChain helps build LLM apps.")

    assert result == "This is a short summary."
    mock_chain.invoke.assert_called_once_with({"text": "LangChain helps build LLM apps."})
  3. If you want to test more than one output, use side_effect. This is useful when your function calls the chain multiple times or when you need different responses for different inputs.
# test_app.py
from unittest.mock import patch
from langchain_core.messages import AIMessage
from app import summarize_text

def test_summarize_text_multiple_inputs():
    responses = [
        AIMessage(content="First summary."),
        AIMessage(content="Second summary."),
    ]

    with patch("app.chain.invoke", side_effect=responses):
        first = summarize_text("Text one.")
        second = summarize_text("Text two.")

    assert first == "First summary."
    assert second == "Second summary."
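side_effect also accepts a callable, which helps when the canned response should depend on the input rather than on call order. A minimal sketch; the keyword check inside fake_invoke is purely illustrative:

# test_app.py
from unittest.mock import patch
from langchain_core.messages import AIMessage
from app import summarize_text

def fake_invoke(inputs):
    # Branch on the input dict that summarize_text passes to the chain.
    if "LangChain" in inputs["text"]:
        return AIMessage(content="A summary about LangChain.")
    return AIMessage(content="A generic summary.")

def test_summarize_text_routes_on_input():
    with patch("app.chain") as mock_chain:
        mock_chain.invoke.side_effect = fake_invoke
        assert summarize_text("LangChain helps.") == "A summary about LangChain."
        assert summarize_text("Other text.") == "A generic summary."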
  4. For code that builds chains inside a function, patch the model class instead of a global chain object. This pattern matters when your production code creates the LLM at runtime and you want to keep tests isolated from external calls. One pitfall: having the patched class return a bare MagicMock does not work, because the prompt's | operator coerces any callable into a runnable instead of composing with your mock. LangChain's built-in FakeListChatModel slots into the chain like a real model and replays scripted responses.
# app_factory.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

def summarize_text_factory(text: str) -> str:
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a concise assistant."),
        ("user", "Summarize this text in one sentence: {text}")
    ])
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    chain = prompt | llm
    response = chain.invoke({"text": text})
    return response.content
# test_app_factory.py
from unittest.mock import patch
from langchain_core.language_models import FakeListChatModel
from app_factory import summarize_text_factory

def test_summarize_text_factory_mocks_llm_creation():
    # FakeListChatModel is a real chat model that replays canned
    # responses, so `prompt | llm` still composes a working chain.
    fake_llm = FakeListChatModel(responses=["Factory summary."])

    with patch("app_factory.ChatOpenAI", return_value=fake_llm):
        result = summarize_text_factory("Some input text.")

    assert result == "Factory summary."
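Because FakeListChatModel is a genuine chat model shipped with langchain-core, the composed chain runs end to end for real; each call returns the next string in responses, so multi-call tests just need a longer list. If you would rather stay with unittest.mock alone, patch the composed chain as in step 2 instead of the model class.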
  5. Add one test that checks your prompt logic separately from the model call. This keeps failures readable: if the prompt changes, you’ll know it was your template, not the LLM response.
# test_prompt.py
from app import prompt

def test_prompt_formats_user_input():
    messages = prompt.format_messages(text="Hello world")

    assert len(messages) == 2
    assert messages[0].content == "You are a concise assistant."
    assert messages[1].content == "Summarize this text in one sentence: Hello world"

Testing It

Run your tests with pytest:

pytest -q

If everything is wired correctly, all tests should pass without making any network requests. If a test fails, check whether you patched the right module path; in Python, you mock where the object is used, not where it was defined. For LangChain code, that usually means patching app.chain or app_factory.ChatOpenAI, not the library globally.
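To make that rule concrete, compare these patch targets (the patchers below are only constructed, not started):

from unittest.mock import patch

# Correct: patch the name in the module under test.
p1 = patch("app.chain")               # app.py looks up `chain` at call time
p2 = patch("app_factory.ChatOpenAI")  # app_factory.py imported the class itself

# Usually wrong: app_factory already holds its own reference to
# ChatOpenAI, so patching the library namespace does not affect it.
p3 = patch("langchain_openai.ChatOpenAI")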

A good sanity check is to temporarily remove the mock and confirm that your code would try to call the real model again. That tells you your tests are actually isolating external dependencies instead of accidentally passing for the wrong reason.
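One caveat: depending on your langchain-openai version, ChatOpenAI may raise at construction time if no API key is present, which would break even fully mocked tests at import. A placeholder key in conftest.py keeps imports working, and anything that slips past a mock fails with an authentication error instead of silently reaching the real API:

# conftest.py
import os

# Placeholder key: enough to construct ChatOpenAI, but any real
# request made with it would be rejected by the API.
os.environ.setdefault("OPENAI_API_KEY", "sk-test-not-a-real-key")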

Next Steps

  • Learn how to use pytest fixtures to share mocked LangChain objects across multiple tests (a sketch follows this list).
  • Add contract tests for structured outputs using Pydantic models and mocked AIMessage responses.
  • Explore integration tests separately from unit tests so you can cover both prompt logic and real provider behavior.
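For the first item, a fixture can own the patching so each test only declares its canned response. A minimal sketch reusing the app module from this tutorial:

# test_fixtures.py
from unittest.mock import patch

import pytest
from langchain_core.messages import AIMessage
from app import summarize_text

@pytest.fixture
def mock_chain():
    # Yield the mocked chain so each test can set its own
    # return_value or side_effect and assert on recorded calls.
    with patch("app.chain") as mocked:
        yield mocked

def test_summary_via_fixture(mock_chain):
    mock_chain.invoke.return_value = AIMessage(content="Fixture summary.")
    assert summarize_text("Anything.") == "Fixture summary."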

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
