LlamaIndex Tutorial (Python): mocking LLM calls in tests for intermediate developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to write Python tests for LlamaIndex code without calling a real LLM. You need this when you want fast, deterministic tests that don’t burn tokens, hit rate limits, or fail because a model changed its output.

What You'll Need

  • Python 3.10+
  • llama-index
  • pytest
  • unittest.mock from the standard library
  • Optional: openai if you want to compare against a real model later
  • No API key is required for the mocked test path

Install the packages:

pip install llama-index pytest

Step-by-Step

  1. Start with a small function that uses LlamaIndex’s chat engine. The point is to isolate the part you want to test: your application logic, not the provider.
from llama_index.core import VectorStoreIndex, Document
from llama_index.core.chat_engine.types import ChatMode


def build_chat_engine():
    docs = [Document(text="LlamaIndex helps build RAG apps.")]
    index = VectorStoreIndex.from_documents(docs)
    return index.as_chat_engine(chat_mode=ChatMode.CONTEXT)


def ask_engine(engine, question: str) -> str:
    response = engine.chat(question)
    return str(response)
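Because ask_engine only depends on an object with a chat method, you can test it before mocking anything inside LlamaIndex at all. A minimal sketch using a unittest.mock stand-in for the engine (the test name and canned answer are illustrative):

from unittest.mock import MagicMock


def test_ask_engine_with_engine_stub():
    engine = MagicMock()
    # ask_engine returns str() of the chat response, so a plain string works here.
    engine.chat.return_value = "LlamaIndex helps build RAG apps."

    answer = ask_engine(engine, "What does LlamaIndex do?")

    engine.chat.assert_called_once_with("What does LlamaIndex do?")
    assert answer == "LlamaIndex helps build RAG apps."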
  2. Mock the LLM at the model layer so no network call happens. MockLLM returns predictable output and is ideal for unit tests where you only care that your code wires things correctly.
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.llms.mock import MockLLM


def build_mock_llm():
    return MockLLM(max_tokens=32)


def format_answer(llm) -> str:
    # llm.chat expects ChatMessage objects, not plain role/content dicts.
    messages = [
        ChatMessage(role=MessageRole.USER, content="Summarize this in one line.")
    ]
    response = llm.chat(messages)
    return response.message.content
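A quick test confirms the wiring without any network access. With max_tokens set, MockLLM emits fixed placeholder text, so the output should be stable across runs:

def test_format_answer_returns_text():
    llm = build_mock_llm()
    first = format_answer(llm)
    second = format_answer(llm)
    assert isinstance(first, str) and first != ""
    # Deterministic: the mock produces the same placeholder text every call.
    assert first == second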
  3. If your code builds an index and query engine, inject the mocks through Settings, covering the embedding model as well, since indexing embeds your documents before any LLM call. This keeps your production code clean and makes the test swap explicit.
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.core.embeddings import MockEmbedding
from llama_index.core.llms.mock import MockLLM


def build_query_engine():
    Settings.llm = MockLLM(max_tokens=64)
    # Indexing calls the embedding model; mock it too, or the default
    # provider will demand an API key.
    Settings.embed_model = MockEmbedding(embed_dim=8)
    docs = [Document(text="Insurance claims can be automated with document extraction.")]
    index = VectorStoreIndex.from_documents(docs)
    return index.as_query_engine()


def run_query():
    query_engine = build_query_engine()
    response = query_engine.query("What can be automated?")
    return str(response)
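The matching test exercises index construction, retrieval, and synthesis entirely offline:

def test_run_query_offline():
    result = run_query()
    assert isinstance(result, str)
    assert result != ""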
  4. Write a pytest that patches the exact call site if you want full control over the response. Patch the helper function your production code uses to look up the LLM rather than touching global settings; unittest.mock restores the original automatically when the test exits.
from unittest.mock import patch

from llama_index.core.llms.mock import MockLLM


def load_llm():
    # Production lookup; replace with your real provider, e.g. OpenAI().
    # Tests patch this function, so it never runs in the suite.
    raise RuntimeError("real provider not configured")


def get_summary(llm):
    return llm.complete("Write a short summary of claims triage.")


def summarize():
    return get_summary(load_llm())


def test_get_summary_with_mock():
    llm = MockLLM(max_tokens=16)
    result = get_summary(llm)
    assert result.text is not None
    assert len(result.text) > 0


# Patch where load_llm is looked up: this very module.
@patch(f"{__name__}.load_llm", return_value=MockLLM(max_tokens=8))
def test_summarize_patches_call_site(mock_load_llm):
    result = summarize()
    mock_load_llm.assert_called_once()
    assert len(result.text) > 0
  5. Run an end-to-end unit test around your own wrapper function. Keep the assertion on behavior, not on a long generated string, because mock outputs are meant to be stable but still implementation-specific.
from llama_index.core.llms.mock import MockLLM


class AnswerService:
    def __init__(self, llm):
        self.llm = llm

    def answer(self, prompt: str) -> str:
        result = self.llm.complete(prompt)
        return result.text.strip()


def test_answer_service():
    service = AnswerService(MockLLM(max_tokens=20))
    answer = service.answer("Explain policy exclusions.")
    assert isinstance(answer, str)
    assert answer != ""

Testing It

Run pytest -q from your project root. Your tests should pass without any OpenAI key or external network access.

If you want to confirm there are no live calls, temporarily disconnect from the network or unset all provider credentials before running tests. A good unit test suite should still pass because it only depends on MockLLM and local code paths.
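To make that guarantee part of the suite itself, an autouse fixture in conftest.py can blank provider credentials before every test (the variable names below are the common OpenAI ones; adjust for your provider):

# conftest.py
import pytest


@pytest.fixture(autouse=True)
def no_provider_credentials(monkeypatch):
    # An accidental live call now fails loudly instead of spending tokens.
    for var in ("OPENAI_API_KEY", "OPENAI_ORG_ID"):
        monkeypatch.delenv(var, raising=False)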

If a test starts failing only because the output text changed, the assertion is too specific. Check types, presence of fields, or structural behavior instead of exact prose.
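Concretely, prefer shape checks over exact text; a small illustration:

from llama_index.core.llms.mock import MockLLM


def test_completion_shape_not_prose():
    result = MockLLM(max_tokens=4).complete("ping")
    # Robust: these survive changes to the generated wording.
    assert isinstance(result.text, str)
    assert result.text
    # Brittle, avoid: assert result.text == "some exact generated sentence"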

Next Steps

  • Learn how to use Settings.callback_manager to trace prompt/response flow in tests.
  • Add integration tests that hit a real provider behind an environment flag.
  • Move from MockLLM to custom fake LLM classes when you need scenario-specific responses like refusals or malformed JSON; see the sketch below.
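Here is a minimal sketch of such a scenario-specific fake, built by subclassing CustomLLM as in LlamaIndex's custom-LLM pattern (the RefusalLLM name and refusal text are illustrative; the test reuses AnswerService from step 5):

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class RefusalLLM(CustomLLM):
    """Always refuses, for testing how your code handles refusals."""

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="refusal-fake")

    @llm_completion_callback()
    def complete(self, prompt: str, formatted: bool = False, **kwargs) -> CompletionResponse:
        return CompletionResponse(text="I cannot help with that request.")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, formatted: bool = False, **kwargs) -> CompletionResponseGen:
        yield CompletionResponse(text="I cannot help with that request.")


def test_service_surfaces_refusal():
    service = AnswerService(RefusalLLM())
    assert "cannot help" in service.answer("Explain policy exclusions.")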

By Cyprian Aarons, AI Consultant at Topiax.