LlamaIndex Tutorial (Python): mocking LLM calls in tests for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to stop LlamaIndex from calling a real LLM during tests and replace it with a predictable mock response. You need this when your unit tests should be fast, offline, and stable even if the model provider is down or the prompt changes.

What You'll Need

  • Python 3.10+
  • llama-index
  • pytest
  • No API key required for this tutorial
  • A basic LlamaIndex project with at least one query engine or chat pipeline

Install the packages:

pip install llama-index pytest

Step-by-Step

  1. Create a tiny index that you can query in tests.

Use small in-memory documents so the test stays deterministic. One detail that trips people up: building a VectorStoreIndex embeds the documents, and the default embedding model is a live OpenAI client, so mock the embeddings here too. The LLM itself gets swapped in during the next step, and we build the query engine in step 3 once that mock is in place.

from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.core.embeddings import MockEmbedding

# MockEmbedding returns fixed-size dummy vectors, so indexing
# never touches a real embedding API.
Settings.embed_model = MockEmbedding(embed_dim=8)

docs = [
    Document(text="Paris is the capital of France."),
    Document(text="Berlin is the capital of Germany."),
]

index = VectorStoreIndex.from_documents(docs)

  2. Add a mock LLM that returns fixed output.

LlamaIndex lets you inject an LLM through the global Settings object. For tests, use MockLLM so your code never hits a real provider: with max_tokens set, it returns a fixed placeholder string, which gives you exactly the determinism a unit test needs.

from llama_index.core import Settings
from llama_index.core.llms.mock import MockLLM

Settings.llm = MockLLM(max_tokens=256)

response = Settings.llm.complete("What is the capital of France?")
print(response.text)

  3. Wire the mock into an actual query engine test.

This is the part most beginners miss: set Settings.llm (and Settings.embed_model) before building the index or query engine. That way, both the embedding step and any internal synthesis step use the mocks instead of OpenAI or another backend.

from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.core.embeddings import MockEmbedding
from llama_index.core.llms.mock import MockLLM

Settings.llm = MockLLM(max_tokens=256)
Settings.embed_model = MockEmbedding(embed_dim=8)

docs = [Document(text="Paris is the capital of France.")]
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

response = query_engine.query("What is the capital of France?")
print(type(response))
print(response.response)

  4. Assert against deterministic output in pytest.

For unit tests, assert on structure and stable text, not on model creativity. MockLLM gives you repeatable behavior, which makes failures easier to debug.

from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.core.embeddings import MockEmbedding
from llama_index.core.llms.mock import MockLLM

def test_query_engine_uses_mock_llm():
    # Both mocks must be in place before the index is built.
    Settings.llm = MockLLM(max_tokens=256)
    Settings.embed_model = MockEmbedding(embed_dim=8)

    docs = [Document(text="Paris is the capital of France.")]
    index = VectorStoreIndex.from_documents(docs)
    query_engine = index.as_query_engine()

    response = query_engine.query("What is the capital of France?")

    assert response.response is not None
    assert isinstance(response.response, str)
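
If several tests need the same setup, a pytest fixture keeps the mock wiring in one place. Here is a minimal sketch (the mock_query_engine name is ours):

import pytest
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.core.embeddings import MockEmbedding
from llama_index.core.llms.mock import MockLLM

@pytest.fixture
def mock_query_engine():
    # Install the mocks before any index or engine is built.
    Settings.llm = MockLLM(max_tokens=256)
    Settings.embed_model = MockEmbedding(embed_dim=8)
    docs = [Document(text="Paris is the capital of France.")]
    return VectorStoreIndex.from_documents(docs).as_query_engine()

def test_with_fixture(mock_query_engine):
    response = mock_query_engine.query("What is the capital of France?")
    assert response.response is not None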

  5. If you need stricter control, inject dependencies at the boundary.

In production code, wrap LlamaIndex setup in a function that takes its LLM and embedding model as arguments. Tests can then pass in mocks explicitly, and nothing depends on global state longer than necessary.

from llama_index.core import VectorStoreIndex, Document
from llama_index.core.embeddings import MockEmbedding
from llama_index.core.llms.mock import MockLLM

def build_query_engine(llm=None, embed_model=None):
    # Dependencies arrive as arguments, so tests can supply mocks
    # without touching the global Settings object.
    docs = [Document(text="Tokyo is the capital of Japan.")]
    index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)
    return index.as_query_engine(llm=llm)

if __name__ == "__main__":
    engine = build_query_engine(
        llm=MockLLM(max_tokens=256),
        embed_model=MockEmbedding(embed_dim=8),
    )
    print(engine.query("What is the capital of Japan?").response)
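
A test can then hand the function its mocks explicitly, which keeps production defaults untouched. A short sketch reusing the imports above:

def test_build_query_engine_with_mocks():
    engine = build_query_engine(
        llm=MockLLM(max_tokens=64),
        embed_model=MockEmbedding(embed_dim=8),
    )
    response = engine.query("What is the capital of Japan?")
    assert isinstance(response.response, str)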

Testing It

Run your test suite with pytest -q. If everything is wired correctly, the test should pass without any network calls and without needing an API key.

A good sanity check is to temporarily disconnect from the internet and rerun the test. If it still passes, your LLM dependency is mocked correctly.

If you want to confirm that no real provider is being used, search your test output for SDK-specific warnings or auth errors. Those usually mean something in your setup still points to a live model client.
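
If you want to enforce that guarantee automatically, one option is to make any socket connection fail the test. Here is a minimal sketch for a conftest.py using only the standard library (the guard_network name is ours):

import socket

import pytest

@pytest.fixture(autouse=True)
def guard_network(monkeypatch):
    # Replace socket.connect so any attempted network call fails
    # the test loudly instead of reaching a real provider.
    def _blocked(self, *args, **kwargs):
        raise RuntimeError("network access attempted during a unit test")

    monkeypatch.setattr(socket.socket, "connect", _blocked)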

Next Steps

  • Learn how to mock retrieval separately from generation so you can isolate vector search bugs (see the sketch after this list).
  • Move from Settings.llm globals to dependency injection for cleaner test boundaries.
  • Add fixture-based tests for multi-step workflows like agents and tool calling.
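
For that first item, one way to decouple the two stages is to hand the query engine a canned retriever, so only synthesis runs against the (mock) LLM. This is a sketch, not the only pattern; the CannedRetriever name is ours, and it assumes current llama-index-core imports:

from llama_index.core.llms.mock import MockLLM
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode

class CannedRetriever(BaseRetriever):
    # Always returns the same node, so tests exercise synthesis
    # without running any vector search at all.
    def _retrieve(self, query_bundle: QueryBundle):
        node = TextNode(text="Paris is the capital of France.")
        return [NodeWithScore(node=node, score=1.0)]

engine = RetrieverQueryEngine.from_args(
    retriever=CannedRetriever(),
    llm=MockLLM(max_tokens=64),
)
print(engine.query("What is the capital of France?").response)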

By Cyprian Aarons, AI Consultant at Topiax.