LlamaIndex Tutorial (Python): mocking LLM calls in tests for intermediate developers
This tutorial shows how to write Python tests for LlamaIndex code without calling a real LLM. You need this when you want fast, deterministic tests that don’t burn tokens, hit rate limits, or fail because a model changed its output.
What You'll Need
- Python 3.10+
- `llama-index`
- `pytest`
- `unittest.mock` from the standard library
- Optional: `openai` if you want to compare against a real model later
- No API key is required for the mocked test path
Install the packages:
```shell
pip install llama-index pytest
```
Step-by-Step
- Start with a small function that uses LlamaIndex’s chat engine. The point is to isolate the part you want to test: your application logic, not the provider.

```python
from llama_index.core import VectorStoreIndex, Document
from llama_index.core.chat_engine.types import ChatMode

def build_chat_engine():
    docs = [Document(text="LlamaIndex helps build RAG apps.")]
    index = VectorStoreIndex.from_documents(docs)
    return index.as_chat_engine(chat_mode=ChatMode.CONTEXT)

def ask_engine(engine, question: str) -> str:
    response = engine.chat(question)
    return str(response)
```
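Before involving any LlamaIndex machinery, you can exercise `ask_engine` on its own with a plain `unittest.mock.Mock` standing in for the engine. A minimal sketch (the function is repeated here so the test file stands alone, and the stubbed return value is arbitrary):

```python
from unittest.mock import Mock

def ask_engine(engine, question: str) -> str:
    response = engine.chat(question)
    return str(response)

def test_ask_engine_with_stub():
    # a Mock engine means no index, no embeddings, no network
    engine = Mock()
    engine.chat.return_value = "stubbed answer"
    assert ask_engine(engine, "What is LlamaIndex?") == "stubbed answer"
    engine.chat.assert_called_once_with("What is LlamaIndex?")
```

This tests only your wiring: that the question reaches `engine.chat` and the response is stringified.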
- Mock the LLM at the model layer so no network call happens. `MockLLM` returns predictable output and is ideal for unit tests where you only care that your code wires things correctly.

```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.llms.mock import MockLLM

def build_mock_llm():
    return MockLLM(max_tokens=32)

def format_answer(llm) -> str:
    # LlamaIndex chat APIs expect ChatMessage objects, not plain dicts
    messages = [
        ChatMessage(role=MessageRole.USER, content="Summarize this in one line.")
    ]
    response = llm.chat(messages)
    return response.message.content
```
- If your code builds an index and query engine, inject the mock through `Settings`. This keeps your production code clean and makes the test swap explicit.

```python
from llama_index.core import MockEmbedding, Settings, VectorStoreIndex, Document
from llama_index.core.llms.mock import MockLLM

def build_query_engine():
    Settings.llm = MockLLM(max_tokens=64)
    # mock the embedding model too, or indexing will still call a real provider
    Settings.embed_model = MockEmbedding(embed_dim=8)
    docs = [Document(text="Insurance claims can be automated with document extraction.")]
    index = VectorStoreIndex.from_documents(docs)
    return index.as_query_engine()

def run_query():
    query_engine = build_query_engine()
    response = query_engine.query("What can be automated?")
    return str(response)
```
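When several tests need the same swap, wrap it so the original value is always restored. The sketch below uses stand-in `FakeSettings`/`FakeLLM` classes (hypothetical, so it runs without llama-index installed); the same context-manager pattern applies to the real `Settings.llm`:

```python
from contextlib import contextmanager

class FakeLLM:
    """Stand-in for MockLLM, for illustration only."""
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens

class FakeSettings:
    """Stand-in for llama_index.core.Settings."""
    llm = None

@contextmanager
def override_llm(settings, llm):
    # swap the global LLM in and guarantee restore, even if the test fails
    original = settings.llm
    settings.llm = llm
    try:
        yield llm
    finally:
        settings.llm = original

def test_override_restores_settings():
    with override_llm(FakeSettings, FakeLLM(max_tokens=64)) as llm:
        assert FakeSettings.llm is llm
    assert FakeSettings.llm is None
```

The `try/finally` matters: without it, one failing test can leak a mock into every test that runs after it.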
- Write pytest tests against your helper functions. Pass a `MockLLM` directly when you can inject it; override `Settings.llm` when the code under test reads the global configuration.

```python
from llama_index.core import Settings
from llama_index.core.llms.mock import MockLLM

def get_summary(llm):
    return llm.complete("Write a short summary of claims triage.")

def test_get_summary_with_mock():
    llm = MockLLM(max_tokens=16)
    result = get_summary(llm)
    assert result.text is not None
    assert len(result.text) > 0

def test_settings_llm_override():
    # assign directly rather than unittest.mock.patch: reading Settings.llm
    # before it is set can trigger resolution of a real default provider
    Settings.llm = MockLLM(max_tokens=8)
    assert Settings.llm.max_tokens == 8
```
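If you want full control without touching global settings at all, patch the exact call site your production code uses. This sketch assumes a hypothetical `Summarizer` class with an internal `_get_llm` factory; `patch.object` swaps only that one seam:

```python
from unittest.mock import MagicMock, patch

class Summarizer:
    """Hypothetical service whose LLM factory is the seam we patch."""
    def _get_llm(self):
        raise RuntimeError("would construct a real provider client")

    def summarize(self, text: str) -> str:
        llm = self._get_llm()
        return llm.complete(text).text

def test_summarize_patches_call_site():
    fake = MagicMock()
    fake.complete.return_value.text = "short summary"
    # only this one factory is replaced; no global state is touched
    with patch.object(Summarizer, "_get_llm", return_value=fake):
        assert Summarizer().summarize("long claims document") == "short summary"
    fake.complete.assert_called_once_with("long claims document")
```

The unpatched factory raises on purpose, so any test that accidentally bypasses the patch fails loudly instead of making a live call.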
- Run an end-to-end unit test around your own wrapper function. Keep the assertion on behavior, not on a long generated string, because mock outputs are meant to be stable but still implementation-specific.

```python
from llama_index.core.llms.mock import MockLLM

class AnswerService:
    def __init__(self, llm):
        self.llm = llm

    def answer(self, prompt: str) -> str:
        result = self.llm.complete(prompt)
        return result.text.strip()

def test_answer_service():
    service = AnswerService(MockLLM(max_tokens=20))
    answer = service.answer("Explain policy exclusions.")
    assert isinstance(answer, str)
    assert answer != ""
```
Testing It
Run `pytest -q` from your project root. Your tests should pass without any OpenAI key or external network access.
If you want to confirm there are no live calls, temporarily disconnect from the network or unset all provider credentials before running tests. A good unit test suite should still pass because it only depends on MockLLM and local code paths.
If a test starts failing only when output text changes, tighten the assertion. Check types, presence of fields, or structural behavior instead of exact prose.
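For instance, a small shape-checking helper keeps assertions structural rather than pinned to exact prose (a sketch; the length bound is arbitrary):

```python
def assert_answer_shape(answer) -> None:
    # checks that survive output drift: type, non-emptiness, sane length
    assert isinstance(answer, str)
    assert answer.strip() != ""
    assert len(answer) < 10_000

assert_answer_shape("text text text")
```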
Next Steps
- Learn how to use `Settings.callback_manager` to trace prompt/response flow in tests.
- Add integration tests that hit a real provider behind an environment flag.
- Move from `MockLLM` to custom fake LLM classes when you need scenario-specific responses like refusals or malformed JSON.
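That last step can be sketched with a scripted fake that replays canned completions, for example a refusal followed by valid JSON. `ScriptedLLM` below is a hypothetical class, not part of llama-index; it only mimics the `.text` attribute of a completion response:

```python
import json
from types import SimpleNamespace

class ScriptedLLM:
    """Replays a fixed sequence of completion texts, one per call."""
    def __init__(self, responses):
        self._responses = iter(responses)

    def complete(self, prompt: str):
        # mirror the .text attribute your code reads off a completion response
        return SimpleNamespace(text=next(self._responses))

def test_handles_refusal_then_json():
    llm = ScriptedLLM(["I can't help with that.", '{"status": "ok"}'])
    first = llm.complete("q1").text
    assert "can't" in first
    second = json.loads(llm.complete("q2").text)
    assert second["status"] == "ok"
```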
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.