LlamaIndex Tutorial (Python): mocking LLM calls in tests for advanced developers
This tutorial shows how to make LlamaIndex tests deterministic by mocking LLM calls in Python. You need this when your agent logic is solid, but your tests are flaky, slow, or expensive because they hit real model endpoints.
What You'll Need
- Python 3.10+
- llama-index
- pytest (the examples below run with pytest -q)
- No API key is required for the mocked tests in this tutorial
- Optional: openai, if you want to compare mocked tests with real model calls later
Step-by-Step
- Start with a small LlamaIndex component that normally calls an LLM. The point here is not to build a full app; it's to isolate the unit under test so you can replace the model call cleanly. Note that the function takes the LLM as a parameter, which is exactly what makes it swappable in tests.

from llama_index.core.prompts import PromptTemplate

def summarize_text(llm, text: str) -> str:
    prompt = PromptTemplate("Summarize this in one sentence: {text}")
    response = llm.complete(prompt.format(text=text))
    return response.text
- Use MockLLM for deterministic responses. In LlamaIndex, this is the cleanest way to avoid network calls while still exercising your prompt formatting and downstream logic.

from llama_index.core.llms import MockLLM

# Reuses summarize_text from the previous step. MockLLM echoes the prompt
# back by default (or emits fixed placeholder tokens if max_tokens is set),
# so no network call is ever made.
mock_llm = MockLLM()
result = summarize_text(mock_llm, "LlamaIndex makes retrieval pipelines easier to build.")
print(result)
- For more advanced tests, use a custom mock that returns specific outputs based on the prompt. This is useful when you need different branches of business logic to execute without depending on token-level sampling or provider behavior.

from typing import Any

from llama_index.core.base.llms.types import (
    CompletionResponse, CompletionResponseGen, LLMMetadata,
)
from llama_index.core.llms.custom import CustomLLM
from llama_index.core.prompts import PromptTemplate

class RuleBasedMockLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="rule-based-mock", context_window=4096, num_output=256)

    def complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:
        if "refund" in prompt.lower():
            return CompletionResponse(text="Escalate to billing support.")
        return CompletionResponse(text="Handle via standard workflow.")

    def stream_complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponseGen:
        # CustomLLM declares stream_complete as abstract, so the subclass
        # must implement it even if your tests only ever call complete().
        yield self.complete(prompt, formatted=formatted, **kwargs)

def route_request(llm, issue: str) -> str:
    prompt = PromptTemplate("Classify this issue and respond: {issue}")
    response = llm.complete(prompt.format(issue=issue))
    return response.text

if __name__ == "__main__":
    llm = RuleBasedMockLLM()
    print(route_request(llm, "Customer requests a refund for duplicate charge"))
- Wire the mock into a real test using pytest. Keep the test focused on your application code, not on LlamaIndex internals: put RuleBasedMockLLM and route_request in their own module and import them rather than redefining them in the test file.

# test_routing.py
# Assumes the code above lives in routing.py (a hypothetical name; adjust
# the import to match your project layout).
from routing import RuleBasedMockLLM, route_request

def test_refund_routes_to_billing():
    llm = RuleBasedMockLLM()
    assert route_request(llm, "Customer requests a refund") == "Escalate to billing support."
- If your code uses chat-style calls instead of completions, mock those too. The pattern stays the same: keep the interface stable and make the output predictable so your assertions stay stable.

from typing import Any, Sequence
from llama_index.core.base.llms.types import (ChatMessage, ChatResponse,
    CompletionResponse, CompletionResponseGen, LLMMetadata, MessageRole)
from llama_index.core.llms.custom import CustomLLM

class ChatMockLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="chat-mock", context_window=4096, num_output=256)

    # complete and stream_complete are abstract on CustomLLM, so even a
    # chat-only mock needs minimal implementations to be instantiable.
    def complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text="No action needed.")

    def stream_complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponseGen:
        yield self.complete(prompt, formatted=formatted, **kwargs)

    def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
        last_user_message = next(m.content for m in reversed(messages) if m.role == MessageRole.USER)
        if "claim status" in last_user_message.lower():
            return ChatResponse(message=ChatMessage(role=MessageRole.ASSISTANT, content="Claim is pending review."))
        return ChatResponse(message=ChatMessage(role=MessageRole.ASSISTANT, content="No action needed."))
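To sanity-check the chat mock on its own, drive it directly with a short message list. A minimal sketch, reusing the imports above; the message contents are made up for illustration:

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a support assistant."),
    ChatMessage(role=MessageRole.USER, content="What is my claim status?"),
]
response = ChatMockLLM().chat(messages)
assert response.message.content == "Claim is pending review."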
Testing It
Run your tests with pytest -q and confirm they pass without any network access or API keys. If you change the prompt text or routing logic and the tests still pass incorrectly, your mock is too broad and needs tighter branching rules.
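If you want "no network access" enforced rather than assumed, a small autouse fixture can fail any test that tries to open a socket. This is a minimal sketch using only pytest's built-in monkeypatch; the fixture name no_network is arbitrary:

# conftest.py
import socket

import pytest

@pytest.fixture(autouse=True)
def no_network(monkeypatch):
    def guard(*args, **kwargs):
        raise RuntimeError("Network access attempted during a mocked-LLM test")
    # Blocks every outbound socket connection for the duration of each test;
    # loosen this if your suite legitimately talks to local services.
    monkeypatch.setattr(socket.socket, "connect", guard)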
A good check is to add one positive case and one negative case per branch. That catches accidental regressions in prompt formatting and decision logic before they reach production.
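For the routing example, the test above already covers the positive branch; the negative case is only a few lines more (again assuming the hypothetical routing module):

# test_routing.py (continued)
from routing import RuleBasedMockLLM, route_request

def test_non_refund_uses_standard_workflow():
    # Negative case: no "refund" keyword, so the default branch must win.
    llm = RuleBasedMockLLM()
    assert route_request(llm, "Password reset request") == "Handle via standard workflow."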
If you’re using agents or query engines on top of these mocks, verify that only your app code changes between test runs. The whole point is repeatability: same input, same output, every time.
Next Steps
- Mock streaming responses with stream_complete and stream_chat for agent UIs that render tokens incrementally (see the sketch after this list).
- Add fixture-based mocks in pytest so multiple tests reuse the same deterministic LLM behavior (also sketched below).
- Test retrievers separately from generators by mocking only the generation layer first.
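As a starting point for the first two items, here is a minimal sketch: a shared pytest fixture that hands every test the same deterministic mock, plus a test that consumes the stream_complete generator. It assumes the hypothetical routing module from earlier:

# conftest.py
import pytest

from routing import RuleBasedMockLLM

@pytest.fixture
def mock_llm():
    return RuleBasedMockLLM()

# test_streaming.py
def test_stream_complete_ends_with_refund_response(mock_llm):
    chunks = list(mock_llm.stream_complete("Customer requests a refund"))
    # Our mock streams a single chunk; real providers yield many.
    assert chunks[-1].text == "Escalate to billing support."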
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.