LlamaIndex Tutorial (Python): mocking LLM calls in tests for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to make LlamaIndex tests deterministic by mocking LLM calls in Python. You need this when your agent logic is solid, but your tests are flaky, slow, or expensive because they hit real model endpoints.

What You'll Need

  • Python 3.10+
  • llama-index
  • pytest (run the tests with pytest -q)
  • No API key is required for the mocked tests in this tutorial
  • Optional: openai if you want to compare mocked tests with real model calls later

Step-by-Step

  1. Start with a small LlamaIndex component that normally calls an LLM. The point here is not to build a full app; it’s to isolate the unit under test so you can replace the model call cleanly.
from llama_index.core.llms import MockLLM
from llama_index.core.prompts import PromptTemplate


def summarize_text(llm, text: str) -> str:
    prompt = PromptTemplate("Summarize this in one sentence: {text}")
    response = llm.complete(prompt.format(text=text))
    return response.text


if __name__ == "__main__":
    llm = MockLLM()
    print(summarize_text(llm, "LlamaIndex makes retrieval pipelines easier to build."))
  2. Use MockLLM for deterministic responses. In LlamaIndex, this is the cleanest way to avoid network calls while still exercising your prompt formatting and downstream logic.
from llama_index.core.llms import MockLLM
from llama_index.core.prompts import PromptTemplate


def summarize_text(llm, text: str) -> str:
    prompt = PromptTemplate("Summarize this in one sentence: {text}")
    response = llm.complete(prompt.format(text=text))
    return response.text


mock_llm = MockLLM()
result = summarize_text(mock_llm, "LlamaIndex makes retrieval pipelines easier to build.")
print(result)
  3. For more advanced tests, use a custom mock that returns specific outputs based on the prompt. This is useful when you need different branches of business logic to execute without depending on token-level sampling or provider behavior.
from llama_index.core.base.llms.types import CompletionResponse, LLMMetadata
from llama_index.core.llms.custom import CustomLLM
from llama_index.core.prompts import PromptTemplate


class RuleBasedMockLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="rule-based-mock", context_window=4096, num_output=256)

    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        if "refund" in prompt.lower():
            return CompletionResponse(text="Escalate to billing support.")
        return CompletionResponse(text="Handle via standard workflow.")

    def stream_complete(self, prompt: str, **kwargs):
        # CustomLLM also declares stream_complete as abstract, so a minimal
        # generator is required even when tests only call complete().
        yield self.complete(prompt, **kwargs)


def route_request(llm, issue: str) -> str:
    prompt = PromptTemplate("Classify this issue and respond: {issue}")
    response = llm.complete(prompt.format(issue=issue))
    return response.text


if __name__ == "__main__":
    llm = RuleBasedMockLLM()
    print(route_request(llm, "Customer requests a refund for duplicate charge"))
  4. Wire the mock into a real test using pytest. Keep the test focused on your application code, not on LlamaIndex internals.
from llama_index.core.base.llms.types import CompletionResponse, LLMMetadata
from llama_index.core.llms.custom import CustomLLM
from llama_index.core.prompts import PromptTemplate


class RuleBasedMockLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="rule-based-mock", context_window=4096, num_output=256)

    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        if "refund" in prompt.lower():
            return CompletionResponse(text="Escalate to billing support.")
        return CompletionResponse(text="Handle via standard workflow.")

    def stream_complete(self, prompt: str, **kwargs):
        # Required stub: CustomLLM declares stream_complete as abstract.
        yield self.complete(prompt, **kwargs)


def route_request(llm, issue: str) -> str:
    prompt = PromptTemplate("Classify this issue and respond: {issue}")
    response = llm.complete(prompt.format(issue=issue))
    return response.text


def test_refund_routes_to_billing():
    llm = RuleBasedMockLLM()
    assert route_request(llm, "Customer requests a refund") == "Escalate to billing support."
  5. If your code uses chat-style calls instead of completions, mock those too. The pattern is the same: keep the interface stable and the output predictable so your assertions don't drift.
from llama_index.core.base.llms.types import (
    ChatMessage,
    ChatResponse,
    CompletionResponse,
    LLMMetadata,
    MessageRole,
)
from llama_index.core.llms.custom import CustomLLM


class ChatMockLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="chat-mock", context_window=4096, num_output=256)

    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        # CustomLLM requires complete() even when only chat() is exercised.
        return CompletionResponse(text="No action needed.")

    def stream_complete(self, prompt: str, **kwargs):
        # Required stub: stream_complete is also abstract on CustomLLM.
        yield self.complete(prompt, **kwargs)

    def chat(self, messages, **kwargs) -> ChatResponse:
        last_user_message = next(m.content for m in reversed(messages) if m.role == MessageRole.USER)
        if "claim status" in last_user_message.lower():
            return ChatResponse(message=ChatMessage(role=MessageRole.ASSISTANT, content="Claim is pending review."))
        return ChatResponse(message=ChatMessage(role=MessageRole.ASSISTANT, content="No action needed."))
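As a quick sanity check, the last-user-message selection can be exercised without llama-index installed at all. FakeChatLLM and make_msg below are hypothetical plain-object stand-ins that mirror just enough of ChatMessage and ChatResponse for the routing logic, so this sketch runs as a standalone script:

```python
# Dependency-free sketch: plain objects stand in for llama-index chat types.
from types import SimpleNamespace


def make_msg(role, content):
    # Mirrors the .role / .content attributes of ChatMessage.
    return SimpleNamespace(role=role, content=content)


class FakeChatLLM:
    def chat(self, messages):
        # Pick the most recent user turn, ignoring system/assistant messages.
        last_user = next(m.content for m in reversed(messages) if m.role == "user")
        if "claim status" in last_user.lower():
            reply = "Claim is pending review."
        else:
            reply = "No action needed."
        return SimpleNamespace(message=make_msg("assistant", reply))


history = [
    make_msg("system", "You are a claims assistant."),
    make_msg("user", "What is my claim status?"),
]
print(FakeChatLLM().chat(history).message.content)
```

The multi-turn history matters here: the mock must look at the last user message, not the first, or a realistic conversation will route incorrectly.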

Testing It

Run your tests with pytest -q and confirm they pass without any network access or API keys. If you change the prompt text or routing logic and the tests still pass when they should fail, your mock is too broad and needs tighter branching rules.

A good check is to add one positive case and one negative case per branch. That catches accidental regressions in prompt formatting and decision logic before they reach production.
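The one-positive, one-negative pattern can be sketched without any LlamaIndex dependency. FakeRoutingLLM and this inline route_request are hypothetical stand-ins that mirror the rule-based mock from the steps above, so the sketch runs anywhere Python does:

```python
# Dependency-free sketch of one positive and one negative test per branch.
from types import SimpleNamespace


class FakeRoutingLLM:
    def complete(self, prompt: str):
        # Return an object with a .text attribute, mirroring CompletionResponse.
        if "refund" in prompt.lower():
            return SimpleNamespace(text="Escalate to billing support.")
        return SimpleNamespace(text="Handle via standard workflow.")


def route_request(llm, issue: str) -> str:
    return llm.complete(f"Classify this issue and respond: {issue}").text


def test_refund_branch_positive():
    assert route_request(FakeRoutingLLM(), "Please refund me") == "Escalate to billing support."


def test_refund_branch_negative():
    # The negative case catches a mock that matches too broadly.
    assert route_request(FakeRoutingLLM(), "Reset my password") == "Handle via standard workflow."
```

If the negative test ever starts returning the escalation text, the branching rule (or the prompt feeding it) has drifted.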

If you’re using agents or query engines on top of these mocks, verify that only your app code changes between test runs. The whole point is repeatability: same input, same output, every time.

Next Steps

  • Mock streaming responses with stream_complete and stream_chat for agent UIs that render tokens incrementally.
  • Add fixture-based mocks in pytest so multiple tests reuse the same deterministic LLM behavior.
  • Test retrievers separately from generators by mocking only the generation layer first.
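As one possible starting point for the fixture idea above, here is a minimal pytest sketch. FakeLLM is a hypothetical stand-in for any deterministic mock (anything exposing complete() that returns an object with a .text attribute), and the fixture name mock_llm is an assumption; swap in a real CustomLLM subclass if you prefer llama-index types:

```python
# Sketch of a shared, fixture-based mock so multiple tests reuse one
# deterministic LLM.
from types import SimpleNamespace

import pytest


class FakeLLM:
    def complete(self, prompt: str):
        # Always returns the same canned text, regardless of the prompt.
        return SimpleNamespace(text="Handle via standard workflow.")


@pytest.fixture
def mock_llm():
    return FakeLLM()


def test_uses_shared_fixture(mock_llm):
    assert mock_llm.complete("anything").text == "Handle via standard workflow."
```

Centralizing the mock in a conftest.py fixture keeps every test file on the same canned behavior, so a change to the mock's rules shows up in one place.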

By Cyprian Aarons, AI Consultant at Topiax.
