LlamaIndex Tutorial (TypeScript): mocking LLM calls in tests for intermediate developers
This tutorial shows how to write deterministic tests for LlamaIndex TypeScript code by mocking LLM calls instead of hitting a real model. You need this when your agent logic is solid but your tests are flaky, slow, or expensive because they depend on network calls and model output.
What You'll Need
- Node.js 18+ and a TypeScript project
- `llamaindex` installed
- A test runner like `vitest` or `jest`
- An API key only if you want to run the real LLM path, not for mocked tests
- A basic LlamaIndex setup with one query engine or chat engine
Step-by-Step
- Start with a small LlamaIndex service that accepts an injected LLM. The key idea is to keep the production code clean and let the test replace the model with a fake implementation.
```ts
import {
  Document,
  IngestionPipeline,
  SentenceSplitter,
  VectorStoreIndex,
  OpenAI,
} from "llamaindex";
import type { LLM } from "llamaindex";

// Accept the LLM as a parameter so tests can inject a fake;
// production callers get the real client by default.
export async function buildQueryEngine(
  llm: LLM = new OpenAI({ model: "gpt-4o-mini" }),
) {
  const documents = [
    new Document({ text: "Claims are approved after fraud checks." }),
  ];
  const pipeline = new IngestionPipeline({
    transformations: [
      new SentenceSplitter({ chunkSize: 512, chunkOverlap: 20 }),
    ],
  });
  const nodes = await pipeline.run({ documents });
  // Build the index from the pre-chunked nodes.
  const index = await VectorStoreIndex.init({ nodes });
  return index.asQueryEngine({ llm });
}
```
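Accepting the LLM as a parameter, with the real `OpenAI` client as the default, gives tests a single seam to swap in a fake while production callers stay unchanged.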
- Create a fake LLM that returns fixed responses. In TypeScript, the simplest stable pattern is to implement the same surface your code uses and keep the output predictable for assertions.
```ts
import { BaseLLM } from "llamaindex";
import type { ChatMessage, LLMMetadata } from "llamaindex";

// A fake LLM that always replies with fixed text. Extending BaseLLM
// means `complete` is derived from `chat`, so either call path stays on the fake.
export class FakeLLM extends BaseLLM {
  constructor(private readonly responseText: string) {
    super();
  }

  // Placeholder metadata required by the base class.
  get metadata(): LLMMetadata {
    return { model: "fake", temperature: 0, topP: 1, contextWindow: 4096, tokenizer: undefined };
  }

  // Recent llamaindex versions pass `{ messages }`; add a streaming
  // overload if your code exercises it.
  async chat({ messages: _messages }: { messages: ChatMessage[] }) {
    return {
      message: { role: "assistant" as const, content: this.responseText },
      raw: null,
    };
  }
}
```
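Before wiring the fake into a query engine, it is worth sanity-checking its contract directly. A minimal sketch, assuming the `{ messages }` parameter shape of recent llamaindex versions:

```ts
import { expect, it } from "vitest";
import { FakeLLM } from "./FakeLLM";

it("returns the canned text", async () => {
  const llm = new FakeLLM("canned answer");
  // The fake ignores the messages and always returns its fixed reply.
  const { message } = await llm.chat({ messages: [] });
  expect(message.content).toBe("canned answer");
});
```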
- Use the fake in your test instead of the real OpenAI client. This keeps your query engine behavior realistic while making the result fully deterministic.
```ts
import {
  Document,
  IngestionPipeline,
  SentenceSplitter,
  VectorStoreIndex,
} from "llamaindex";
import { describe, expect, it } from "vitest";
import { FakeLLM } from "./FakeLLM";

describe("query engine", () => {
  it("returns a mocked answer", async () => {
    const documents = [
      new Document({ text: "Claims are approved after fraud checks." }),
    ];
    const pipeline = new IngestionPipeline({
      transformations: [
        new SentenceSplitter({ chunkSize: 512, chunkOverlap: 20 }),
      ],
    });
    const nodes = await pipeline.run({ documents });
    const index = await VectorStoreIndex.init({ nodes });

    const queryEngine = index.asQueryEngine({
      llm: new FakeLLM("Claims are approved after fraud checks."),
    });

    const response = await queryEngine.query({
      query: "When are claims approved?",
    });

    expect(String(response)).toContain("fraud checks");
  });
});
```
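One caveat: building a `VectorStoreIndex` embeds your nodes, and the default embedding model calls OpenAI's embedding API, so the test above can still reach the network even with a fake LLM. Below is a minimal fake embedding sketch, assuming `getTextEmbedding` is the abstract method on `BaseEmbedding`, which holds for recent llamaindex releases:

```ts
import { BaseEmbedding, Settings } from "llamaindex";

// Returns the same vector for every input, which is enough for a
// single-document test; derive the vector from the text if your test
// needs meaningful ranking across nodes.
class FakeEmbedding extends BaseEmbedding {
  async getTextEmbedding(_text: string): Promise<number[]> {
    return [0.1, 0.2, 0.3];
  }
}

// Register before building any index so ingestion stays offline.
Settings.embedModel = new FakeEmbedding();
```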
- If you want to mock at the module level instead of injecting a fake instance, use your test runner's mocking tools. This is useful when legacy code constructs `new OpenAI()` internally and you cannot refactor immediately.
```ts
import { describe, expect, it, vi } from "vitest";
import { OpenAI } from "llamaindex";

vi.mock("llamaindex", async () => {
  const actual = await vi.importActual<typeof import("llamaindex")>("llamaindex");
  return {
    ...actual,
    OpenAI: class extends actual.OpenAI {
      // Override chat so no request ever leaves the process.
      chat(): Promise<any> {
        return Promise.resolve({
          message: { role: "assistant", content: "mocked response" },
          raw: null,
        });
      }
    },
  };
});

describe("module mock", () => {
  it("mocks OpenAI globally", async () => {
    // Pass a dummy key in case your version resolves it at construction time.
    const llm = new OpenAI({ model: "gpt-4o-mini", apiKey: "test-key" });
    const result = await llm.chat({ messages: [] });
    expect(result.message.content).toBe("mocked response");
  });
});
```
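Keep in mind that `vi.mock` is hoisted above the imports and applies to the whole test file, so put module-mocked suites in their own files when other tests in the project need the real client.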
- Keep your assertions on structure and intent, not exact prose from an LLM. For agent workflows, verify tool selection, citation presence, routing decisions, or JSON shape rather than sentence-by-sentence wording.
```ts
import { describe, expect, it } from "vitest";

describe("assertion strategy", () => {
  it("checks the contract instead of wording", () => {
    // Stand-in for structured output parsed from a model response.
    const output = JSON.parse(
      '{"decision":"approve","reason":"policy matched","confidence":0.91}',
    );

    expect(output.decision).toBe("approve");
    expect(output.confidence).toBeGreaterThan(0.8);
    expect(output.reason).toContain("policy");
  });
});
```
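The same idea works for citations. Continuing the query engine test from earlier, you can assert that sources were attached at all (`sourceNodes` is how LlamaIndex responses expose retrieved chunks) instead of matching their text:

```ts
const response = await queryEngine.query({ query: "When are claims approved?" });

// Assert citation presence, not the exact sentences the model produced.
expect(response.sourceNodes?.length ?? 0).toBeGreaterThan(0);
```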
Testing It
Run your test suite with vitest or jest and confirm that no network calls happen during execution. If you see intermittent failures or token usage in logs, your code is still reaching a real model somewhere.
A good sanity check is to change the fake response text and verify the assertion fails exactly where expected. That tells you the test is actually exercising the mocked path and not bypassing it.
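For example, reusing the `index` built in the earlier test, swap the canned text and invert the assertion to make the wiring explicit:

```ts
const queryEngine = index.asQueryEngine({
  llm: new FakeLLM("Rejected pending manual review."),
});
const response = await queryEngine.query({ query: "When are claims approved?" });

// With a different canned answer the original containment check must
// flip, which proves the mocked path is the one being exercised.
expect(String(response)).not.toContain("fraud checks");
```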
For integration coverage, keep one separate test file that uses a real API key and mark it as slow or optional. That gives you confidence in provider wiring without making every unit test depend on live inference.
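With vitest you can gate that file on the presence of a key. A sketch using `describe.skipIf` (a standard vitest helper); the `./buildQueryEngine` import path is just this tutorial's module:

```ts
import { describe, expect, it } from "vitest";
import { buildQueryEngine } from "./buildQueryEngine";

// Skipped entirely on machines that have no key configured.
describe.skipIf(!process.env.OPENAI_API_KEY)("live model contract", () => {
  it("answers with a non-empty response", async () => {
    const queryEngine = await buildQueryEngine();
    const response = await queryEngine.query({
      query: "When are claims approved?",
    });
    expect(String(response).length).toBeGreaterThan(0);
  });
});
```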
Next Steps
- Add tool-call mocking for agents that use `FunctionTool` or custom retrievers
- Learn how to snapshot structured outputs from LlamaIndex response synthesizers
- Split tests into unit tests with fakes and contract tests with a real model
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.