LangGraph Tutorial (TypeScript): mocking LLM calls in tests for advanced developers
This tutorial shows how to make LangGraph tests deterministic by mocking LLM calls at the model boundary in TypeScript. You need this when your graph logic is solid, but your tests are flaky because real model responses vary, cost money, or depend on network access.
What You'll Need
- Node.js 18+
- TypeScript 5+
- @langchain/langgraph
- @langchain/core
- jest or vitest
- A test runner configured consistently for ESM or CommonJS
- No API key required for the mocked tests
- Optional: OPENAI_API_KEY if you want to compare mocked vs real runs later
Step-by-Step
- Start with a small graph that calls an LLM through a dependency you control.
The key is not mocking LangGraph itself; it is injecting a fake chat model into the node so the graph stays real while the model call is isolated.
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { BaseChatModel } from "@langchain/core/language_models/chat_models";
import { AIMessage, HumanMessage, BaseMessage } from "@langchain/core/messages";

const State = Annotation.Root({
  // Conversation history; the reducer appends new messages.
  messages: Annotation<BaseMessage[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
  // Latest answer; the reducer keeps only the most recent value.
  answer: Annotation<string>({
    reducer: (_left, right) => right,
    default: () => "",
  }),
});

type GraphState = typeof State.State;

// The chat model is injected, so tests pass a fake while production passes a real one.
function buildGraph(model: BaseChatModel) {
  const app = new StateGraph(State)
    .addNode("llm", async (state: GraphState) => {
      const result = await model.invoke(state.messages);
      return { answer: result.content.toString(), messages: [result] };
    })
    .addEdge(START, "llm")
    .addEdge("llm", END)
    .compile();
  return app;
}
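In production you pass a real model into the same builder. A minimal sketch, assuming you use @langchain/openai; the ChatOpenAI options and model name here are illustrative, not part of the tutorial's graph:

import { ChatOpenAI } from "@langchain/openai";

// Same graph, real model injected at the same boundary.
// Assumes OPENAI_API_KEY is set in the environment.
const productionGraph = buildGraph(
  new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 })
);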
- Create a fake chat model that returns fixed outputs for known inputs.
This keeps your tests fast and lets you assert exact strings instead of fuzzy semantics.
import { ChatResult } from "@langchain/core/outputs";

class MockChatModel extends BaseChatModel {
  constructor() {
    // BaseChatModel requires a params object; the fake needs none.
    super({});
  }

  _llmType() {
    return "mock-chat-model";
  }

  // ChatResult.generations is a flat array of ChatGeneration objects.
  async _generate(messages: BaseMessage[]): Promise<ChatResult> {
    const last = messages[messages.length - 1];
    const input = last?.content?.toString() ?? "";
    const content = input.includes("refund")
      ? "Escalate to support"
      : "Approve automatically";
    return {
      generations: [{ text: content, message: new AIMessage(content) }],
      llmOutput: {},
    };
  }
}
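If you would rather not maintain a subclass, @langchain/core ships a canned fake chat model for this purpose. A sketch using the FakeListChatModel testing utility; note it replays its responses in order, so it suits scripted tests rather than input-dependent routing like the mock above:

import { FakeListChatModel } from "@langchain/core/utils/testing";

// Replays the listed responses in order on successive calls.
const scripted = new FakeListChatModel({ responses: ["Escalate to support"] });
const scriptedGraph = buildGraph(scripted);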
- Write a test that runs the actual graph and asserts on state output.
The important part is that your node logic, reducers, and edge wiring are exercised end to end.
import { describe, it, expect } from "vitest";
import { HumanMessage } from "@langchain/core/messages";

describe("LangGraph with mocked LLM", () => {
  it("returns a deterministic answer for refund requests", async () => {
    const graph = buildGraph(new MockChatModel());
    const result = await graph.invoke({
      messages: [new HumanMessage("Customer asks for a refund")],
      answer: "",
    });
    expect(result.answer).toBe("Escalate to support");
    expect(result.messages[result.messages.length - 1].content.toString()).toBe(
      "Escalate to support"
    );
  });

  it("returns the default answer for non-refund requests", async () => {
    const graph = buildGraph(new MockChatModel());
    const result = await graph.invoke({
      messages: [new HumanMessage("Can I update my address?")],
      answer: "",
    });
    expect(result.answer).toBe("Approve automatically");
  });
});
- If you want stricter control over prompts, mock at the function boundary instead of subclassing the model.
This is useful when your production code already wraps the LLM call in a helper and you want per-test behavior without touching LangChain internals.
export async function classifyRequest(
  input: string,
  callModel: (messages: BaseMessage[]) => Promise<string>
) {
  const app = new StateGraph(State)
    .addNode("classify", async (state: GraphState) => {
      // The node only knows about the injected function, not a model class.
      const answer = await callModel(state.messages);
      return { answer };
    })
    .addEdge(START, "classify")
    .addEdge("classify", END)
    .compile();

  return app.invoke({
    messages: [new HumanMessage(input)],
    answer: "",
  });
}
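In production the same helper receives a real implementation of the boundary. A hedged sketch; realCallModel and its model wiring are assumptions for illustration, reusing the ChatOpenAI import from the earlier snippet:

// One possible production implementation of the callModel boundary.
async function realCallModel(messages: BaseMessage[]): Promise<string> {
  const model = new ChatOpenAI({ model: "gpt-4o-mini" });
  const result = await model.invoke(messages);
  return result.content.toString();
}

// Usage: await classifyRequest("Need account help", realCallModel);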
- Use spies for integration-style tests when you want to verify prompt shape and call count.
This catches regressions where someone changes the system prompt or accidentally calls the model twice.
import { vi, describe, it, expect } from "vitest";
import { BaseMessage } from "@langchain/core/messages";

describe("call boundary spy", () => {
  it("calls the model once with one human message", async () => {
    const callModel = vi.fn(async (messages: BaseMessage[]) => {
      expect(messages).toHaveLength(1);
      return "Approved";
    });

    const result = await classifyRequest("Need account help", callModel);

    expect(callModel).toHaveBeenCalledTimes(1);
    expect(result.answer).toBe("Approved");
  });
});
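The same pattern runs under jest with minor import changes. Assuming your jest setup already handles TypeScript, the equivalents come from @jest/globals and jest.fn replaces vi.fn:

import { describe, it, expect, jest } from "@jest/globals";
import { BaseMessage } from "@langchain/core/messages";

// Identical assertions; only the mock factory changes.
const callModel = jest.fn(async (messages: BaseMessage[]) => "Approved");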
Testing It
Run your test suite normally with vitest or jest, and confirm there are no network calls or API key dependencies. The output should be stable across repeated runs because the mock returns fixed values for known inputs. If you see flaky assertions, check that your state reducer is deterministic and that you are not leaking real model calls through another node. For graphs with multiple nodes, keep one mock per external dependency so failures point to a single boundary.
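If you want to enforce the no-network guarantee mechanically rather than by inspection, one option is a test setup file that stubs fetch. A sketch assuming vitest's setupFiles option; the file name is arbitrary:

// vitest.setup.ts, referenced from setupFiles in vitest.config.ts.
// Any accidental real HTTP call now fails the test immediately.
import { beforeAll } from "vitest";

beforeAll(() => {
  globalThis.fetch = (() => {
    throw new Error("Unexpected network call in a mocked-LLM test");
  }) as typeof fetch;
});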
Next Steps
- Add table-driven tests for multiple intents and edge cases (see the sketch after this list)
- Mock streaming responses if your production graph uses token streaming
- Introduce contract tests that compare mocked outputs against a small golden set from real LLM runs
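For the table-driven tests mentioned above, a minimal sketch using vitest's it.each with the graph and mock from this tutorial; the cases are illustrative:

import { describe, it, expect } from "vitest";

describe("intent routing table", () => {
  it.each([
    ["Customer asks for a refund", "Escalate to support"],
    ["Can I update my address?", "Approve automatically"],
  ])("routes %s deterministically", async (input, expected) => {
    const graph = buildGraph(new MockChatModel());
    const result = await graph.invoke({
      messages: [new HumanMessage(input)],
      answer: "",
    });
    expect(result.answer).toBe(expected);
  });
});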
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.