LangGraph Tutorial (TypeScript): optimizing token usage for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a LangGraph workflow in TypeScript that keeps token usage under control without making the agent brittle. You’ll use a small state model, targeted prompts, conditional routing, and response trimming so beginners can ship something that works and doesn’t burn through API budget.

What You'll Need

  • Node.js 18+ installed
  • A TypeScript project with ts-node or tsx
  • Packages:
    • @langchain/langgraph
    • @langchain/openai
    • @langchain/core
    • zod
  • An OpenAI API key in OPENAI_API_KEY
  • Basic familiarity with LangGraph state, nodes, and edges
  • A terminal you can run TypeScript from
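If you're starting from scratch, the packages listed above can be installed in one command (versions unpinned here; pin them in a real project):

```shell
npm install @langchain/langgraph @langchain/openai @langchain/core zod
```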

Step-by-Step

  1. Start by defining a narrow state shape. Token waste usually starts when you keep passing around huge message arrays or unrelated metadata, so keep only what each step actually needs.
import { Annotation, StateGraph, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const GraphState = Annotation.Root({
  question: Annotation<string>(),
  summary: Annotation<string | undefined>(),
  answer: Annotation<string | undefined>(),
  needsClarification: Annotation<boolean | undefined>(),
});

type GraphStateType = typeof GraphState.State;
  2. Create one model instance and force short outputs. For beginners, the easiest win is to cap output length and make the model work from a compact summary instead of the full conversation.
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
  maxTokens: 180,
});

async function summarizeQuestion(state: GraphStateType) {
  const prompt = [
    "Summarize the user's question in one sentence.",
    "Keep only business-critical details.",
    `Question: ${state.question}`,
  ].join("\n");

  const result = await model.invoke(prompt);
  return { summary: result.content.toString() };
}
  3. Add a lightweight router node before any expensive generation. This avoids sending every request through the same long prompt path; simple requests get simple handling, and unclear requests get clarified early.
async function routeRequest(state: GraphStateType) {
  const prompt = [
    "Decide if this question is clear enough to answer.",
    "Return only 'yes' or 'no'.",
    `Question: ${state.question}`,
  ].join("\n");

  const result = await model.invoke(prompt);
  const text = result.content.toString().trim().toLowerCase();

  return {
    // Treat anything other than an affirmative "yes" (allowing trailing punctuation) as unclear.
    needsClarification: !text.startsWith("yes"),
  };
}
  4. Build the answer node using the summarized input, not the raw question plus extra chat history. This is where most token savings happen in practice: fewer tokens in means fewer tokens out because the prompt stays tight.
async function answerQuestion(state: GraphStateType) {
  const prompt = [
    "Answer the user's request concisely.",
    "Use at most 5 bullet points.",
    state.summary ? `Summary: ${state.summary}` : `Question: ${state.question}`,
  ].join("\n");

  const result = await model.invoke(prompt);
  return { answer: result.content.toString() };
}

async function askForClarification(_state: GraphStateType) {
  return {
    answer:
      "Please clarify your request with one sentence and include the exact output format you want.",
  };
}
  5. Wire the graph with conditional routing and compile it once. The graph should only call the expensive answer node when the request is clear; otherwise it stops early with a clarification response.
const graph = new StateGraph(GraphState)
  .addNode("routeRequest", routeRequest)
  .addNode("summarizeQuestion", summarizeQuestion)
  .addNode("answerQuestion", answerQuestion)
  .addNode("askForClarification", askForClarification)
  .addEdge(START, "routeRequest")
  .addConditionalEdges("routeRequest", (state) =>
    state.needsClarification ? "askForClarification" : "summarizeQuestion"
  )
  .addEdge("summarizeQuestion", "answerQuestion")
  .addEdge("answerQuestion", END)
  .addEdge("askForClarification", END);

const app = graph.compile();
  6. Run a test input and print only what matters. If you dump full internal state during debugging, you lose part of the savings you just built into the workflow.
async function main() {
  // Unset channels default to undefined, so only the question needs to be passed in.
  const result = await app.invoke({
    question:
      "Explain how to reduce token usage in LangGraph without breaking my TypeScript agent.",
  });

  console.log("Answer:\n", result.answer);
}

main().catch(console.error);

Testing It

Run the file with tsx or ts-node after exporting your API key. Try two inputs: one clear question and one vague question like “help me with LangGraph,” then compare whether the graph routes to clarification early.
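For example (the filename is hypothetical; substitute your own entry file and key):

```shell
export OPENAI_API_KEY="your-key-here"   # placeholder, not a real key
npx tsx token-graph.ts                  # or: npx ts-node token-graph.ts
```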

Watch for three things:

  • The router should short-circuit unclear prompts.
  • The summarizer should produce a much smaller prompt payload than your original input.
  • The final answer should stay short because maxTokens is capped.

If you want to inspect token usage more closely, log request sizes at each node before calling the model. In production, that’s how you catch accidental prompt bloat before it becomes an invoice problem.
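A minimal sketch of that logging, using the rough rule of thumb that one token is about four characters of English text (a heuristic only; the model's tokenizer gives exact counts, and the helper names here are illustrative):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Heuristic only; use the model's tokenizer (e.g. tiktoken) for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: log a node's prompt size just before calling the model.
function logPromptSize(node: string, prompt: string): string {
  console.log(`[${node}] ~${estimateTokens(prompt)} tokens in prompt`);
  return prompt;
}
```

Wrapping each node's prompt in `logPromptSize("answerQuestion", prompt)` makes per-node growth visible without dumping the full state.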

Next Steps

  • Add message trimming with a rolling window so long chats don’t accumulate forever.
  • Move routing to structured output with Zod so “yes/no” parsing is deterministic.
  • Add caching for repeated summaries or repeated classification prompts.

By Cyprian Aarons, AI Consultant at Topiax.
