LangChain Tutorial (TypeScript): optimizing token usage for beginners
This tutorial shows you how to reduce token usage in a TypeScript LangChain app without breaking the chain flow. You need this when your prompts are bloated, your chat history keeps growing, or your OpenAI bill starts climbing for no good reason.
What You'll Need
- Node.js 18+
- TypeScript 5+
- An OpenAI API key
- Packages: langchain, @langchain/openai, @langchain/core, dotenv
- A basic LangChain setup already working in TypeScript
- A project using ES modules or a TS config that supports them
Install the dependencies:
npm install langchain @langchain/openai @langchain/core dotenv
Step-by-Step
- Start by loading your API key and creating a model with a lower-cost default. For token optimization, the first win is usually model choice plus strict prompt control.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0,
});
async function main() {
const response = await model.invoke("Say hello in one sentence.");
console.log(response.content);
}
main();
- Replace long free-form prompts with compact templates. If your instructions are verbose, every request pays for that overhead again.
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";
import "dotenv/config";
const model = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0,
});
const prompt = ChatPromptTemplate.fromMessages([
["system", "Answer briefly. Use max 3 bullets."],
["human", "Summarize this text: {text}"],
]);
async function main() {
const chain = prompt.pipe(model);
const result = await chain.invoke({
text: "LangChain helps build LLM apps, but long prompts can waste tokens.",
});
console.log(result.content);
}
main();
- Trim chat history before sending it back into the model. Beginners often pass the full conversation every time, which is the fastest way to waste tokens. A sketch of wiring the trimmed history back into a chain follows the snippet below.
type Message = {
role: "user" | "assistant";
content: string;
};
function keepLastMessages(messages: Message[], maxMessages: number): Message[] {
return messages.slice(-maxMessages);
}
const history: Message[] = [
{ role: "user", content: "Explain RAG." },
{ role: "assistant", content: "RAG combines retrieval and generation." },
{ role: "user", content: "Now explain it simply." },
];
const trimmedHistory = keepLastMessages(history, 2);
console.log(trimmedHistory);
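To wire the trimmed history back into a chain, one common pattern is a MessagesPlaceholder in the prompt. The sketch below continues the snippet above (it reuses keepLastMessages and history) and converts the plain objects into LangChain message classes; the placeholder name "history" and the follow-up question are just illustrative choices, not required names.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { HumanMessage, AIMessage } from "@langchain/core/messages";
const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Answer briefly."],
  new MessagesPlaceholder("history"),
  ["human", "{question}"],
]);
async function main() {
  // Convert only the trimmed slice into LangChain messages; the rest of the
  // conversation never reaches the model.
  const trimmedMessages = keepLastMessages(history, 2).map((m) =>
    m.role === "user" ? new HumanMessage(m.content) : new AIMessage(m.content)
  );
  const chain = prompt.pipe(model);
  const result = await chain.invoke({
    history: trimmedMessages,
    question: "Now explain it simply.",
  });
  console.log(result.content);
}
main();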
- Use structured output instead of asking for long prose when you only need fields. Shorter outputs mean fewer completion tokens, and structured responses are easier to validate downstream.
import { z } from "zod";
import { StructuredOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";
import "dotenv/config";
const parser = StructuredOutputParser.fromZodSchema(
z.object({
summary: z.string(),
riskLevel: z.enum(["low", "medium", "high"]),
})
);
const prompt = ChatPromptTemplate.fromTemplate(
`Return JSON only.\n{format_instructions}\nText: {text}`
);
const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
async function main() {
const chain = prompt.pipe(model).pipe(parser);
const result = await chain.invoke({
text: "This policy document contains several exceptions and ambiguous clauses.",
format_instructions: parser.getFormatInstructions(),
});
console.log(result);
}
main();
- Put hard limits on generation when you do not need long answers. In production, capping max output tokens is one of the simplest controls you can apply to stop runaway completions.
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import "dotenv/config";
const model = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0,
// Cap the completion length so a runaway answer cannot inflate the bill.
maxTokens: 60,
});
const prompt = ChatPromptTemplate.fromMessages([
["system", "Answer in one sentence."],
["human", "{question}"],
]);
async function main() {
const chain = prompt.pipe(model);
const result = await chain.invoke({
question: "What does token optimization mean?",
});
console.log(result.content);
}
main();
If most of your app still needs longer answers, keep your shared model unrestricted and create a second, capped instance just for the calls that should stay short:
import { ChatOpenAI } from "@langchain/openai";
const limitedModel = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0,
maxTokens: 30,
});
// Use limitedModel only in chains that must return short answers.
console.log(limitedModel);
Testing It
Run each script and compare token-heavy behavior against the optimized version. The easiest signal is shorter prompts, shorter outputs, and less chat history being sent per request.
If you want proof, log the actual messages before invoking the chain and inspect their length. You can also compare response times and API usage in your OpenAI dashboard before and after trimming prompts.
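Here is a minimal sketch of that kind of check, reusing the summarization prompt from earlier. It renders the prompt locally without calling the model and logs each message's character count; character length is only a rough proxy for tokens, but it is enough to spot a bloated prompt.
import { ChatPromptTemplate } from "@langchain/core/prompts";
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "Answer briefly. Use max 3 bullets."],
  ["human", "Summarize this text: {text}"],
]);
async function main() {
  // Format the prompt without invoking the model, then inspect what would be sent.
  const messages = await prompt.formatMessages({
    text: "LangChain helps build LLM apps, but long prompts can waste tokens.",
  });
  for (const message of messages) {
    console.log(message._getType(), String(message.content).length, "chars");
  }
}
main();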
For a real check, add one test case with a long conversation and confirm only the last few messages are kept. Then verify structured output still returns valid JSON-like data instead of verbose prose.
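A lightweight version of that test can run without any API call. The sketch below is illustrative only: it reuses the keepLastMessages helper and the Zod schema from earlier, fabricates a long conversation (longHistory), and validates a hand-written sample result against the schema in place of a real chain output.
import assert from "node:assert";
import { z } from "zod";
type Message = { role: "user" | "assistant"; content: string };
function keepLastMessages(messages: Message[], maxMessages: number): Message[] {
  return messages.slice(-maxMessages);
}
// Simulate a long conversation and confirm only the last few messages survive.
const longHistory: Message[] = [];
for (let i = 0; i < 20; i++) {
  longHistory.push({
    role: i % 2 === 0 ? "user" : "assistant",
    content: `message ${i}`,
  });
}
const trimmed = keepLastMessages(longHistory, 4);
assert.strictEqual(trimmed.length, 4);
assert.strictEqual(trimmed.at(-1)?.content, "message 19");
// Validate a structured result against the same schema the parser uses.
const resultSchema = z.object({
  summary: z.string(),
  riskLevel: z.enum(["low", "medium", "high"]),
});
const sampleResult = { summary: "Short summary.", riskLevel: "low" };
console.log(resultSchema.parse(sampleResult));
console.log("trim and schema checks passed");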
Next Steps
- Learn LangChain memory strategies so you can store state without resending everything every turn
- Add prompt compression or summarization for long-running conversations
- Measure token usage per request in logs so you can catch regressions early (a starting sketch follows this list)
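As a starting point for that last item, recent versions of @langchain/openai attach token counts to the response message. Exactly where they appear (usage_metadata vs. response_metadata) depends on your package versions, so treat this as a sketch and log both.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
async function main() {
  const response = await model.invoke("Say hello in one sentence.");
  // Token counts may live in either field depending on the langchain version.
  console.log(response.usage_metadata);
  console.log(response.response_metadata);
}
main();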
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit