LangChain Tutorial (TypeScript): Adding Cost Tracking for Advanced Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add per-run cost tracking to a LangChain TypeScript app using LangChain callbacks and model usage metadata. You need this when you want to answer basic questions like “what did this agent cost?” and more useful ones like “which prompt version is burning tokens in production?”

What You'll Need

  • Node.js 18+
  • TypeScript project with ts-node or tsx
  • langchain
  • @langchain/openai
  • OpenAI API key in OPENAI_API_KEY
  • Optional but recommended:
    • dotenv for local env loading
    • A logging sink or metrics backend for storing cost events

Install the packages:

npm install langchain @langchain/openai dotenv
npm install -D typescript tsx @types/node
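The snippets below use top-level await, so the project needs to run as ES modules. With npm 7 or newer, one way to set that is:

npm pkg set type=module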

Step-by-Step

  1. Start with a runnable LangChain chain that exposes token usage. The key requirement is that your model call returns usage metadata, because that is what you convert into cost.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a concise assistant."],
  ["human", "{question}"],
]);

const chain = prompt.pipe(model);

const result = await chain.invoke({
  question: "Summarize why token tracking matters in production.",
});

console.log(result.content);
console.log(result.usage_metadata); // standardized input/output token counts
console.log(result.response_metadata); // provider-specific metadata
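If the call succeeds, usage_metadata on the returned AIMessage should contain input_tokens, output_tokens, and total_tokens; those counts are what the next step converts into dollars.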
  2. Add a small cost calculator that maps token counts to dollars. In production, keep the rates in config so you can update them when pricing changes.
type Usage = {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
};

// Example USD rates per 1M tokens; keep these in config and update when pricing changes.
const PRICING = {
  inputPer1M: 0.15,
  outputPer1M: 0.6,
};

function calculateCost(usage: Usage) {
  const inputTokens = usage.input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;

  const inputCost = (inputTokens / 1_000_000) * PRICING.inputPer1M;
  const outputCost = (outputTokens / 1_000_000) * PRICING.outputPer1M;

  return {
    inputTokens,
    outputTokens,
    totalTokens: usage.total_tokens ?? inputTokens + outputTokens,
    inputCost,
    outputCost,
    totalCost: inputCost + outputCost,
  };
}
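Since rates change and differ per model, you might load them from config instead of hardcoding. A minimal sketch, assuming a hypothetical pricing.json that maps model names to rates:

// Sketch: load per-model rates from a config file so pricing updates
// don't require a code change. pricing.json is a hypothetical file like:
// { "gpt-4o-mini": { "inputPer1M": 0.15, "outputPer1M": 0.6 } }
import { readFileSync } from "node:fs";

type ModelPricing = { inputPer1M: number; outputPer1M: number };

const PRICING_TABLE: Record<string, ModelPricing> = JSON.parse(
  readFileSync("pricing.json", "utf8")
);

function ratesFor(model: string): ModelPricing {
  const rates = PRICING_TABLE[model];
  if (!rates) throw new Error(`No pricing configured for model: ${model}`);
  return rates;
}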
  3. Wrap the chain invocation so every run emits structured usage and cost data. This is the pattern you want when you later send the event to Datadog, Postgres, Kafka, or your internal billing service.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a concise assistant."],
  ["human", "{question}"],
]);

type Usage = {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
};

const PRICING = { inputPer1M: 0.15, outputPer1M: 0.6 };

function calculateCost(usage: Usage) {
  const inputTokens = usage.input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  return {
    inputTokens,
    outputTokens,
    totalTokens: usage.total_tokens ?? inputTokens + outputTokens,
    totalCost:
      (inputTokens / 1_000_000) * PRICING.inputPer1M +
      (outputTokens / 1_000_000) * PRICING.outputPer1M,
  };
}

async function run(question: string) {
  const chain = prompt.pipe(model);
  const result = await chain.invoke({ question });

  // Prefer the standardized usage_metadata field; response_metadata.tokenUsage
  // uses different key names (promptTokens/completionTokens) and would read as 0 here.
  const usage =
    (result.usage_metadata as Usage | undefined) ??
    (result.response_metadata?.usage as Usage | undefined) ??
    {};

  const cost = calculateCost(usage);

  console.log(JSON.stringify({ question, answer: result.content, usage, cost }, null, 2));
}

await run("What is the point of cost tracking in LangChain?");
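To ship these events somewhere durable, swap the console.log for an emitter. A minimal sketch, assuming a hypothetical internal collector endpoint in COST_EVENTS_URL; runId is whatever identifier you mint per request:

// Sketch: emit each run's cost as a structured event.
// COST_EVENTS_URL is a hypothetical internal collector endpoint.
type CostEvent = {
  runId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  totalCost: number;
  timestamp: string;
};

async function emitCostEvent(event: CostEvent) {
  const url = process.env.COST_EVENTS_URL;
  if (!url) {
    console.log(JSON.stringify(event)); // fall back to stdout locally
    return;
  }
  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
}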
  4. If you are using agents or multiple LLM calls, attach a callback handler so you can capture every model call instead of just the final answer. This is the difference between “I know what the user paid” and “I know which step in the agent loop caused it.”
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
import type { LLMResult } from "@langchain/core/outputs";

class CostTrackingHandler extends BaseCallbackHandler {
  name = "cost-tracking-handler";

  async handleLLMEnd(output: LLMResult) {
    // Chat generations carry the message, which holds the usage fields.
    const generation = output.generations?.[0]?.[0] as any;
    const usage =
      generation?.message?.usage_metadata ??
      generation?.message?.response_metadata?.usage ??
      output.llmOutput?.tokenUsage ??
      {};

    console.log("LLM ended:", JSON.stringify(usage));
  }
}
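For multi-call runs it often helps to accumulate a running total instead of logging each call separately. A sketch extending the handler above, assuming usage_metadata is where your wrapper reports usage:

// Sketch: accumulate cost across every LLM call in a run.
// calculateCost and the Usage type come from step 2 (same file).
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
import type { LLMResult } from "@langchain/core/outputs";

class AggregatingCostHandler extends BaseCallbackHandler {
  name = "aggregating-cost-handler";
  totalCost = 0;
  calls = 0;

  async handleLLMEnd(output: LLMResult) {
    const message = (output.generations?.[0]?.[0] as any)?.message;
    const usage = message?.usage_metadata ?? {};
    this.totalCost += calculateCost(usage).totalCost;
    this.calls += 1;
  }
}

After a run finishes, handler.totalCost gives the blended spend and handler.calls the number of model invocations behind it.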
  5. Use the handler with your chain and verify each run prints both content and spend. For real systems, replace console.log with an event write so finance and product can query it later.
import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { CallbackManager } from "@langchain/core/callbacks/manager";

const model = new ChatOpenAI({
   model: "gpt-4o-mini",
   temperature: 0,
   callbackManager: CallbackManager.fromHandlers([]),
});

const prompt = ChatPromptTemplate.fromMessages([
   ["system", "You are a concise assistant."],
   ["human", "{question}"],
]);

async function main() {
   const result = await prompt.pipe(model).invoke(
     { question: "Give me one sentence on token budgets." },
     { callbacks: [new CostTrackingHandler()] }
   );

   console.log(result.content);
}

await main();

Testing It

Run the script with npx tsx src/index.ts or your equivalent entry point. You should see the assistant response plus token counts and a calculated dollar amount.

Check that input_tokens and output_tokens are non-zero after a real OpenAI call. If they are missing, inspect both usage_metadata and response_metadata, because model wrappers expose usage under slightly different keys depending on the provider and package version.
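A quick way to see which key your wrapper uses is to dump both fields after a single call (result here is the AIMessage returned by invoke):

console.log("usage_metadata:", result.usage_metadata);
console.log("response_metadata keys:", Object.keys(result.response_metadata ?? {}));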

Then run the same prompt twice and confirm costs are stable within normal variance. If they drift wildly, you probably changed models, prompts, or tool-calling behavior.

For agent workflows, confirm each sub-call emits its own event. That gives you per-step accounting instead of one blended number at the end.

Next Steps

  • Store cost events in Postgres with columns for run_id, model, input_tokens, output_tokens, and total_cost
  • Add budget guards that reject runs once a tenant crosses a daily spend threshold (see the sketch after this list)
  • Extend this pattern to tools and retrievers so you can track full workflow cost, not just LLM spend
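As a sketch of the budget-guard item, here is a minimal in-memory version. The DAILY_BUDGET_USD limit and the single-process Map are assumptions; a production system would back this with a shared store such as Redis or Postgres:

// Sketch: in-memory daily budget guard (assumes a single process).
const DAILY_BUDGET_USD = 5;
const spendByTenant = new Map<string, { day: string; total: number }>();

function assertWithinBudget(tenantId: string, nextCost: number) {
  const day = new Date().toISOString().slice(0, 10);
  const entry = spendByTenant.get(tenantId);
  const total = entry?.day === day ? entry.total : 0;

  if (total + nextCost > DAILY_BUDGET_USD) {
    throw new Error(`Tenant ${tenantId} exceeded daily budget of $${DAILY_BUDGET_USD}`);
  }
  spendByTenant.set(tenantId, { day, total: total + nextCost });
}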

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
