LlamaIndex Tutorial (TypeScript): Adding Cost Tracking for Advanced Developers
This tutorial shows how to instrument a LlamaIndex TypeScript app so every LLM call is tracked with prompt, completion, and estimated dollar cost. You need this when you want per-request cost visibility for billing, budgeting, or just catching runaway agent loops before they hit your invoice.
What You'll Need
- Node.js 18+ installed
- A TypeScript project with `ts-node` or `tsx`
- `llamaindex` installed
- An OpenAI API key
- Basic familiarity with `Settings`, `OpenAI`, and query engines in LlamaIndex TypeScript
- A place to store logs or metrics, such as stdout, a database, or OpenTelemetry
Install the package:
```bash
npm install llamaindex
```
Set your API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
Step-by-Step
- Start by creating a small cost model for the provider you use. LlamaIndex does not give you billing numbers out of the box, so you should track token usage from callbacks and convert that into dollars yourself.
```ts
// One record per LLM call; estimatedCostUsd is derived from the pricing table below.
export type CostRecord = {
  model: string;
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimatedCostUsd: number;
};

// USD per 1M tokens; update these whenever your provider changes pricing.
const PRICING_PER_1M_TOKENS: Record<string, { input: number; output: number }> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 5, output: 15 },
};

export function estimateCost(model: string, promptTokens: number, completionTokens: number): number {
  // Unknown models fall back to gpt-4o-mini rates rather than throwing.
  const pricing = PRICING_PER_1M_TOKENS[model] ?? PRICING_PER_1M_TOKENS["gpt-4o-mini"];
  return (
    (promptTokens / 1_000_000) * pricing.input +
    (completionTokens / 1_000_000) * pricing.output
  );
}
```
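As a quick sanity check, the estimator should reproduce the pricing table for round token counts. These numbers follow directly from the rates above:

```ts
// 1M input tokens on gpt-4o-mini should cost exactly the table's input rate.
console.log(estimateCost("gpt-4o-mini", 1_000_000, 0)); // 0.15
// Mixed usage on gpt-4o: 10k prompt + 2k completion tokens.
console.log(estimateCost("gpt-4o", 10_000, 2_000)); // 0.05 + 0.03 = 0.08
```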
- Wire up a callback handler that listens for LLM events and captures usage metadata. In production, this is where you’d emit metrics to your observability stack instead of just printing them.
```ts
import { CallbackManager } from "llamaindex";
import { estimateCost, type CostRecord } from "./cost"; // adjust to wherever you defined these

// Collected in memory for the demo; replace with your metrics pipeline in production.
export const costRecords: CostRecord[] = [];

export const callbackManager = new CallbackManager({
  onLLMEnd(event) {
    // Payload field names can vary between llamaindex versions; verify against yours.
    const model = String(event.payload?.model ?? "gpt-4o-mini");
    const promptTokens = Number(event.payload?.promptTokenCount ?? 0);
    const completionTokens = Number(event.payload?.completionTokenCount ?? 0);
    const totalTokens = promptTokens + completionTokens;

    costRecords.push({
      model,
      promptTokens,
      completionTokens,
      totalTokens,
      estimatedCostUsd: estimateCost(model, promptTokens, completionTokens),
    });
  },
});
```
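If you would rather not couple the handler to a module-level array, one option is a small sink interface you can swap per environment. This is a sketch, not a LlamaIndex API: `CostSink` and `consoleSink` are names invented for this tutorial.

```ts
// Hypothetical sink interface: the handler stays the same, the destination changes.
interface CostSink {
  record(record: CostRecord): void;
}

// Local development: print each record as it arrives.
const consoleSink: CostSink = {
  record: (record) => console.log("llm-cost", record),
};

let activeSink: CostSink = consoleSink;

// Inside onLLMEnd, call activeSink.record(...) alongside (or instead of) the array push.
// In production, swap in a sink that forwards to your metrics pipeline.
```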
- Configure LlamaIndex to use your callback manager and an OpenAI LLM explicitly. This keeps the setup deterministic and makes it easy to swap models while keeping the same cost-tracking layer.
```ts
import { Settings, OpenAI } from "llamaindex";

Settings.llm = new OpenAI({
  model: "gpt-4o-mini",
});
Settings.callbackManager = callbackManager;
```
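Because the cost layer only keys off the model name, swapping models is a one-line change, as long as the new model has a row in `PRICING_PER_1M_TOKENS`:

```ts
// Switch to gpt-4o; estimateCost picks up the matching pricing row by name.
Settings.llm = new OpenAI({ model: "gpt-4o" });

// Note: models missing from the pricing table silently fall back to gpt-4o-mini
// rates, so add a pricing row whenever you add a model.
```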
- Build a minimal index and run one query so you can see token usage flow through the callback path. The important part is not the index itself; it’s proving that every request now produces a cost record.
```ts
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  const documents = [
    new Document({ text: "Claims are paid after verification of policy coverage." }),
    new Document({ text: "Fraud checks may delay settlement if anomalies are detected." }),
  ];

  // Building the index embeds the documents; the query below is what
  // triggers the LLM call and, with it, the cost callback.
  const index = await VectorStoreIndex.fromDocuments(documents);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "When can a claim be delayed?",
  });
  console.log(String(response));
}
```
- Finally, run `main()` and print a summary once it resolves, so developers can inspect costs during local testing or CI runs. In a real service, replace this with structured logging keyed by request ID, tenant ID, or workflow ID.
```ts
// Awaiting main() (rather than using a setTimeout) guarantees all callbacks
// have fired before we read costRecords.
main().then(() => {
  const totals = costRecords.reduce(
    (acc, record) => ({
      promptTokens: acc.promptTokens + record.promptTokens,
      completionTokens: acc.completionTokens + record.completionTokens,
      estimatedCostUsd: acc.estimatedCostUsd + record.estimatedCostUsd,
    }),
    { promptTokens: 0, completionTokens: 0, estimatedCostUsd: 0 }
  );
  console.log("Cost records:", costRecords);
  console.log("Totals:", totals);
});
```
Testing It
Run the script and confirm you get both an answer from the query engine and at least one entry in `costRecords`. If your callback payloads are empty, check whether the provider is returning token usage in the shape your installed `llamaindex` version expects.
For a stronger test, run the same query twice and compare totals; they should increase predictably. Then swap `gpt-4o-mini` for another model in `Settings.llm` and verify your pricing table changes the estimated dollar amount.
If you want to validate this in CI, assert that:
- `costRecords.length > 0`
- `estimatedCostUsd >= 0`
- token counts are integers
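One dependency-free way to encode those assertions is `node:assert`, run in place of the summary block above. This sketch assumes the single-file layout from this tutorial:

```ts
import assert from "node:assert";

main().then(() => {
  assert.ok(costRecords.length > 0, "expected at least one cost record");
  for (const record of costRecords) {
    assert.ok(record.estimatedCostUsd >= 0, "cost must be non-negative");
    assert.ok(Number.isInteger(record.promptTokens), "prompt tokens must be an integer");
    assert.ok(Number.isInteger(record.completionTokens), "completion tokens must be an integer");
  }
});
```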
Next Steps
- Add request-scoped correlation IDs so each cost record maps back to one user action (see the sketch after this list)
- Export these records to Prometheus, Datadog, or OpenTelemetry instead of console logs
- Extend the pricing table for embeddings and reranking if your pipeline uses them
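For the first item, Node's built-in `AsyncLocalStorage` is enough to tag each record with the user action that produced it. Treat this as a sketch: `withRequestId` is a helper invented here, and `requestId` would be a new optional field on `CostRecord`.

```ts
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Holds the current request's correlation ID across async boundaries.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Wrap each user action so callbacks fired inside it can read the ID.
export function withRequestId<T>(fn: () => Promise<T>): Promise<T> {
  return requestContext.run({ requestId: randomUUID() }, fn);
}

// Inside onLLMEnd, attach the ID to the record:
//   const requestId = requestContext.getStore()?.requestId;
//   costRecords.push({ ...record, requestId });
```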
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit