LlamaIndex Tutorial (TypeScript): adding cost tracking for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to add per-request cost tracking to a TypeScript LlamaIndex app, so you can see what each LLM call is costing you in real numbers. You need this when you’re building anything that can grow past a demo, because token usage without cost visibility turns into surprise bills fast.

What You'll Need

  • Node.js 18+
  • A TypeScript project with ts-node or a build step
  • llamaindex installed
  • An OpenAI API key
  • Basic familiarity with LlamaIndex query engines and chat engines

Install the package if you haven’t already:

npm install llamaindex

Set your API key:

export OPENAI_API_KEY="your-key-here"

Step-by-Step

  1. Start with a minimal LlamaIndex setup that uses OpenAI under the hood. The important part is to keep the app small enough that you can see exactly where the cost data comes from.
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  const documents = [
    new Document({ text: "LlamaIndex helps connect data to LLM applications." }),
    new Document({ text: "Cost tracking is useful when monitoring production usage." }),
  ];

  const index = await VectorStoreIndex.fromDocuments(documents);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "Why track cost in an LLM app?",
  });

  console.log(response.toString());
}

main().catch(console.error);
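If you're using ts-node, you can run this directly (assuming you saved the file as main.ts):

npx ts-node main.ts

You should see a short answer printed. No cost data yet; that comes next.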
  2. Add a callback handler that listens for LLM events and captures token usage. In TypeScript, this is the cleanest way to get request-level accounting without wrapping every call manually. The exact event names and payload shapes have shifted between llamaindex versions, so treat the handler below as a sketch and check the CallbackManager docs for your installed version.
import { Settings } from "llamaindex";

class CostTracker {
  totalPromptTokens = 0;
  totalCompletionTokens = 0;
  totalCostUsd = 0;

  add(usage: { promptTokens?: number; completionTokens?: number }) {
    this.totalPromptTokens += usage.promptTokens ?? 0;
    this.totalCompletionTokens += usage.completionTokens ?? 0;
  }

  reset() {
    this.totalPromptTokens = 0;
    this.totalCompletionTokens = 0;
    this.totalCostUsd = 0;
  }
}

const tracker = new CostTracker();

// Subscribe on the global callback manager. "llm-end" fires after each LLM
// call. The payload shape varies across llamaindex versions: some expose the
// response on event.detail directly, others nest it under `payload`.
Settings.callbackManager.on("llm-end", (event) => {
  const detail = event.detail as any;
  const response = detail?.response ?? detail?.payload?.response;
  const usage = response?.raw?.usage;
  if (usage) {
    tracker.add({
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
    });
  }
});
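One design choice worth calling out: the handler only accumulates raw token counts, while pricing is applied in a separate step. That keeps the tracker provider-agnostic, and a rate change never touches the event wiring.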
  3. Define your pricing model explicitly. Don’t hardcode a mystery number into business logic; keep the pricing table close to the tracker so you can update it when model pricing changes.
function estimateCostUsd(promptTokens: number, completionTokens: number) {
  const promptRatePer1k = 0.00015; // example rate
  const completionRatePer1k = 0.0006; // example rate

  return (
    (promptTokens / 1000) * promptRatePer1k +
    (completionTokens / 1000) * completionRatePer1k
  );
}

function printCostSummary(promptTokens: number, completionTokens: number) {
  const cost = estimateCostUsd(promptTokens, completionTokens);
  console.log("---- Cost Summary ----");
  console.log(`Prompt tokens: ${promptTokens}`);
  console.log(`Completion tokens: ${completionTokens}`);
  console.log(`Estimated cost: $${cost.toFixed(6)}`);
}
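As a quick sanity check on the math: at these example rates, a request with 1,000 prompt tokens and 500 completion tokens works out to (1000 / 1000) × 0.00015 + (500 / 1000) × 0.0006 = $0.00045.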
  4. Wire the tracker into your actual query flow and print the totals after each request. This gives you immediate feedback during development and creates a pattern you can reuse in API handlers later.
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  const documents = [
    new Document({ text: "LlamaIndex helps connect data to LLM applications." }),
    new Document({ text: "Cost tracking is useful when monitoring production usage." }),
  ];

  const index = await VectorStoreIndex.fromDocuments(documents);
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "Why track cost in an LLM app?",
  });

  console.log(response.toString());

  // The callback handler accumulated token counts during the query;
  // convert them to dollars and print the summary.
  tracker.totalCostUsd = estimateCostUsd(
    tracker.totalPromptTokens,
    tracker.totalCompletionTokens
  );
  printCostSummary(tracker.totalPromptTokens, tracker.totalCompletionTokens);
}

main().catch(console.error);
  5. If you want per-request totals in a real service, reset the counters before each operation and read them after it finishes. That keeps one user’s request from contaminating another user’s metrics. One caveat: a single shared tracker only behaves this way if requests run one at a time; in a concurrent server you’d want a tracker per request (for example, scoped with AsyncLocalStorage).
async function runTrackedQuery(
  queryEngine: ReturnType<VectorStoreIndex["asQueryEngine"]>,
  query: string
) {
  // Start from a clean slate so this request's totals stand alone.
  tracker.reset();

  const response = await queryEngine.query({ query });

  tracker.totalCostUsd = estimateCostUsd(
    tracker.totalPromptTokens,
    tracker.totalCompletionTokens
  );

  console.log(response.toString());
  printCostSummary(tracker.totalPromptTokens, tracker.totalCompletionTokens);
}
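Calling it looks like this (the second query is just an illustrative variation):

await runTrackedQuery(queryEngine, "Why track cost in an LLM app?");
await runTrackedQuery(queryEngine, "What does LlamaIndex connect?");

Each call starts from zero, so the printed summary reflects only that request.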

Testing It

Run the script and confirm you get both an answer and a cost summary in the terminal. If token counts stay at zero, your callback hook isn’t attached correctly or your model/provider isn’t emitting token usage in the event payload.

For a real sanity check, ask two different prompts: one short and one long. The longer prompt should usually produce higher token counts and a higher estimated cost.

If you’re using this in an API server, log prompt_tokens, completion_tokens, estimated_cost_usd, and request_id together. That makes it easy to trace expensive requests later.
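Here’s a minimal sketch of that log line. The request-ID generation is an assumption (randomUUID from node:crypto); swap in whatever ID your framework already assigns:

import { randomUUID } from "node:crypto";

// After runTrackedQuery finishes for this request:
const requestId = randomUUID();
console.log(
  JSON.stringify({
    request_id: requestId,
    prompt_tokens: tracker.totalPromptTokens,
    completion_tokens: tracker.totalCompletionTokens,
    estimated_cost_usd: tracker.totalCostUsd,
  })
);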

Next Steps

  • Add request IDs and write cost metrics to your database or observability stack
  • Track costs by tenant, endpoint, or user role instead of only per request
  • Extend this pattern to embeddings and retrieval calls so you capture full pipeline spend

By Cyprian Aarons, AI Consultant at Topiax.