Haystack Tutorial (TypeScript): Adding Cost Tracking for Advanced Developers
This tutorial shows you how to add per-call cost tracking to a Haystack TypeScript pipeline by wrapping model calls and recording token usage. You need this when you want request-level visibility into LLM spend, when you bill costs back to teams or tenants, or when you need to enforce budgets before costs drift.
What You'll Need
- Node.js 18+
- A TypeScript project with `typescript` and `ts-node` (or a build step) already set up
- Haystack TypeScript packages installed
- An OpenAI API key in `OPENAI_API_KEY`
- A place to persist metrics, such as stdout, Prometheus, Postgres, or your own audit table
- Basic familiarity with Haystack `Pipeline`, `PromptBuilder`, and generator components
Step-by-Step
- Start with a normal Haystack pipeline so you have a baseline to measure. The key idea is simple: every model call returns usage metadata, and you capture that at the boundary where the response leaves the generator.
import { Pipeline } from "@haystack/core";
import { PromptBuilder } from "@haystack/components/prompts";
import { OpenAIGenerator } from "@haystack/components/generators/openai";

const promptBuilder = new PromptBuilder({
  template: "Answer in one sentence: {{question}}",
});

const llm = new OpenAIGenerator({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o-mini",
});

const pipeline = new Pipeline();
pipeline.addComponent("promptBuilder", promptBuilder);
pipeline.addComponent("llm", llm);
pipeline.connect("promptBuilder.prompt", "llm.prompt");
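To see where this metadata actually lives, it helps to run the baseline pipeline once and dump the first reply. This is a quick inspection sketch, not part of the final flow, and the replies[0].meta.usage path is an assumption to verify against your installed Haystack version.

async function inspectBaseline() {
  const result = await pipeline.run({
    promptBuilder: { question: "What is an insurance deductible?" },
    llm: {},
  });

  const reply = result.llm?.replies?.[0];
  console.log(reply?.text); // the generated answer
  console.dir(reply?.meta, { depth: null }); // token usage should appear somewhere in here
}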
- Define a small cost calculator that converts token usage into dollars. Keep the rates in one place so you can update them without touching your pipeline code.
type Usage = {
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
};

const PRICING = {
  "gpt-4o-mini": {
    inputPer1M: 0.15,
    outputPer1M: 0.6,
  },
} as const;

function calculateCost(model: keyof typeof PRICING, usage: Usage) {
  const pricing = PRICING[model];
  const promptTokens = usage.promptTokens ?? 0;
  const completionTokens = usage.completionTokens ?? 0;
  return (
    (promptTokens / 1_000_000) * pricing.inputPer1M +
    (completionTokens / 1_000_000) * pricing.outputPer1M
  );
}
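As a quick sanity check of the arithmetic, here is the calculator applied to a hypothetical request; the token counts are invented for illustration.

// 1,200 prompt tokens and 300 completion tokens at gpt-4o-mini rates:
// (1200 / 1e6) * 0.15 + (300 / 1e6) * 0.60 = 0.00018 + 0.00018 = 0.00036
const exampleCost = calculateCost("gpt-4o-mini", {
  promptTokens: 1_200,
  completionTokens: 300,
});

console.log(exampleCost.toFixed(6)); // "0.000360"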
- Wrap execution in a helper that extracts usage from the generator result and emits an accounting record. This keeps cost tracking out of business logic and makes it easy to swap stdout for real telemetry later; a Prometheus-style sketch follows the code below.
async function runWithCostTracking(question: string) {
  const result = await pipeline.run({
    promptBuilder: { question },
    llm: {},
  });

  const generation = result.llm?.replies?.[0];
  const text = generation?.text ?? "";
  const usage = generation?.meta?.usage as Usage | undefined;

  if (!usage) {
    throw new Error("Missing token usage metadata from LLM response");
  }

  const model = "gpt-4o-mini";
  const costUsd = calculateCost(model, usage);

  console.log(
    JSON.stringify(
      {
        requestType: "qa",
        model,
        promptTokens: usage.promptTokens ?? null,
        completionTokens: usage.completionTokens ?? null,
        totalTokens: usage.totalTokens ?? null,
        costUsd,
      },
      null,
      2
    )
  );

  return text;
}
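If you already scrape Prometheus, the same record can increment counters instead of (or alongside) the JSON log. A minimal sketch, assuming the prom-client package; the metric and label names are my own choices, not anything Haystack defines.

import { Counter, register } from "prom-client";

const llmCostUsd = new Counter({
  name: "llm_cost_usd_total",
  help: "Cumulative LLM spend in USD",
  labelNames: ["model", "requestType"],
});

const llmTokens = new Counter({
  name: "llm_tokens_total",
  help: "Cumulative LLM token usage",
  labelNames: ["model", "direction"],
});

function emitCostMetrics(model: string, usage: Usage, costUsd: number) {
  llmCostUsd.labels(model, "qa").inc(costUsd);
  llmTokens.labels(model, "prompt").inc(usage.promptTokens ?? 0);
  llmTokens.labels(model, "completion").inc(usage.completionTokens ?? 0);
}

// Expose register.metrics() from an HTTP endpoint for Prometheus to scrape.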
- Add a thin accounting layer so every request leaves behind a structured cost record. The in-memory ledger below keeps the example self-contained; in production, this is where you would write to a database table or metrics backend so the metric survives process restarts, rather than just printing JSON. A sketch of a real persistence write follows the code below.
import { randomUUID } from "node:crypto";

type CostRecord = {
  requestId: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  totalTokens?: number;
  costUsd: number;
};

const costLedger: CostRecord[] = [];

function recordCost(entry: CostRecord) {
  costLedger.push(entry);
}

async function trackedQuestion(question: string) {
  const result = await pipeline.run({
    promptBuilder: { question },
    llm: {},
  });

  const reply = result.llm?.replies?.[0];
  const usage = reply?.meta?.usage as Usage | undefined;

  if (!usage) {
    throw new Error("Missing token usage metadata from LLM response");
  }

  recordCost({
    requestId: randomUUID(),
    model: "gpt-4o-mini",
    promptTokens: usage.promptTokens ?? 0,
    completionTokens: usage.completionTokens ?? 0,
    totalTokens: usage.totalTokens,
    costUsd: calculateCost("gpt-4o-mini", usage),
  });

  return reply?.text ?? "";
}
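To make the ledger durable, recordCost can write to a real table instead of pushing onto an array. A minimal sketch, assuming node-postgres (pg) and a cost_ledger table you create yourself; adjust the schema to whatever your chargeback reporting needs.

import { Pool } from "pg";

// Assumes a table roughly like:
//   CREATE TABLE cost_ledger (
//     request_id uuid PRIMARY KEY,
//     model text NOT NULL,
//     prompt_tokens int NOT NULL,
//     completion_tokens int NOT NULL,
//     total_tokens int,
//     cost_usd numeric NOT NULL,
//     created_at timestamptz DEFAULT now()
//   );
const pool = new Pool(); // connection settings come from PG* environment variables

async function persistCost(entry: CostRecord) {
  await pool.query(
    `INSERT INTO cost_ledger
       (request_id, model, prompt_tokens, completion_tokens, total_tokens, cost_usd)
     VALUES ($1, $2, $3, $4, $5, $6)`,
    [
      entry.requestId,
      entry.model,
      entry.promptTokens,
      entry.completionTokens,
      entry.totalTokens ?? null,
      entry.costUsd,
    ]
  );
}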
- Put it behind a small executable entrypoint so you can test the flow end to end. This gives you a single place to validate both the answer and the accounting output.
async function main() {
  const answer = await trackedQuestion("What is an insurance deductible?");
  console.log({ answer });
  console.log({ ledgerSize: costLedger.length });
  console.log(costLedger[0]);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
Testing It
Run the script with a real API key and confirm two things in the output. First, the LLM response should print normally; second, you should see token counts and a non-zero costUsd value in your ledger entry.
If usage is missing, check that your generator returns metadata in the shape your installed Haystack version expects. Different component versions may expose reply metadata slightly differently, so inspect result.llm.replies[0] once before wiring this into production.
For production validation, compare the logged token counts against your provider dashboard for the same request window. They will not always match perfectly on a single call due to rounding and provider-side reporting delays, but they should be close enough to catch regressions.
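A small aggregation over the ledger makes that comparison easier. This sketch sums the in-memory records; swap in a SQL aggregate once the ledger lives in a database.

function summarizeLedger(records: CostRecord[]) {
  return records.reduce(
    (acc, r) => ({
      requests: acc.requests + 1,
      promptTokens: acc.promptTokens + r.promptTokens,
      completionTokens: acc.completionTokens + r.completionTokens,
      costUsd: acc.costUsd + r.costUsd,
    }),
    { requests: 0, promptTokens: 0, completionTokens: 0, costUsd: 0 }
  );
}

console.log(summarizeLedger(costLedger));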
Next Steps
- Move `costLedger` into Postgres or ClickHouse and add `tenantId`, `workspaceId`, and `requestId`
- Add budget guards that reject requests once projected spend exceeds a threshold (a sketch follows this list)
- Extend pricing support for multiple models and cached-input pricing rules
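As a starting point for the budget-guard idea above, here is a sketch that checks accumulated spend against a fixed cap before running the pipeline; the cap value and the error handling are placeholders to adapt.

// Hypothetical cap; in practice this would come from config or a billing service.
const MONTHLY_BUDGET_USD = 50;

function assertWithinBudget(ledger: CostRecord[]) {
  const spentUsd = ledger.reduce((sum, r) => sum + r.costUsd, 0);
  if (spentUsd >= MONTHLY_BUDGET_USD) {
    throw new Error(
      `LLM budget exceeded: $${spentUsd.toFixed(2)} of $${MONTHLY_BUDGET_USD} spent`
    );
  }
}

async function guardedQuestion(question: string) {
  assertWithinBudget(costLedger); // reject before making another model call
  return trackedQuestion(question);
}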
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.