Haystack Tutorial (Python): adding cost tracking for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to add per-call cost tracking to a Haystack pipeline in Python, using real component outputs and a small accounting layer around your LLM calls. You need this when you want to measure spend by pipeline, tenant, feature flag, or request ID instead of guessing from vendor dashboards after the fact.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • openai
  • An OpenAI API key set as OPENAI_API_KEY
  • Optional but useful:
    • python-dotenv for local env loading
    • pydantic if you want stricter cost event models
  • A basic Haystack pipeline that already calls an LLM generator

Step-by-Step

  1. Start with a normal Haystack pipeline and capture the model metadata you’ll need for billing.
    For cost tracking, the important part is not just the answer text, but the model name and token usage returned by the generator.
import os
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """
Answer the question briefly.

Question: {{question}}
"""

prompt_builder = PromptBuilder(template=template)
llm = OpenAIGenerator(model="gpt-4o-mini")

pipeline = Pipeline()
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", llm)
pipeline.connect("prompt_builder.prompt", "llm.prompt")

result = pipeline.run({"prompt_builder": {"question": "What is Haystack?"}})
print(result["llm"]["replies"][0])
# "meta" is a list with one entry per reply; each entry carries the model
# name and a "usage" dict with token counts.
print(result["llm"]["meta"][0])
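If you're not sure what your installed Haystack and OpenAI versions return, it helps to inspect the metadata keys before wiring up extraction. The commented shape below is an assumption based on typical OpenAI chat completions metadata, not a guaranteed contract:

meta_entry = result["llm"]["meta"][0]
print(sorted(meta_entry.keys()))
# Typical shape (assumption -- varies by SDK and API version):
# {"model": "gpt-4o-mini-2024-07-18", "finish_reason": "stop",
#  "usage": {"prompt_tokens": 24, "completion_tokens": 41, "total_tokens": 65}}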
  2. Define a cost table and a small calculator that turns token usage into dollars.
    Keep this outside your pipeline so you can update pricing without touching business logic.
from dataclasses import dataclass

# USD per 1M tokens. Vendor pricing changes over time; refresh these values
# from the provider's pricing page rather than treating them as fixed.
PRICE_PER_1M_TOKENS = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

@dataclass
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    total_cost_usd: float

def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> CostRecord:
    # Providers often report dated model names (e.g. "gpt-4o-mini-2024-07-18"),
    # so fall back to the longest matching prefix when the exact key is absent.
    pricing = PRICE_PER_1M_TOKENS.get(model)
    if pricing is None:
        matches = [name for name in PRICE_PER_1M_TOKENS if model.startswith(name)]
        if not matches:
            raise KeyError(f"No pricing configured for model {model!r}")
        pricing = PRICE_PER_1M_TOKENS[max(matches, key=len)]
    total = (input_tokens / 1_000_000) * pricing["input"] + (output_tokens / 1_000_000) * pricing["output"]
    return CostRecord(model=model, input_tokens=input_tokens, output_tokens=output_tokens, total_cost_usd=total)
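A quick sanity check makes the arithmetic concrete: at $0.15 per 1M input tokens and $0.60 per 1M output tokens, 1,000 input tokens plus 500 output tokens on gpt-4o-mini should come out to $0.00045.

record = calculate_cost("gpt-4o-mini", input_tokens=1_000, output_tokens=500)
# (1_000 / 1_000_000) * 0.15 + (500 / 1_000_000) * 0.60 = 0.00015 + 0.00030
assert abs(record.total_cost_usd - 0.00045) < 1e-12
print(record)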
  3. Wrap the pipeline call in a tracker that extracts usage from Haystack’s generator metadata.
    In practice, this is where you attach request IDs, user IDs, or tenant IDs before writing records to Postgres, Kafka, or your observability stack.
import uuid
from typing import Any

def extract_usage(meta: dict[str, Any]) -> tuple[str, int, int]:
    # Fall back to the pipeline's configured model if the provider omits it.
    model = meta.get("model", "gpt-4o-mini")
    usage = meta.get("usage") or {}
    input_tokens = int(usage.get("prompt_tokens", 0))
    output_tokens = int(usage.get("completion_tokens", 0))
    return model, input_tokens, output_tokens

request_id = str(uuid.uuid4())
result = pipeline.run({"prompt_builder": {"question": "Explain retrieval augmented generation in one sentence."}})

meta = result["llm"]["meta"][0]  # meta is a list; take the entry for the first reply
model, input_tokens, output_tokens = extract_usage(meta)
cost = calculate_cost(model=model, input_tokens=input_tokens, output_tokens=output_tokens)

print({"request_id": request_id})
print(cost)
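As a sketch of the attachment step described above, request-scoped identifiers can ride along in a plain dict before the record reaches Postgres, Kafka, or your tracing backend. The tenant_id and user_id fields here are hypothetical values your auth layer would supply:

tracked = {
    "request_id": request_id,
    "tenant_id": "acme-corp",  # hypothetical -- from your auth/session layer
    "user_id": "u-1234",       # hypothetical
    "model": cost.model,
    "cost_usd": cost.total_cost_usd,
}
print(tracked)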
  4. Persist each call as a structured event so you can aggregate later by service or tenant.
    For production systems, write these events asynchronously; for now, JSON lines is enough to prove the pattern end to end.
import json
from datetime import datetime, timezone

def build_cost_event(request_id: str, route: str, cost_record: CostRecord) -> dict[str, Any]:
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "route": route,
        "model": cost_record.model,
        "input_tokens": cost_record.input_tokens,
        "output_tokens": cost_record.output_tokens,
        "cost_usd": round(cost_record.total_cost_usd, 8),
    }

event = build_cost_event(
    request_id=request_id,
    route="qa.answer",
    cost_record=cost,
)

with open("cost-events.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(event) + "\n")

print(event)
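To prove the "aggregate later" half of the pattern, a few lines of standard library code are enough to total spend per route and model from the JSONL file:

import json
from collections import defaultdict

totals: dict[tuple[str, str], float] = defaultdict(float)
with open("cost-events.jsonl", encoding="utf-8") as f:
    for line in f:
        event = json.loads(line)
        totals[(event["route"], event["model"])] += event["cost_usd"]

for (route, model), usd in sorted(totals.items()):
    print(f"{route} {model}: ${usd:.6f}")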
  5. If you want stronger control across multiple generators, centralize tracking in one helper function.
    This keeps the accounting logic consistent whether you call one LLM or fan out across several branches in a larger Haystack graph.
def run_with_cost_tracking(question: str) -> tuple[str, dict[str, Any]]:
    req_id = str(uuid.uuid4())
    out = pipeline.run({"prompt_builder": {"question": question}})
    meta = out["llm"]["meta"][0]  # first (and only) reply's metadata
    model_name, prompt_toks, completion_toks = extract_usage(meta)
    record = calculate_cost(model_name, prompt_toks, completion_toks)
    event_data = build_cost_event(req_id, "qa.answer", record)
    return out["llm"]["replies"][0], event_data

answer_text, cost_event = run_with_cost_tracking("What does a retriever do?")
print(answer_text)
print(cost_event)
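If a larger graph fans out across several generators, one way to total a request is to scan every component output that exposes generator-style metadata. This is a hedged sketch, not a Haystack API: it simply assumes each generator's output carries a "meta" list shaped like OpenAIGenerator's.

def total_request_cost(pipeline_output: dict[str, Any]) -> float:
    # Sum costs over every component whose output includes "meta" entries
    # with a "usage" dict (the shape OpenAIGenerator returns).
    total = 0.0
    for component_output in pipeline_output.values():
        if not isinstance(component_output, dict):
            continue
        for entry in component_output.get("meta", []):
            if isinstance(entry, dict) and "usage" in entry:
                model_name, inp, out = extract_usage(entry)
                total += calculate_cost(model_name, inp, out).total_cost_usd
    return total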

Testing It

Run the script against a known prompt and confirm two things: you get a normal LLM answer and you get non-zero token counts in the metadata when the provider returns usage data. Then inspect cost-events.jsonl and verify each line contains a timestamp, request ID, model name, token counts, and computed USD cost.

If input_tokens and output_tokens stay at zero on your account or SDK version, check the exact shape of result["llm"]["meta"][0]. Some providers expose usage under slightly different keys depending on API version or response format.
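If usage does show up under other names, a tolerant variant of extract_usage can check the common alternatives. The OpenAI Responses API, for instance, reports input_tokens/output_tokens rather than prompt_tokens/completion_tokens; treat the exact key names as assumptions to verify against your SDK version:

def extract_usage_tolerant(meta: dict[str, Any]) -> tuple[str, int, int]:
    usage = meta.get("usage") or {}
    # Chat Completions style first, then Responses-API style (assumed names).
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return meta.get("model", "unknown"), int(input_tokens), int(output_tokens)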

For production validation:

  • Compare your computed totals with vendor billing exports over a sample window.
  • Run the same prompt twice and confirm costs are stable within expected token variance.
  • Add unit tests for calculate_cost() so pricing changes don’t break accounting (see the sketch after this list).
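A minimal pytest sketch for that last bullet might look like this (the costs module name is hypothetical; import from wherever calculate_cost lives):

from costs import calculate_cost  # hypothetical module name

def test_gpt_4o_mini_pricing():
    record = calculate_cost("gpt-4o-mini", input_tokens=1_000_000, output_tokens=1_000_000)
    assert record.total_cost_usd == 0.15 + 0.60

def test_dated_model_names_fall_back_to_prefix_match():
    record = calculate_cost("gpt-4o-mini-2024-07-18", input_tokens=0, output_tokens=0)
    assert record.total_cost_usd == 0.0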

Next Steps

  • Add tenant-level aggregation with Postgres or ClickHouse.
  • Emit cost events to OpenTelemetry so they show up beside latency traces (a sketch follows this list).
  • Extend this pattern to multi-step pipelines with retrievers and rerankers so you can track full request cost end to end.
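For the OpenTelemetry bullet above, a minimal sketch (assuming the opentelemetry-api package and a tracer provider configured elsewhere) attaches the cost event to a span as attributes, so cost lands beside latency in your traces:

from opentelemetry import trace

tracer = trace.get_tracer("cost-tracking")

with tracer.start_as_current_span("llm.request") as span:
    answer_text, cost_event = run_with_cost_tracking("What does a retriever do?")
    span.set_attribute("llm.model", cost_event["model"])
    span.set_attribute("llm.input_tokens", cost_event["input_tokens"])
    span.set_attribute("llm.output_tokens", cost_event["output_tokens"])
    span.set_attribute("llm.cost_usd", cost_event["cost_usd"])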

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

