Haystack Tutorial (Python): adding cost tracking for advanced developers
This tutorial shows how to add per-call cost tracking to a Haystack pipeline in Python, using real component outputs and a small accounting layer around your LLM calls. You need this when you want to measure spend by pipeline, tenant, feature flag, or request ID instead of guessing from vendor dashboards after the fact.
What You'll Need
- Python 3.10+
- `haystack-ai`
- `openai`
- An OpenAI API key set as `OPENAI_API_KEY`
- Optional but useful:
  - `python-dotenv` for local env loading
  - `pydantic` if you want stricter cost event models
- A basic Haystack pipeline that already calls an LLM generator
Step-by-Step
- Start with a normal Haystack pipeline and capture the model metadata you'll need for billing.

For cost tracking, the important part is not just the answer text, but the model name and token usage returned by the generator. Note that `OpenAIGenerator` returns `meta` as a list with one entry per reply, so index it before reading usage.

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """
Answer the question briefly.
Question: {{question}}
"""

prompt_builder = PromptBuilder(template=template)
llm = OpenAIGenerator(model="gpt-4o-mini")

pipeline = Pipeline()
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", llm)
pipeline.connect("prompt_builder.prompt", "llm.prompt")

result = pipeline.run({"prompt_builder": {"question": "What is Haystack?"}})
print(result["llm"]["replies"][0])
print(result["llm"]["meta"][0])  # usage metadata for the first reply
```
- Define a cost table and a small calculator that turns token usage into dollars.

Keep this outside your pipeline so you can update pricing without touching business logic.

```python
from dataclasses import dataclass

# USD per 1M tokens; update as vendor pricing changes
PRICE_PER_1M_TOKENS = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

@dataclass
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    total_cost_usd: float

def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> CostRecord:
    pricing = PRICE_PER_1M_TOKENS[model]  # KeyError surfaces unknown models early
    total = (
        (input_tokens / 1_000_000) * pricing["input"]
        + (output_tokens / 1_000_000) * pricing["output"]
    )
    return CostRecord(
        model=model,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_cost_usd=total,
    )
```
- Wrap the pipeline call in a tracker that extracts usage from Haystack's generator metadata.

In practice, this is where you attach request IDs, user IDs, or tenant IDs before writing records to Postgres, Kafka, or your observability stack.

```python
import uuid
from typing import Any

def extract_usage(meta: dict[str, Any]) -> tuple[str, int, int]:
    model = meta.get("model", "gpt-4o-mini")
    usage = meta.get("usage", {})
    input_tokens = int(usage.get("prompt_tokens", 0))
    output_tokens = int(usage.get("completion_tokens", 0))
    return model, input_tokens, output_tokens

request_id = str(uuid.uuid4())
result = pipeline.run(
    {"prompt_builder": {"question": "Explain retrieval augmented generation in one sentence."}}
)

meta = result["llm"]["meta"][0]  # one meta dict per reply
model, input_tokens, output_tokens = extract_usage(meta)
cost = calculate_cost(model=model, input_tokens=input_tokens, output_tokens=output_tokens)

print({"request_id": request_id})
print(cost)
```
- Persist each call as a structured event so you can aggregate later by service or tenant.

For production systems, write these events asynchronously; for now, JSON lines is enough to prove the pattern end to end.

```python
import json
from datetime import datetime, timezone

def build_cost_event(request_id: str, route: str, cost_record: CostRecord) -> dict[str, Any]:
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "route": route,
        "model": cost_record.model,
        "input_tokens": cost_record.input_tokens,
        "output_tokens": cost_record.output_tokens,
        "cost_usd": round(cost_record.total_cost_usd, 8),
    }

event = build_cost_event(
    request_id=request_id,
    route="qa.answer",
    cost_record=cost,
)

with open("cost-events.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(event) + "\n")

print(event)
```
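To prove the aggregation side of the pattern, a small reader can sum `cost_usd` per route from that JSONL file. This is a minimal sketch; `summarize_costs` is an illustrative helper name, not part of Haystack:

```python
import json
from collections import defaultdict
from pathlib import Path

def summarize_costs(path: str) -> dict[str, float]:
    """Aggregate cost_usd per route from a JSON-lines cost event file."""
    totals: dict[str, float] = defaultdict(float)
    p = Path(path)
    if not p.exists():
        return dict(totals)  # nothing recorded yet
    with p.open(encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            totals[event["route"]] += event["cost_usd"]
    return dict(totals)

print(summarize_costs("cost-events.jsonl"))
```

Swapping the `route` key for `tenant_id` or `request_id` gives you the other aggregation axes for free.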
- If you want stronger control across multiple generators, centralize tracking in one helper function.

This keeps the accounting logic consistent whether you call one LLM or fan out across several branches in a larger Haystack graph.

```python
def run_with_cost_tracking(question: str) -> tuple[str, dict[str, Any]]:
    req_id = str(uuid.uuid4())
    out = pipeline.run({"prompt_builder": {"question": question}})
    meta = out["llm"]["meta"][0]
    model_name, prompt_toks, completion_toks = extract_usage(meta)
    record = calculate_cost(model_name, prompt_toks, completion_toks)
    event_data = build_cost_event(req_id, "qa.answer", record)
    return out["llm"]["replies"][0], event_data

answer_text, cost_event = run_with_cost_tracking("What does a retriever do?")
print(answer_text)
print(cost_event)
```
Testing It
Run the script against a known prompt and confirm two things: you get a normal LLM answer, and the token counts in the metadata are non-zero when the provider returns usage data. Then inspect `cost-events.jsonl` and verify each line contains a timestamp, request ID, model name, token counts, and computed USD cost.
If `input_tokens` and `output_tokens` stay at zero on your account or SDK version, check the exact shape of `result["llm"]["meta"][0]`. Some providers expose usage under slightly different keys depending on API version or response format.
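If you hit that situation, a more defensive extractor can try the common key variants before giving up. This is a sketch: `extract_usage_defensive` is a hypothetical name, and the alternate `input_tokens`/`output_tokens` keys are an assumption about what some newer APIs report:

```python
from typing import Any

def extract_usage_defensive(meta: dict[str, Any]) -> tuple[str, int, int]:
    """Handle the usage key shapes commonly seen across providers and SDK versions."""
    usage = meta.get("usage") or {}
    # OpenAI-style chat completions report prompt_tokens/completion_tokens;
    # some newer APIs report input_tokens/output_tokens instead.
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return meta.get("model", "unknown"), int(input_tokens), int(output_tokens)

sample_meta = {"model": "gpt-4o-mini", "usage": {"input_tokens": 12, "output_tokens": 34}}
print(extract_usage_defensive(sample_meta))  # → ('gpt-4o-mini', 12, 34)
```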
For production validation:
- Compare your computed totals with vendor billing exports over a sample window.
- Run the same prompt twice and confirm costs are stable within expected token variance.
- Add unit tests for `calculate_cost()` so pricing changes don't break accounting.
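Those unit tests can start as plain assertions. The pricing table and cost formula are repeated inline below so the sketch is self-contained; in your project, import `calculate_cost` from wherever it lives:

```python
# Inline copy of the pricing table and formula for a self-contained test sketch.
PRICE_PER_1M_TOKENS = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def calculate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICE_PER_1M_TOKENS[model]
    return (input_tokens / 1_000_000) * pricing["input"] + (
        output_tokens / 1_000_000
    ) * pricing["output"]

def test_known_price() -> None:
    # 1M input tokens at $0.15 plus 1M output tokens at $0.60 = $0.75
    assert abs(calculate_cost_usd("gpt-4o-mini", 1_000_000, 1_000_000) - 0.75) < 1e-9

def test_zero_usage_costs_nothing() -> None:
    assert calculate_cost_usd("gpt-4o-mini", 0, 0) == 0.0

def test_unknown_model_fails_loudly() -> None:
    try:
        calculate_cost_usd("not-a-model", 10, 10)
    except KeyError:
        pass
    else:
        raise AssertionError("expected KeyError for unknown model")

test_known_price()
test_zero_usage_costs_nothing()
test_unknown_model_fails_loudly()
print("all cost tests passed")
```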
Next Steps
- Add tenant-level aggregation with Postgres or ClickHouse.
- Emit cost events to OpenTelemetry so they show up beside latency traces.
- Extend this pattern to multi-step pipelines with retrievers and rerankers so you can track full request cost end to end.
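For the multi-step case, one approach is a helper that walks every component output in a `pipeline.run()` result and sums cost for any component exposing OpenAI-style `meta` entries. This is a hypothetical sketch (`total_pipeline_cost` is not a Haystack API), assuming each generator's `meta` list carries `model` and `usage` as shown earlier:

```python
from typing import Any

def total_pipeline_cost(
    result: dict[str, Any], pricing: dict[str, dict[str, float]]
) -> float:
    """Sum USD cost across all components whose output carries token usage."""
    total = 0.0
    for component_output in result.values():
        if not isinstance(component_output, dict):
            continue
        for meta in component_output.get("meta", []):
            model = meta.get("model")
            if model not in pricing:
                continue  # non-LLM components (retrievers, rankers) have no token cost
            usage = meta.get("usage", {})
            total += (usage.get("prompt_tokens", 0) / 1_000_000) * pricing[model]["input"]
            total += (usage.get("completion_tokens", 0) / 1_000_000) * pricing[model]["output"]
    return total
```

Called on the full `pipeline.run()` output with your pricing table, it gives one end-to-end figure per request, regardless of how many generators the graph fans out to.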
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.