Haystack Tutorial (Python): adding cost tracking for beginners

By Cyprian Aarons
Updated 2026-04-21

This tutorial shows you how to add token-based cost tracking to a Haystack pipeline in Python using a small callback that records model usage after each run. You need this when you want visibility into LLM spend per request, per user, or per workflow without wiring in a full observability stack.

What You'll Need

  • Python 3.10+
  • haystack-ai
  • An OpenAI API key
  • A working internet connection for the model call
  • Basic familiarity with Haystack pipelines and components

Install the package:

pip install haystack-ai

Set your API key:

export OPENAI_API_KEY="your-api-key"

Step-by-Step

  1. Start with a simple Haystack pipeline that calls an OpenAI chat generator. We’ll keep the pipeline small so the cost tracking logic is easy to see.
import os
from haystack import Pipeline, component
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

@component
class PromptBuilder:
    @component.output_types(messages=list[ChatMessage])
    def run(self, question: str):
        return {
            "messages": [
                ChatMessage.from_user(
                    f"Answer briefly: {question}"
                )
            ]
        }

pipeline = Pipeline()
pipeline.add_component("prompt_builder", PromptBuilder())
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))
pipeline.connect("prompt_builder.messages", "llm.messages")
  2. Add a tiny cost tracker that converts tokens into dollars. This example uses a hardcoded price table so beginners can understand the mechanics before pulling rates from a config file or pricing service.
from dataclasses import dataclass

@dataclass
class CostTracker:
    input_cost_per_1k: float = 0.00015   # example rate
    output_cost_per_1k: float = 0.00060  # example rate

    def estimate(self, input_tokens: int, output_tokens: int) -> float:
        return (
            (input_tokens / 1000) * self.input_cost_per_1k +
            (output_tokens / 1000) * self.output_cost_per_1k
        )

tracker = CostTracker()
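Before wiring the tracker into the pipeline, it is worth checking the arithmetic by hand. This is the same formula as CostTracker.estimate, unrolled for a hypothetical call that used 2,000 input tokens and 500 output tokens:

```python
input_rate = 0.00015   # USD per 1k input tokens (example rate from above)
output_rate = 0.00060  # USD per 1k output tokens

# (2000 / 1000) * 0.00015 = 0.0003 and (500 / 1000) * 0.00060 = 0.0003
cost = (2000 / 1000) * input_rate + (500 / 1000) * output_rate
print(f"${cost:.6f}")  # $0.000600
```

Note that the rates are per 1,000 tokens, so dividing the raw counts by 1,000 first keeps the units consistent.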
  3. Run the pipeline and read usage metadata from the generator output. Haystack reports token usage on supported generators, which is the cleanest place to attach your cost calculation.
result = pipeline.run({
    "prompt_builder": {
        "question": "What is Haystack?"
    }
})

reply = result["llm"]["replies"][0]
usage = reply.meta["usage"]  # OpenAIChatGenerator attaches usage to each reply's meta

input_tokens = usage["prompt_tokens"]
output_tokens = usage["completion_tokens"]
cost = tracker.estimate(input_tokens, output_tokens)

print("Answer:", reply.text)
print("Input tokens:", input_tokens)
print("Output tokens:", output_tokens)
print(f"Estimated cost: ${cost:.6f}")
  4. Wrap the logic in a reusable helper so every request gets tracked the same way. In production, this is where you would also log request IDs, user IDs, and model names to your database or metrics backend.
def run_with_cost_tracking(question: str):
    result = pipeline.run({
        "prompt_builder": {"question": question}
    })

    reply = result["llm"]["replies"][0]
    usage = reply.meta["usage"]

    input_tokens = usage["prompt_tokens"]
    output_tokens = usage["completion_tokens"]
    cost = tracker.estimate(input_tokens, output_tokens)

    return {
        "answer": reply.text,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": round(cost, 6),
    }

report = run_with_cost_tracking("Explain vector databases in one sentence.")
print(report)
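Because the helper returns a plain dict, aggregating spend across calls is straightforward. A minimal sketch, using two hardcoded stand-in reports in place of live run_with_cost_tracking results:

```python
# Stand-in reports; in practice each comes from run_with_cost_tracking(question).
reports = [
    {"answer": "...", "input_tokens": 12, "output_tokens": 40, "estimated_cost_usd": 0.0000258},
    {"answer": "...", "input_tokens": 15, "output_tokens": 55, "estimated_cost_usd": 0.0000353},
]

# Accumulate totals across runs.
totals = {"runs": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
for r in reports:
    totals["runs"] += 1
    totals["input_tokens"] += r["input_tokens"]
    totals["output_tokens"] += r["output_tokens"]
    totals["cost_usd"] += r["estimated_cost_usd"]

print(totals)
```

In a real service you would key these totals by user or tenant instead of keeping one global counter.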
  5. If you want persistent tracking, store each run in a list or database row immediately after execution. For beginners, a JSONL file is enough to prove the pattern before moving to Postgres, ClickHouse, or Prometheus.
import json
from datetime import datetime, timezone

def log_run(report: dict, path: str = "usage_log.jsonl"):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **report,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

report = run_with_cost_tracking("Give me one benefit of RAG.")
log_run(report)
print("Logged:", report["estimated_cost_usd"])
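To turn that JSONL file into a spend report later, stream it back line by line and sum the cost field. A sketch using a temporary file so it runs standalone; the record fields mirror what log_run writes:

```python
import json
import os
import tempfile

# Two fake records shaped like log_run output.
records = [
    {"timestamp": "2026-04-21T10:00:00+00:00", "estimated_cost_usd": 0.00012},
    {"timestamp": "2026-04-21T10:05:00+00:00", "estimated_cost_usd": 0.00034},
]

# Write them the same way log_run appends records.
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read the file back and total the spend.
total = 0.0
with open(path, encoding="utf-8") as f:
    for line in f:
        total += json.loads(line)["estimated_cost_usd"]

os.remove(path)
print(f"Total spend: ${total:.6f}")  # Total spend: $0.000460
```

Because JSONL is one record per line, this scales to large logs without loading the whole file into memory.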

Testing It

Run the script and confirm you get three things back: an answer string, token counts from the usage metadata, and a non-zero estimated dollar value. If usage is missing, check that your generator supports token reporting and that the API call succeeded.

Then inspect the JSONL file and verify each line contains a timestamp plus the same fields printed to stdout. For stronger validation, run the same prompt twice: input token counts should stay identical, while output tokens (and therefore cost) may vary between runs because model output is not deterministic.
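Those checks can also be scripted. A sketch of the shape validation, run here against a hardcoded stand-in report rather than a live pipeline call:

```python
# Stand-in for a run_with_cost_tracking result.
report = {
    "answer": "Haystack is an open source framework for building LLM applications.",
    "input_tokens": 14,
    "output_tokens": 13,
    "estimated_cost_usd": 0.0000099,
}

# The three things every report should contain.
assert isinstance(report["answer"], str) and report["answer"]
assert report["input_tokens"] > 0 and report["output_tokens"] > 0
assert report["estimated_cost_usd"] > 0
print("report shape OK")
```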

Next Steps

  • Move the price table into configuration so you can update model pricing without code changes.
  • Add request metadata like tenant_id, user_id, and trace_id before writing logs.
  • Export these numbers to OpenTelemetry or Prometheus once you’re ready for production observability.
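As a sketch of the first bullet, the price table can live in a config document keyed by model name. The rates below are illustrative, not official OpenAI pricing:

```python
import json

# Example pricing config; in practice load this from a JSON or YAML file.
PRICING_JSON = """
{
  "gpt-4o-mini": {"input_per_1k": 0.00015, "output_per_1k": 0.00060},
  "gpt-4o":      {"input_per_1k": 0.00250, "output_per_1k": 0.01000}
}
"""

PRICING = json.loads(PRICING_JSON)

def estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Look up the model's rates and apply the per-1k-token formula."""
    rates = PRICING[model]
    return (
        (input_tokens / 1000) * rates["input_per_1k"]
        + (output_tokens / 1000) * rates["output_per_1k"]
    )

print(round(estimate("gpt-4o-mini", 1000, 1000), 6))  # 0.00075
```

Updating a price is now a config change rather than a code deploy, and adding a new model is just another key in the table.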

By Cyprian Aarons, AI Consultant at Topiax.