Haystack Tutorial (Python): adding cost tracking for intermediate developers
This tutorial shows you how to add token-based cost tracking to a Haystack pipeline in Python, so every LLM call reports its token usage and an estimated dollar cost. You need this when you want per-request visibility for billing, for model selection, or just to stop surprise OpenAI invoices.
What You'll Need
- Python 3.10+
- haystack-ai
- An LLM provider package, such as openai
- An API key for your model provider
- A Haystack pipeline with at least one generator component
- Basic familiarity with Pipeline, components, and prompt builders
Install the packages:
pip install haystack-ai openai
Set your API key:
export OPENAI_API_KEY="your-key-here"
Step-by-Step
1. Start with a simple Haystack pipeline that uses an LLM generator. We’ll keep it minimal so the cost tracking code is easy to see.
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# ChatPromptBuilder outputs a list of ChatMessage objects, which is what
# OpenAIChatGenerator expects on its "messages" input. A plain PromptBuilder
# outputs a string and will not connect to a chat generator.
prompt_builder = ChatPromptBuilder(
    template=[ChatMessage.from_user("Answer the question briefly: {{ question }}")]
)

llm = OpenAIChatGenerator(model="gpt-4o-mini")

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")
2. Run the pipeline once and inspect the raw response metadata. Haystack returns usage information from supported providers, and that is what we’ll convert into cost.
result = pipe.run(
    {
        "prompt_builder": {
            "question": "What is the capital of France?"
        }
    }
)

# Each reply is a ChatMessage; its text is the answer and its
# metadata carries the provider's token accounting.
message = result["llm"]["replies"][0]
print(message.text)
print(message.meta)
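For reference, with an OpenAI chat model the metadata typically looks something like the following. This is illustrative only; the exact keys vary by provider and haystack-ai version, so always check your own output:

{
    "model": "gpt-4o-mini-2024-07-18",
    "finish_reason": "stop",
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 7,
        "total_tokens": 21
    }
}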
3. Add a small utility that calculates cost from token usage. This keeps pricing logic outside your pipeline components, which is where it belongs in production code.
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenPricing:
    """Price per 1,000 tokens, in USD."""
    input_per_1k: float
    output_per_1k: float

def calculate_cost(usage: dict, pricing: TokenPricing) -> float:
    """Convert a provider usage dict into an estimated USD cost."""
    input_tokens = usage.get("prompt_tokens", 0)
    output_tokens = usage.get("completion_tokens", 0)
    input_cost = (input_tokens / 1000) * pricing.input_per_1k
    output_cost = (output_tokens / 1000) * pricing.output_per_1k
    return round(input_cost + output_cost, 6)
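A quick sanity check makes the arithmetic concrete. With hypothetical rates of $0.0005 per 1k input tokens and $0.0015 per 1k output tokens, 1,000 input tokens plus 500 output tokens should cost $0.00125:

# Hypothetical rates for illustration only.
example_pricing = TokenPricing(input_per_1k=0.0005, output_per_1k=0.0015)
example_usage = {"prompt_tokens": 1000, "completion_tokens": 500}

# (1000/1000) * 0.0005 + (500/1000) * 0.0015 = 0.0005 + 0.00075 = 0.00125
assert calculate_cost(example_usage, example_pricing) == 0.00125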
4. Pull usage out of the Haystack response and print a cost report. For OpenAI models, usage usually appears in the message metadata under a provider-specific key structure, so inspect it once and adapt if needed.
# gpt-4o-mini list pricing at the time of writing is $0.15 per 1M input
# tokens and $0.60 per 1M output tokens, i.e. $0.00015 / $0.0006 per 1k.
# Always verify against your provider's current pricing page.
pricing = TokenPricing(
    input_per_1k=0.00015,
    output_per_1k=0.0006,
)

meta = message.meta or {}
usage = meta.get("usage", {})
cost = calculate_cost(usage, pricing)

print("Prompt tokens:", usage.get("prompt_tokens", 0))
print("Completion tokens:", usage.get("completion_tokens", 0))
print(f"Estimated cost: ${cost}")
5. Wrap the whole thing in a reusable function so every request returns both text and cost. This is the version you actually want in an application service layer.
def ask_with_cost(question: str) -> dict:
    """Run the pipeline and return the answer plus its estimated cost."""
    result = pipe.run(
        {"prompt_builder": {"question": question}}
    )
    reply = result["llm"]["replies"][0]
    usage = (reply.meta or {}).get("usage", {})
    total_cost = calculate_cost(usage, pricing)
    return {
        "answer": reply.text,
        "usage": usage,
        "estimated_cost_usd": total_cost,
    }

output = ask_with_cost("Explain zero trust in one sentence.")
print(output["answer"])
print(output["estimated_cost_usd"])
6. If you want this in logs or metrics, emit a structured record instead of printing it. That makes it easy to ship to CloudWatch, Datadog, ELK, or whatever your team already uses.
import json

record = {
    "model": "gpt-4o-mini",
    "question": "Explain zero trust in one sentence.",
    "usage": output["usage"],
    "estimated_cost_usd": output["estimated_cost_usd"],
}
print(json.dumps(record, indent=2))
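In an application you would hand that record to your logger instead of stdout. A minimal sketch using the standard library; the logger name is a placeholder, and shipping the output is left to whatever agent your stack runs:

import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm.cost")

def log_cost_record(record: dict) -> None:
    # One JSON object per line is trivial for CloudWatch, Datadog,
    # or ELK agents to ingest as structured data.
    logger.info(json.dumps(record))

log_cost_record(record)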
Testing It
Run the script against a short prompt first and confirm you get both an answer and non-zero token counts. If usage comes back empty, print message.meta directly and check how your provider structures token data.
Then test with a longer prompt so token counts change in a predictable way. Your estimated cost should increase as prompt length increases or when you ask for longer completions.
If you plan to use this in production, compare your estimate against the provider dashboard for a few requests. The numbers won’t always match perfectly because of rounding or provider-side accounting differences, but they should be close enough for internal tracking.
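You can also verify the math itself without spending any tokens. A tiny offline check with made-up pricing, with the expected value computed by hand:

def test_calculate_cost():
    pricing = TokenPricing(input_per_1k=0.001, output_per_1k=0.002)
    usage = {"prompt_tokens": 2000, "completion_tokens": 1500}
    # 2 * 0.001 + 1.5 * 0.002 = 0.002 + 0.003 = 0.005
    assert calculate_cost(usage, pricing) == 0.005

test_calculate_cost()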
Next Steps
- Add the same cost wrapper around retrieval + generation pipelines so you can track end-to-end request spend.
- Store per-request cost records in Postgres or your observability stack for monthly reporting.
- Extend the pricing table to support multiple models and route requests based on budget thresholds (see the sketch below).
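As a starting point for that last item, here is a sketch of a multi-model pricing table with a budget-aware picker. The model names and rates are illustrative; keep your own table in sync with your provider's pricing page:

# Illustrative per-1k rates; verify before using in production.
PRICING_TABLE = {
    "gpt-4o": TokenPricing(input_per_1k=0.0025, output_per_1k=0.01),
    "gpt-4o-mini": TokenPricing(input_per_1k=0.00015, output_per_1k=0.0006),
}

def pick_model(expected_tokens: int, budget_usd: float) -> str:
    # Try models in preference order; take the first whose rough worst
    # case (every token billed at the output rate) fits the budget.
    for model in ("gpt-4o", "gpt-4o-mini"):
        rate = PRICING_TABLE[model].output_per_1k
        if (expected_tokens / 1000) * rate <= budget_usd:
            return model
    return "gpt-4o-mini"  # cheapest fallback

print(pick_model(expected_tokens=4000, budget_usd=0.01))  # -> gpt-4o-mini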
By Cyprian Aarons, AI Consultant at Topiax.