LangChain Tutorial (Python): Adding Cost Tracking for Advanced Developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows how to add per-run cost tracking to a LangChain Python app using real callback hooks and a small pricing layer. You need this when you want visibility into model spend by request, user, tenant, or workflow step instead of guessing from aggregate provider bills.

What You'll Need

  • Python 3.10+
  • langchain
  • langchain-openai
  • openai
  • An OpenAI API key in OPENAI_API_KEY
  • Optional: python-dotenv if you want to load env vars from a .env file
  • A basic LangChain chain or agent already working

Step-by-Step

  1. Install the packages and set your API key. If you already have a LangChain app, keep that code and just add the tracking pieces below.
pip install langchain langchain-openai openai python-dotenv
export OPENAI_API_KEY="your-key-here"
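If you prefer the optional .env route, load it before any model objects are constructed. A minimal sketch, assuming a .env file in the working directory that contains OPENAI_API_KEY:
# Optional: pull OPENAI_API_KEY from a local .env file instead of exporting it.
from dotenv import load_dotenv

load_dotenv()  # reads .env into os.environ; existing variables are not overridden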
  2. Create a callback handler that records token usage and estimates cost. For production, I prefer keeping pricing in one place so updates to model rates don’t leak across the codebase.
from dataclasses import dataclass, field
from typing import Dict, Any
from langchain_core.callbacks import BaseCallbackHandler

MODEL_PRICING = {
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
    "gpt-4o": {"input": 5.00 / 1_000_000, "output": 15.00 / 1_000_000},
}

@dataclass
class CostTracker(BaseCallbackHandler):
    totals: Dict[str, float] = field(default_factory=lambda: {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "cost_usd": 0.0,
    })

    def on_llm_end(self, response, **kwargs: Any) -> None:
        llm_output = getattr(response, "llm_output", {}) or {}
        token_usage = llm_output.get("token_usage", {}) or {}
        # Fall back to gpt-4o-mini rates when the model name is unknown; see "Testing It".
        model_name = llm_output.get("model_name", "gpt-4o-mini")
        pricing = MODEL_PRICING.get(model_name, MODEL_PRICING["gpt-4o-mini"])

        prompt_tokens = token_usage.get("prompt_tokens", 0)
        completion_tokens = token_usage.get("completion_tokens", 0)
        total_tokens = token_usage.get("total_tokens", prompt_tokens + completion_tokens)

        self.totals["prompt_tokens"] += prompt_tokens
        self.totals["completion_tokens"] += completion_tokens
        self.totals["total_tokens"] += total_tokens
        self.totals["cost_usd"] += (
            prompt_tokens * pricing["input"] +
            completion_tokens * pricing["output"]
        )
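You can sanity-check the handler offline before touching the network by feeding it a stand-in response. The SimpleNamespace stub below only mimics the llm_output attribute the handler reads; it is a test convenience, not a real LLMResult:
from types import SimpleNamespace

tracker = CostTracker()
fake_response = SimpleNamespace(llm_output={
    "token_usage": {"prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150},
    "model_name": "gpt-4o-mini",
})
tracker.on_llm_end(fake_response)
assert tracker.totals["total_tokens"] == 150
# 100 * 0.15/1M + 50 * 0.60/1M = 0.000045
assert abs(tracker.totals["cost_usd"] - 0.000045) < 1e-12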
  3. Wire the tracker into a real LangChain chain. This uses ChatOpenAI, ChatPromptTemplate, and StrOutputParser, which is the cleanest way to attach callbacks without polluting business logic.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

tracker = CostTracker()

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "{question}")
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | StrOutputParser()

answer = chain.invoke(
    {"question": "Explain why cost tracking matters in agent workflows."},
    config={"callbacks": [tracker]}
)

print(answer)
print(tracker.totals)
  4. If you need costs per request instead of global totals, create a fresh tracker for each invocation and attach your own metadata. That gives you clean accounting by customer, endpoint, trace ID, or internal workflow step.
def run_with_cost(question: str) -> dict:
    tracker = CostTracker()
    result = chain.invoke(
        {"question": question},
        config={
            "callbacks": [tracker],
            "metadata": {"tenant_id": "acme-bank", "workflow": "support-assistant"},
        },
    )
    return {
        "answer": result,
        "usage": tracker.totals,
    }

report = run_with_cost("Summarize the risk of uncontrolled tool calls.")
print(report["answer"])
print(report["usage"])
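To turn those per-request numbers into real accounting, persist each record together with its metadata. A minimal sketch that appends to a local usage.jsonl file (the file name and record_usage helper are placeholders; swap in your actual store):
import json
import time

def record_usage(tenant_id: str, workflow: str, usage: dict) -> None:
    # One JSON line per request; replace with Postgres/DynamoDB in production.
    entry = {"ts": time.time(), "tenant_id": tenant_id, "workflow": workflow, **usage}
    with open("usage.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")

record_usage("acme-bank", "support-assistant", report["usage"])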
  5. For multi-step chains or agents, keep the same callback attached at the top level so every LLM call rolls into one accounting object. That is the difference between tracking one prompt and tracking an entire workflow with retries, tool use, and follow-up generations.
from langchain_core.prompts import PromptTemplate

summary_prompt = PromptTemplate.from_template(
    "Write a one-paragraph summary of this text:\n\n{text}"
)

summary_chain = summary_prompt | llm | StrOutputParser()

texts = [
    "Call center notes indicate repeated password reset requests.",
    "Customer asked about transaction dispute timelines.",
]

tracker = CostTracker()
for text in texts:
    output = summary_chain.invoke({"text": text}, config={"callbacks": [tracker]})
    print(output)

print("Aggregated usage:", tracker.totals)

Testing It

Run the script and confirm you get both an answer and a populated totals dictionary after each call. The key fields to check are prompt_tokens, completion_tokens, total_tokens, and cost_usd.
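A minimal smoke check, assuming a tracker from the steps above is in scope:
usage = tracker.totals
assert usage["total_tokens"] > 0, "provider returned no token usage"
assert usage["cost_usd"] > 0, "model name may be missing from MODEL_PRICING"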

If cost_usd stays at zero, inspect whether your model name matches an entry in MODEL_PRICING. Also verify that the model response includes token usage; some providers or wrappers expose it differently.
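One way to make that failure mode loud is a small lookup helper. This get_pricing function is a hypothetical addition, not part of the handler above; you would call it from on_llm_end in place of the bare MODEL_PRICING.get:
import warnings

def get_pricing(model_name: str) -> dict:
    # Warn when a model has no pricing entry instead of silently
    # billing it at the gpt-4o-mini fallback rates.
    if model_name not in MODEL_PRICING:
        warnings.warn(f"No pricing entry for {model_name!r}; using gpt-4o-mini rates.")
    return MODEL_PRICING.get(model_name, MODEL_PRICING["gpt-4o-mini"])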

For a quick sanity check, compare two prompts of different lengths. The longer prompt should produce higher prompt token counts and usually a higher estimated cost.
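Sketched with the run_with_cost helper from step 4 (illustrative prompts only):
short = run_with_cost("Define cost tracking.")
longer = run_with_cost(
    "Define cost tracking, then list five ways uncontrolled agent loops inflate spend."
)
print(short["usage"]["prompt_tokens"], "vs", longer["usage"]["prompt_tokens"])
print(short["usage"]["cost_usd"], "vs", longer["usage"]["cost_usd"])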

Next Steps

  • Add persistent storage for usage records in Postgres or DynamoDB.
  • Emit cost events to OpenTelemetry or your existing tracing stack.
  • Extend the pricing map to support multiple providers and model versions (see the sketch below).
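For that last item, one hedged sketch is to key rates by a (provider, model) tuple and keep a single lookup function. The Anthropic row and all rates here are placeholders; verify them against each provider's current pricing page:
MULTI_PROVIDER_PRICING = {
    ("openai", "gpt-4o-mini"): {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
    ("openai", "gpt-4o"): {"input": 5.00 / 1_000_000, "output": 15.00 / 1_000_000},
    ("anthropic", "claude-3-5-haiku"): {"input": 0.80 / 1_000_000, "output": 4.00 / 1_000_000},
}

def lookup_pricing(provider: str, model_name: str) -> dict:
    # Keyed by (provider, model) so two providers can expose models with the same name.
    return MULTI_PROVIDER_PRICING.get(
        (provider, model_name),
        MULTI_PROVIDER_PRICING[("openai", "gpt-4o-mini")],
    )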

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

