LangGraph Tutorial (Python): adding cost tracking for advanced developers

By Cyprian Aarons · Updated 2026-04-22

This tutorial shows how to add per-run cost tracking to a LangGraph workflow in Python using the usage metadata returned with each model response and a small structured-logging helper. You need this when you want to attribute LLM spend per agent run, per user request, or per workflow node instead of guessing from logs after the fact.

What You'll Need

  • Python 3.10+
  • langgraph
  • langchain-openai
  • langchain-core
  • An OpenAI API key in OPENAI_API_KEY
  • Optional: a Postgres or SQLite sink if you want to persist costs later
  • A working LangGraph graph with at least one LLM node

Step-by-Step

  1. Start with a normal LangGraph setup and make sure your model returns usage metadata. For OpenAI models through LangChain, the response object includes token usage that we can convert into cost.
import os
from typing import TypedDict, Annotated

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

# ChatOpenAI reads OPENAI_API_KEY from the environment; fail early if it is missing.
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running"

class State(TypedDict):
    messages: Annotated[list, add_messages]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def assistant(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("assistant", assistant)
graph.add_edge(START, "assistant")
graph.add_edge("assistant", END)
app = graph.compile()
  2. Add a small pricing table and a helper that converts token counts into dollars. Keep this in code first; move it to config or a database later if you support multiple providers or changing rates.
from decimal import Decimal

PRICING = {
    "gpt-4o-mini": {
        "input_per_1m": Decimal("0.15"),
        "output_per_1m": Decimal("0.60"),
    }
}

def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> Decimal:
    pricing = PRICING[model_name]
    input_cost = (Decimal(input_tokens) / Decimal(1_000_000)) * pricing["input_per_1m"]
    output_cost = (Decimal(output_tokens) / Decimal(1_000_000)) * pricing["output_per_1m"]
    return input_cost + output_cost
  3. Wrap the graph execution so you can inspect the final AI message and calculate cost from usage metadata. This is the simplest production-friendly pattern when you only need run-level accounting.
def run_with_cost(messages):
    result = app.invoke({"messages": messages})
    last_message = result["messages"][-1]

    # usage_metadata on the final AI message is enough for a graph with a single
    # LLM call; see step 4 for per-node accounting across multiple calls.
    usage = getattr(last_message, "usage_metadata", None) or {}
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)

    cost = estimate_cost(
        model_name="gpt-4o-mini",
        input_tokens=input_tokens,
        output_tokens=output_tokens,
    )

    return {
        "result": result,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": float(cost),
    }
  4. If you want node-level tracking instead of just run-level tracking, attach metadata inside each node and emit structured logs. This is what you want when one graph has retrieval, tool use, and multiple model calls.
import json
from datetime import datetime, timezone

def log_usage(node_name: str, model_name: str, message):
    usage = getattr(message, "usage_metadata", None) or {}
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    cost = estimate_cost(model_name, input_tokens, output_tokens)

    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "node": node_name,
        "model": model_name,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": float(cost),
    }
    print(json.dumps(event))

# Replace the original node with this instrumented version, then rebuild and
# recompile the graph so the compiled app picks up the logging.
def assistant(state: State):
    response = llm.invoke(state["messages"])
    log_usage("assistant", "gpt-4o-mini", response)
    return {"messages": [response]}
  5. Run the graph with a real prompt and verify both the answer and the spend estimate come back together. Use a short prompt first so you can sanity-check the numbers before wiring this into billing or observability.
if __name__ == "__main__":
    from langchain_core.messages import HumanMessage

    output = run_with_cost([
        HumanMessage(content="Write a one-sentence summary of LangGraph.")
    ])

    print(output["result"]["messages"][-1].content)
    print(f'Input tokens: {output["input_tokens"]}')
    print(f'Output tokens: {output["output_tokens"]}')
    print(f'Estimated cost: ${output["cost_usd"]:.6f}')

Testing It

Run the script once with a short prompt and once with a longer prompt. The longer prompt should produce higher token counts and a higher estimated cost.
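
If you prefer to script that comparison rather than eyeball it, here is a throwaway sketch that reuses the run_with_cost helper from step 3 (the prompts are arbitrary):

from langchain_core.messages import HumanMessage

# Throwaway comparison: the longer prompt should report more input tokens
# and, in almost every run, a higher estimated cost.
short_run = run_with_cost([HumanMessage(content="Say hi.")])
long_run = run_with_cost([
    HumanMessage(content=(
        "Explain LangGraph nodes, edges, and state in detail, "
        "then outline a support-bot workflow that uses them."
    ))
])

print(short_run["input_tokens"], short_run["cost_usd"])
print(long_run["input_tokens"], long_run["cost_usd"])
assert long_run["input_tokens"] > short_run["input_tokens"]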

Check that usage_metadata is present on the final AI message; if it is missing, your model/provider combination may not be returning token usage in the expected format. Also confirm that your pricing table matches the exact model you are calling, because using the wrong rate makes the numbers useless fast.
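
A minimal check along those lines, assuming the app and PRICING objects defined above (response_metadata is usually populated by langchain-openai, but treat the model-name lookup as best effort):

from langchain_core.messages import HumanMessage

result = app.invoke({"messages": [HumanMessage(content="ping")]})
msg = result["messages"][-1]

# usage_metadata should be a dict with input_tokens / output_tokens.
print(getattr(msg, "usage_metadata", None))

# response_metadata often carries the resolved model name; warn if the
# pricing table has no entry for it.
model_seen = getattr(msg, "response_metadata", {}).get("model_name")
if model_seen and model_seen not in PRICING:
    print(f"Warning: no pricing entry for {model_seen}")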

If you are sending this into production logs, verify that each run has a stable request ID so you can join cost events back to user sessions or downstream business records. That matters more than pretty console output.
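
One lightweight way to get that is to stamp an ID onto every cost record before it leaves the process. A minimal sketch, assuming the run_with_cost helper above (run_with_cost_and_id is a hypothetical name):

import uuid

def run_with_cost_and_id(messages, request_id: str | None = None):
    # Hypothetical wrapper: every cost record carries a stable request ID so
    # it can be joined to user sessions or billing records later.
    record = run_with_cost(messages)
    record["request_id"] = request_id or str(uuid.uuid4())
    return record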

Next Steps

  • Persist cost events to Postgres with run_id, node, model, and cost_usd
  • Add callbacks for tracing every LLM/tool call across multi-node graphs (a starting sketch follows this list)
  • Build per-customer budgets and alerts on top of these metrics
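
For the callbacks item above, one possible starting point is a custom handler that accumulates usage across every LLM call in a run. This is only a sketch under assumptions: CostTrackingHandler is a made-up name, it reuses estimate_cost and the compiled app from the steps above, and it relies on the provider reporting usage_metadata on each generated message.

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import HumanMessage

class CostTrackingHandler(BaseCallbackHandler):
    # Accumulates token usage across every LLM call made during a run.
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.input_tokens = 0
        self.output_tokens = 0

    def on_llm_end(self, response, **kwargs):
        # Chat generations carry the AI message; usage_metadata is filled in
        # by providers that report token usage (OpenAI via langchain-openai).
        for generation_list in response.generations:
            for generation in generation_list:
                message = getattr(generation, "message", None)
                usage = getattr(message, "usage_metadata", None) or {}
                self.input_tokens += usage.get("input_tokens", 0)
                self.output_tokens += usage.get("output_tokens", 0)

    @property
    def cost_usd(self) -> float:
        return float(estimate_cost(self.model_name, self.input_tokens, self.output_tokens))

handler = CostTrackingHandler("gpt-4o-mini")
app.invoke(
    {"messages": [HumanMessage(content="Summarize LangGraph in one sentence.")]},
    config={"callbacks": [handler]},
)
print(handler.input_tokens, handler.output_tokens, handler.cost_usd)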

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

