LangGraph Tutorial (Python): adding cost tracking for advanced developers
This tutorial shows how to add per-run cost tracking to a LangGraph workflow in Python using real model usage metadata and a small callback handler. You need this when you want to attribute LLM spend per agent run, per user request, or per workflow node instead of guessing from logs after the fact.
What You'll Need
- Python 3.10+
- langgraph
- langchain-openai
- langchain-core
- An OpenAI API key in OPENAI_API_KEY
- Optional: a Postgres or SQLite sink if you want to persist costs later
- A working LangGraph graph with at least one LLM node
Step-by-Step
- Start with a normal LangGraph setup and make sure your model returns usage metadata. For OpenAI models through LangChain, the response message carries usage_metadata with token counts that we can convert into cost.

```python
import os
from typing import TypedDict, Annotated

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

# Fail fast if the key is missing instead of erroring mid-run
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running"

class State(TypedDict):
    messages: Annotated[list, add_messages]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def assistant(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(State)
graph.add_node("assistant", assistant)
graph.add_edge(START, "assistant")
graph.add_edge("assistant", END)

app = graph.compile()
```
- Add a small pricing table and a helper that converts token counts into dollars. Keep this in code first; move it to config or a database later if you support multiple providers or changing rates.

```python
from decimal import Decimal

PRICING = {
    "gpt-4o-mini": {
        "input_per_1m": Decimal("0.15"),
        "output_per_1m": Decimal("0.60"),
    }
}

def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> Decimal:
    pricing = PRICING[model_name]
    input_cost = (Decimal(input_tokens) / Decimal(1_000_000)) * pricing["input_per_1m"]
    output_cost = (Decimal(output_tokens) / Decimal(1_000_000)) * pricing["output_per_1m"]
    return input_cost + output_cost
```
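As a quick sanity check on the arithmetic, the helper above can be exercised with round numbers before any model call is involved: 1,000 input tokens at $0.15 per million is $0.00015, and 2,000 output tokens at $0.60 per million is $0.00120, so the total should be $0.00135. This is a standalone sketch that repeats the pricing table so it runs on its own.

```python
from decimal import Decimal

# Same table and helper as above, repeated so this snippet is self-contained.
PRICING = {
    "gpt-4o-mini": {
        "input_per_1m": Decimal("0.15"),
        "output_per_1m": Decimal("0.60"),
    }
}

def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> Decimal:
    pricing = PRICING[model_name]
    input_cost = (Decimal(input_tokens) / Decimal(1_000_000)) * pricing["input_per_1m"]
    output_cost = (Decimal(output_tokens) / Decimal(1_000_000)) * pricing["output_per_1m"]
    return input_cost + output_cost

# 1,000 input + 2,000 output tokens: 0.00015 + 0.00120 = 0.00135 USD
assert estimate_cost("gpt-4o-mini", 1000, 2000) == Decimal("0.00135")
```

Using Decimal instead of float here avoids accumulating binary rounding error when you later sum thousands of tiny per-call costs.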
- Wrap the graph execution so you can inspect the final AI message and calculate cost from usage metadata. This is the simplest production-friendly pattern when you only need run-level accounting.

```python
def run_with_cost(messages):
    result = app.invoke({"messages": messages})
    last_message = result["messages"][-1]
    usage = getattr(last_message, "usage_metadata", None) or {}
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    cost = estimate_cost(
        model_name="gpt-4o-mini",
        input_tokens=input_tokens,
        output_tokens=output_tokens,
    )
    return {
        "result": result,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": float(cost),
    }
```
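One caveat: run_with_cost reads usage only from the last message, which is fine for a single-node graph but undercounts once a run produces several AI messages. A sketch of the fix is to sum usage_metadata across every message in the result. The FakeAIMessage class below is a stand-in for a LangChain AIMessage, used only so the example runs without an API key; the summing logic is what carries over.

```python
from dataclasses import dataclass, field

# Stand-in for a LangChain AIMessage; only usage_metadata matters here.
@dataclass
class FakeAIMessage:
    usage_metadata: dict = field(default_factory=dict)

def total_usage(messages):
    """Sum input/output tokens over every message that reports usage."""
    input_tokens = output_tokens = 0
    for message in messages:
        usage = getattr(message, "usage_metadata", None) or {}
        input_tokens += usage.get("input_tokens", 0)
        output_tokens += usage.get("output_tokens", 0)
    return input_tokens, output_tokens

msgs = [
    FakeAIMessage({"input_tokens": 120, "output_tokens": 40}),
    FakeAIMessage({"input_tokens": 200, "output_tokens": 75}),
]
print(total_usage(msgs))  # (320, 115)
```

In the real wrapper you would call total_usage(result["messages"]) instead of reading result["messages"][-1].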
- If you want node-level tracking instead of just run-level tracking, attach logging inside each node and emit structured logs. This is what you want when one graph has retrieval, tool use, and multiple model calls. Note that this version of assistant must be defined before you call graph.add_node and compile; re-binding the name afterwards does not update an already compiled app.

```python
import json
from datetime import datetime, timezone

def log_usage(node_name: str, model_name: str, message):
    usage = getattr(message, "usage_metadata", None) or {}
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    cost = estimate_cost(model_name, input_tokens, output_tokens)
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "node": node_name,
        "model": model_name,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": float(cost),
    }
    print(json.dumps(event))

def assistant(state: State):
    response = llm.invoke(state["messages"])
    log_usage("assistant", "gpt-4o-mini", response)
    return {"messages": [response]}
```
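Printing one JSON line per call is enough for log pipelines, but if you want an in-process rollup per run, a small collector that aggregates events by node is a natural next step. The CostCollector class below is a hypothetical helper, not part of LangGraph; you would call record() from log_usage instead of (or in addition to) printing.

```python
import json
from collections import defaultdict
from decimal import Decimal

class CostCollector:
    """Accumulates per-node token usage and cost for one run."""

    def __init__(self):
        self.totals = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": Decimal("0")}
        )

    def record(self, node: str, input_tokens: int, output_tokens: int, cost: Decimal):
        entry = self.totals[node]
        entry["input_tokens"] += input_tokens
        entry["output_tokens"] += output_tokens
        entry["cost_usd"] += cost

    def summary(self) -> dict:
        # Convert Decimal to float only at the edge, for serialization.
        return {
            node: {**entry, "cost_usd": float(entry["cost_usd"])}
            for node, entry in self.totals.items()
        }

collector = CostCollector()
collector.record("assistant", 120, 40, Decimal("0.000042"))
collector.record("assistant", 200, 75, Decimal("0.000075"))
collector.record("retriever", 500, 0, Decimal("0.000075"))
print(json.dumps(collector.summary(), indent=2))
```

Keeping the running totals in Decimal and converting to float only in summary() means repeated record() calls never drift from the true sum.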
- Run the graph with a real prompt and verify both the answer and the spend estimate come back together. Use a short prompt first so you can sanity-check the numbers before wiring this into billing or observability.

```python
if __name__ == "__main__":
    from langchain_core.messages import HumanMessage

    output = run_with_cost([
        HumanMessage(content="Write a one-sentence summary of LangGraph.")
    ])
    print(output["result"]["messages"][-1].content)
    print(f'Input tokens: {output["input_tokens"]}')
    print(f'Output tokens: {output["output_tokens"]}')
    print(f'Estimated cost: ${output["cost_usd"]:.6f}')
```
Testing It
Run the script once with a short prompt and once with a longer prompt. The longer prompt should produce higher token counts and a higher estimated cost.
Check that usage_metadata is present on the final AI message; if it is missing, your model/provider combination may not be returning token usage in the expected format. Also confirm that your pricing table matches the exact model you are calling, because using the wrong rate makes the numbers useless fast.
If you are sending this into production logs, verify that each run has a stable request ID so you can join cost events back to user sessions or downstream business records. That matters more than pretty console output.
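One way to get that stable request ID is to mint a UUID once per graph invocation and stamp it on every cost event before logging. The make_cost_event helper below is a hypothetical sketch of that shape, using only the standard library; the field names mirror the events emitted by log_usage above plus the added run_id.

```python
import json
import uuid
from datetime import datetime, timezone

def make_cost_event(run_id, node, model, input_tokens, output_tokens, cost_usd):
    """Build one structured cost event; run_id joins events to a user session."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "node": node,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
    }

run_id = str(uuid.uuid4())  # one ID per graph invocation
event = make_cost_event(run_id, "assistant", "gpt-4o-mini", 120, 40, 0.000042)
line = json.dumps(event)
print(line)
```

Every log line from the same run then shares one run_id, so a downstream query can group cost events by request even when a run touches several nodes.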
Next Steps
- Persist cost events to Postgres with run_id, node, model, and cost_usd
- Add callbacks for tracing every LLM/tool call across multi-node graphs
- Build per-customer budgets and alerts on top of these metrics
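To sketch the persistence step, an in-memory SQLite database can stand in for Postgres: the schema and queries carry over almost unchanged, and the table mirrors the run_id / node / model / cost_usd columns suggested above. This is an illustrative sketch, not a production schema.

```python
import sqlite3

# In-memory SQLite stands in for Postgres during development.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cost_events (
        run_id TEXT NOT NULL,
        node TEXT NOT NULL,
        model TEXT NOT NULL,
        input_tokens INTEGER NOT NULL,
        output_tokens INTEGER NOT NULL,
        cost_usd REAL NOT NULL
    )
""")

def persist_event(event: dict):
    """Insert one cost event; call this wherever log_usage prints today."""
    conn.execute(
        "INSERT INTO cost_events VALUES (?, ?, ?, ?, ?, ?)",
        (event["run_id"], event["node"], event["model"],
         event["input_tokens"], event["output_tokens"], event["cost_usd"]),
    )
    conn.commit()

persist_event({"run_id": "run-1", "node": "assistant", "model": "gpt-4o-mini",
               "input_tokens": 120, "output_tokens": 40, "cost_usd": 0.000042})
persist_event({"run_id": "run-1", "node": "assistant", "model": "gpt-4o-mini",
               "input_tokens": 200, "output_tokens": 75, "cost_usd": 0.000075})

# Per-run spend is now a one-line query.
total = conn.execute(
    "SELECT SUM(cost_usd) FROM cost_events WHERE run_id = 'run-1'"
).fetchone()[0]
print(round(total, 6))  # 0.000117
```

Swapping sqlite3 for psycopg changes the connection line and the parameter placeholder style, but the insert-per-event and sum-per-run pattern stays the same.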
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.