# LangGraph Tutorial (Python): Adding Cost Tracking for Intermediate Developers
This tutorial shows how to add per-run cost tracking to a LangGraph app in Python, so you can see what each agent step is spending on model calls. You need this when your graph has multiple nodes, retries, or tool loops and you want a clean audit trail for token usage and dollar cost.
## What You'll Need

- Python 3.10+
- `langgraph`
- `langchain-openai`
- `langchain-core`
- An OpenAI API key set as `OPENAI_API_KEY`
- Basic familiarity with `StateGraph`, node functions, and `add_messages` state handling

Install the packages:

```bash
pip install langgraph langchain-openai langchain-core
```
## Step-by-Step
- Start with a minimal LangGraph that uses message state and an LLM node.

The key thing here is that we'll keep the graph simple first, then add tracing around the model call so we can measure usage cleanly.

```python
from typing import Annotated, TypedDict

from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class State(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[list, add_messages]


def chatbot(state: State):
    # llm is defined in the next step
    return {"messages": [llm.invoke(state["messages"])]}
```
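If you want to smoke-test this baseline before adding cost tracking, the wiring below is a minimal sketch; it assumes the `llm` object defined in the next step already exists.

```python
# Minimal wiring to run the untracked baseline graph.
builder = StateGraph(State)
builder.add_node("chatbot", chatbot)
builder.add_edge(START, "chatbot")
builder.add_edge("chatbot", END)
graph = builder.compile()

reply = graph.invoke({"messages": [HumanMessage(content="Hello!")]})
print(reply["messages"][-1].content)
```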
- Create a wrapper that measures token usage and converts it into cost.

This keeps cost logic out of your business nodes and gives you one place to update pricing if your provider changes rates.

```python
from langchain_core.messages import AIMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# gpt-4o-mini rates in USD per 1M tokens; update these if pricing changes.
INPUT_COST_PER_1M = 0.15
OUTPUT_COST_PER_1M = 0.60


def invoke_with_cost(messages):
    result: AIMessage = llm.invoke(messages)
    usage = result.response_metadata.get("token_usage", {})
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    cost = (
        (prompt_tokens / 1_000_000) * INPUT_COST_PER_1M
        + (completion_tokens / 1_000_000) * OUTPUT_COST_PER_1M
    )
    return result, {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost_usd": round(cost, 8),
    }
```
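If you run more than one model, a small pricing table keeps the wrapper honest across all of them. A minimal sketch; the rates here are illustrative assumptions, so verify them against your provider's current pricing page:

```python
# Per-1M-token rates keyed by model name. These numbers are
# illustrative assumptions; check your provider's pricing page.
PRICING = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def cost_for(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    rates = PRICING[model]
    return (
        (prompt_tokens / 1_000_000) * rates["input"]
        + (completion_tokens / 1_000_000) * rates["output"]
    )
```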
- Update the graph state so each node can accumulate its own usage record.

In production, this is better than printing to stdout because you can persist the metadata alongside the final answer.

```python
from typing import Any


class State(TypedDict):
    messages: Annotated[list, add_messages]
    usage: list[dict[str, Any]]


def tracked_chatbot(state: State):
    result, usage = invoke_with_cost(state["messages"])
    return {
        "messages": [result],
        "usage": state.get("usage", []) + [usage],
    }


builder = StateGraph(State)
builder.add_node("chatbot", tracked_chatbot)
builder.add_edge(START, "chatbot")
builder.add_edge("chatbot", END)
graph = builder.compile()
```
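Manually re-reading `state["usage"]` and appending works for a linear graph, but LangGraph can also merge the lists for you through a reducer, which is safer if branches ever run in parallel. A sketch of that alternative:

```python
import operator

# With operator.add as the reducer, LangGraph concatenates the usage
# lists returned by each node, so nodes only emit their own record.
class State(TypedDict):
    messages: Annotated[list, add_messages]
    usage: Annotated[list[dict[str, Any]], operator.add]


def tracked_chatbot(state: State):
    result, usage = invoke_with_cost(state["messages"])
    return {"messages": [result], "usage": [usage]}
```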
- Run the graph and inspect both the response and the accumulated cost data.

The important part is that every invocation returns structured usage data, so downstream code can log it, store it in a DB table, or attach it to an observability event.

```python
initial_state: State = {
    "messages": [HumanMessage(content="Explain LangGraph in one sentence.")],
    "usage": [],
}

result = graph.invoke(initial_state)

print(result["messages"][-1].content)
print(result["usage"])
print("Total cost:", sum(item["cost_usd"] for item in result["usage"]))
```
- If your graph has multiple LLM nodes, reuse the same wrapper for each one.

That gives you node-level accounting without changing how LangGraph routes between steps.

```python
def summarize_node(state: State):
    # In a real app this node would add its own summarization prompt;
    # it is kept identical to tracked_chatbot to focus on the cost plumbing.
    result, usage = invoke_with_cost(state["messages"])
    return {
        "messages": [result],
        "usage": state.get("usage", []) + [usage],
    }


builder = StateGraph(State)
builder.add_node("step_a", tracked_chatbot)
builder.add_node("step_b", summarize_node)
builder.add_edge(START, "step_a")
builder.add_edge("step_a", "step_b")
builder.add_edge("step_b", END)
graph = builder.compile()
```
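With several nodes it also helps to know which node spent what. One way to get that, sketched below with a hypothetical `tracked_node` factory, is to tag each record with its node name before it goes into state:

```python
def tracked_node(name: str):
    # Factory returning a node function whose usage records carry
    # the node's name, enabling per-node cost breakdowns.
    def node(state: State):
        result, usage = invoke_with_cost(state["messages"])
        return {
            "messages": [result],
            "usage": state.get("usage", []) + [{**usage, "node": name}],
        }

    return node


builder = StateGraph(State)
builder.add_node("step_a", tracked_node("step_a"))
builder.add_node("step_b", tracked_node("step_b"))
```

Summing `cost_usd` grouped by the `node` field then shows exactly where the budget goes.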
## Testing It

Run the script with a valid `OPENAI_API_KEY` and confirm that you get two outputs: the assistant response and a list of usage records. Each record should contain `prompt_tokens`, `completion_tokens`, and `cost_usd`. If you only see empty values, check whether your model/provider returns token metadata in `response_metadata`; some providers expose usage differently (recent versions of langchain-core also expose a standardized `usage_metadata` attribute on `AIMessage`, which is worth checking as a fallback).

A good sanity test is to send a short prompt and then a much longer prompt; the longer prompt should produce higher prompt token counts and a higher total cost, as in the sketch below. If you have multiple nodes in the graph, verify that each node adds one entry to `usage`.
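A minimal version of that check, assuming the single-node graph compiled in Step 3:

```python
def run_and_cost(prompt: str) -> float:
    state = graph.invoke(
        {"messages": [HumanMessage(content=prompt)], "usage": []}
    )
    return sum(item["cost_usd"] for item in state["usage"])


short_cost = run_and_cost("Hi.")
long_cost = run_and_cost("Explain LangGraph in detail. " * 50)

print(f"short={short_cost:.8f} long={long_cost:.8f}")
assert long_cost > short_cost  # the longer prompt should cost more
```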
## Next Steps

- Add a custom callback handler instead of wrapping `llm.invoke()` if you want centralized observability (see the sketch after this list).
- Persist per-run usage into Postgres or BigQuery for finance reporting.
- Extend the same pattern to tool calls so you can track non-LLM execution costs too.
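For the callback route, here is a rough sketch of what such a handler could look like. `CostTracker` is a name I made up, and the shape of `llm_output` is provider-specific, so treat this as a starting point rather than a drop-in:

```python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult


class CostTracker(BaseCallbackHandler):
    """Collects token usage from every LLM call made under this handler."""

    def __init__(self):
        self.records = []

    def on_llm_end(self, response: LLMResult, **kwargs):
        # llm_output is provider-specific; OpenAI chat models report
        # usage under the "token_usage" key.
        usage = (response.llm_output or {}).get("token_usage", {})
        if usage:
            self.records.append(usage)


tracker = CostTracker()
# Callbacks passed via config propagate to every model call in the graph.
result = graph.invoke(initial_state, config={"callbacks": [tracker]})
print(tracker.records)
```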
By Cyprian Aarons, AI Consultant at Topiax.