LangChain Tutorial (Python): Adding Cost Tracking for Beginners
This tutorial shows you how to add token and cost tracking to a basic LangChain Python app using OpenAI callbacks. You need this when you want visibility into what each chain call costs before you ship an agent, prototype, or internal tool.
What You'll Need
- Python 3.10+
- An OpenAI API key
- A LangChain project with these packages installed: `langchain`, `langchain-openai`, `langchain-community` (provides the cost-tracking callback), and `openai`
- Basic familiarity with `ChatOpenAI`, prompts, and chains
- Environment variable support for `OPENAI_API_KEY`
Install the dependencies:
```bash
pip install langchain langchain-openai langchain-community openai
```
Set your API key:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
Step-by-Step
**Step 1: Start with a minimal LangChain chain.**

This example uses a prompt template and a chat model. Keep it simple first so you can clearly see where the cost tracking hooks in.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Explain {topic} in one paragraph."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

result = chain.invoke({"topic": "token usage tracking"})
print(result.content)
```
**Step 2: Add callback-based usage tracking.**

LangChain exposes token accounting through callbacks. The simplest pattern for beginners is `get_openai_callback()`, which gives you prompt tokens, completion tokens, total tokens, and estimated cost for the wrapped call. In current releases it lives in `langchain_community.callbacks`; older versions also re-exported it from `langchain.callbacks`.

```python
from langchain_community.callbacks import get_openai_callback
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Explain {topic} in one paragraph."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

with get_openai_callback() as cb:
    result = chain.invoke({"topic": "token usage tracking"})

print(result.content)
print("Prompt tokens:", cb.prompt_tokens)
print("Completion tokens:", cb.completion_tokens)
print("Total tokens:", cb.total_tokens)
print("Total cost:", cb.total_cost)
```
**Step 3: Wrap multiple calls so you can track totals across a workflow.**

In real apps, one request often triggers several LLM calls. Put all of them inside the same callback context if you want one combined cost number for the whole operation.

```python
from langchain_community.callbacks import get_openai_callback
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize clearly."),
    ("user", "Summarize this text: {text}"),
])

rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", "Rewrite professionally."),
    ("user", "Rewrite this summary: {summary}"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
summary_chain = summary_prompt | llm
rewrite_chain = rewrite_prompt | llm

with get_openai_callback() as cb:
    summary = summary_chain.invoke({"text": "LangChain can track token usage per run."})
    final_text = rewrite_chain.invoke({"summary": summary.content})

print(final_text.content)
print("Total tokens:", cb.total_tokens)
print("Total cost:", cb.total_cost)
```
**Step 4: Log the numbers in a production-friendly way.**

Printing is fine for learning, but real systems should send these values to logs or metrics. A small amount of structured logging makes it easy to attach cost data to every request without changing your business logic.

```python
import logging

from langchain_community.callbacks import get_openai_callback
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

logging.basicConfig(level=logging.INFO)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Give me three bullet points about {topic}."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

with get_openai_callback() as cb:
    response = chain.invoke({"topic": "cost tracking"})

logging.info(
    "llm_usage prompt_tokens=%s completion_tokens=%s total_tokens=%s total_cost=%s",
    cb.prompt_tokens,
    cb.completion_tokens,
    cb.total_tokens,
    cb.total_cost,
)
print(response.content)
```
**Step 5: Make the output useful for downstream monitoring.**

If you're building an API or internal service, return the usage alongside the model output. That gives your frontend, logs, or observability stack something structured to work with.

```python
from langchain_community.callbacks import get_openai_callback
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "Define {term} in plain English."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

def run_with_cost(term: str) -> dict:
    with get_openai_callback() as cb:
        result = chain.invoke({"term": term})
    return {
        "answer": result.content,
        "usage": {
            "prompt_tokens": cb.prompt_tokens,
            "completion_tokens": cb.completion_tokens,
            "total_tokens": cb.total_tokens,
            "total_cost": cb.total_cost,
        },
    }

output = run_with_cost("temperature")
print(output["answer"])
print(output["usage"])
```
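If one user session triggers several calls like the one above, you may want a single combined number per session. A minimal, framework-agnostic sketch of such an aggregator (the `merge_usage` helper is hypothetical, not part of LangChain; it just sums the usage dicts returned by a function like `run_with_cost`):

```python
def merge_usage(usages: list[dict]) -> dict:
    """Sum token and cost fields across several usage dicts."""
    total = {"prompt_tokens": 0, "completion_tokens": 0,
             "total_tokens": 0, "total_cost": 0.0}
    for u in usages:
        for key in total:
            total[key] += u.get(key, 0)
    return total

# Example with two hand-written usage records; in practice these would
# come from run_with_cost(...)["usage"].
session = merge_usage([
    {"prompt_tokens": 20, "completion_tokens": 40, "total_tokens": 60, "total_cost": 0.00005},
    {"prompt_tokens": 15, "completion_tokens": 25, "total_tokens": 40, "total_cost": 0.00003},
])
print("Session tokens:", session["total_tokens"])
print("Session cost:", session["total_cost"])
```

Because the helper only touches plain dicts, it works the same whether the usage came from one chain or a whole multi-step workflow.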
Testing It
Run the script once and confirm that you get both an answer and non-zero token counts. If `total_cost` is `0.0`, check that you're using an OpenAI model the callback's pricing table knows about and that your package versions are current.
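If the callback's pricing table doesn't cover your model, you can still estimate cost yourself from the token counts it reports. A sketch with a local pricing table; the numbers below are illustrative placeholders, so replace them with the current values from OpenAI's pricing page:

```python
# Illustrative USD prices per 1M tokens; placeholders, not authoritative.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost from token counts and the local pricing table."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Feed in the counts from cb.prompt_tokens / cb.completion_tokens.
cost = estimate_cost("gpt-4o-mini", prompt_tokens=1200, completion_tokens=300)
print(f"Estimated cost: ${cost:.6f}")
```

Comparing this estimate against `cb.total_cost` is also a useful cross-check that the callback is pricing your model as you expect.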
Try two different prompts: one short and one long. The longer input should usually increase prompt tokens and total cost, which is the easiest sanity check that tracking is working.
If you're wrapping multiple calls in one callback block, verify that the totals reflect all calls combined. That tells you the callback scope is correct and you're not accidentally measuring only part of the workflow.
Next Steps
- Add usage fields to your FastAPI response models so every LLM request returns cost metadata.
- Push token and cost numbers into Prometheus, Datadog, or CloudWatch for dashboarding.
- Learn LangSmith tracing so you can connect cost data with full run-level observability.
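The response shape for the first bullet can be sketched framework-agnostically with dataclasses (field names here mirror the callback attributes but are otherwise illustrative; in a real FastAPI app you would typically use Pydantic models instead):

```python
from dataclasses import dataclass, asdict

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    total_cost: float

@dataclass
class LLMResult:
    answer: str
    usage: Usage

# In an endpoint you would fill these from cb after chain.invoke(...).
result = LLMResult(
    answer="Temperature controls sampling randomness.",
    usage=Usage(prompt_tokens=30, completion_tokens=12,
                total_tokens=42, total_cost=0.00001),
)
payload = asdict(result)  # nested, JSON-serializable dict for the API response
print(payload)
```

Keeping usage in a nested object rather than flat top-level fields makes it easy to add more metadata later (model name, latency) without breaking clients.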
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.