LangGraph vs LangSmith for Production AI: Which Should You Use?
LangGraph and LangSmith solve different problems, and mixing them up is how teams waste weeks. LangGraph is the orchestration layer for building stateful agent workflows with nodes, edges, checkpoints, and human-in-the-loop control. LangSmith is the observability and evaluation layer for tracing, debugging, datasets, and regression testing.
For production AI, use LangGraph to run the application and LangSmith to prove it works.
Quick Comparison
| Area | LangGraph | LangSmith |
|---|---|---|
| Learning curve | Higher. You need to think in graphs, state, reducers, and execution flow. | Lower. You can start with traces, datasets, and evals quickly. |
| Performance | Strong for deterministic orchestration and multi-step agent control. You own the runtime behavior. | Not a runtime. It adds visibility and testing, not request execution logic. |
| Ecosystem | Built for agentic workflows in langgraph, often paired with langchain. | Built for observability across langchain, langgraph, and custom apps via tracing APIs. |
| Pricing | Open-source framework; infra cost depends on where you deploy it. | SaaS pricing for tracing, evals, datasets, monitoring features. |
| Best use cases | Stateful agents, approval flows, branching logic, retries, memory, tool routing. | Debugging chains/agents, offline evals, prompt regression tests, production monitoring. |
| Documentation | Good for implementation patterns like StateGraph, add_node, add_edge, compile(). | Good for tracing with @traceable, datasets, experiments, feedback loops. |
When LangGraph Wins
- **You need real control over execution.** If your agent must branch based on business rules, pause for approval, or recover from tool failures deterministically, LangGraph is the right tool. The StateGraph model gives you explicit nodes and edges instead of hoping an LLM “does the right thing.”
- **You are building a multi-step workflow with state.** Insurance claims triage, KYC review pipelines, underwriting assistants, and fraud investigation flows all need durable state across steps. LangGraph’s checkpointing and state reducers make this manageable instead of turning your code into a pile of callbacks (see the checkpointing sketch below).
- **You need human-in-the-loop gates.** Production systems in banking and insurance often require escalation before action: Send payment? Approve claim? Freeze account? LangGraph handles these approval points cleanly with graph transitions instead of ad hoc if-statements scattered across services; the checkpointing sketch below also shows an interrupt-based gate.
- **You want to separate orchestration from model choice.** If your workflow may call GPT-4o today and a smaller internal model tomorrow, keep the logic in LangGraph and swap models at the node level. That gives you portability without rewriting the control plane (see the node-factory sketch below).
Example pattern:
```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class AgentState(TypedDict):
    query: str
    risk_flag: bool
    answer: str

def classify(state: AgentState):
    # Flag anything mentioning a payment for manual review.
    return {"risk_flag": "payment" in state["query"].lower()}

def respond(state: AgentState):
    return {"answer": "Route to manual review" if state["risk_flag"] else "Proceed"}

graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("respond", respond)
graph.add_edge("classify", "respond")
graph.add_edge("respond", END)  # terminate the graph explicitly
graph.set_entry_point("classify")
app = graph.compile()
```
That is production-shaped code. It is explicit about flow and easy to reason about during incident response.
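One way to keep model choice out of the control plane is a node factory that takes the model client as an argument. This is a minimal sketch building on the AgentState graph above; call_gpt4o and call_internal_model are hypothetical stand-ins for real client wrappers:

```python
def call_gpt4o(prompt: str) -> str:
    ...  # hypothetical wrapper around a GPT-4o client

def call_internal_model(prompt: str) -> str:
    ...  # hypothetical wrapper around a smaller internal model

def make_respond_node(call_model):
    # The node only knows "call a model"; which model is an injection detail.
    def respond(state: AgentState):
        return {"answer": call_model(state["query"])}
    return respond

# Wire the factory-built node in place of the respond node above:
graph.add_node("respond", make_respond_node(call_gpt4o))
# Tomorrow: graph.add_node("respond", make_respond_node(call_internal_model))
```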
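And for durable state plus approval gates, here is a minimal sketch, assuming LangGraph’s MemorySaver checkpointer and the interrupt_before compile option. ReviewState, triage, and act are hypothetical names, and a production build would use a persistent checkpointer rather than the in-memory one:

```python
import operator
from typing import Annotated, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph

class ReviewState(TypedDict):
    query: str
    # Reducer: node outputs are appended to the list instead of overwriting it.
    notes: Annotated[list[str], operator.add]

def triage(state: ReviewState):
    return {"notes": ["triaged"]}

def act(state: ReviewState):
    return {"notes": ["action taken"]}

review_graph = StateGraph(ReviewState)
review_graph.add_node("triage", triage)
review_graph.add_node("act", act)
review_graph.add_edge("triage", "act")
review_graph.add_edge("act", END)
review_graph.set_entry_point("triage")

# Checkpointing makes every step resumable; interrupt_before pauses for approval.
review_app = review_graph.compile(checkpointer=MemorySaver(), interrupt_before=["act"])

config = {"configurable": {"thread_id": "claim-123"}}
review_app.invoke({"query": "send payment", "notes": []}, config)  # pauses before "act"
# Once a human approves, resume from the saved checkpoint:
review_app.invoke(None, config)
```

The thread_id keys the checkpoint, so the same claim can sit paused for hours and resume exactly where it stopped.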
When LangSmith Wins
- **You need to debug what happened in production.** When an agent fails on one customer but passes on another, traces matter more than opinions. LangSmith gives you end-to-end visibility into prompts, tool calls, latency, token usage, intermediate outputs, and errors.
- **You are running evals before rollout.** If you ship prompt or agent changes without regression tests, you are guessing. LangSmith’s datasets and evaluation workflows let you compare runs against labeled examples so you can catch quality drops before they hit customers (see the eval sketch after the tracing example below).
- **You want monitoring across many chains or agents.** For teams operating multiple assistants across support, underwriting, claims intake, or advisor copilots, centralized tracing becomes non-negotiable. LangSmith lets you inspect behavior consistently instead of spelunking through logs from five services.
- **Your team needs faster iteration on prompts.** Prompt engineering without trace data is blindfolded tuning. With LangSmith’s tracing plus experiments/evals workflow, teams can iterate on prompts and compare outputs against actual evidence.
Typical use:
```python
from langsmith import traceable

@traceable  # each call is recorded as a run in LangSmith
def route_request(input_text: str):
    # your app logic here
    return {"decision": "manual_review" if "fraud" in input_text else "auto_approve"}
```
That kind of instrumentation pays off fast when QA asks why a decision changed between releases.
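To make the regression-testing point concrete, here is a hedged sketch using the LangSmith SDK’s dataset and evaluate APIs against the route_request function above. The dataset name, examples, and evaluator are hypothetical, and the exact evaluate import path can vary by SDK version:

```python
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# Hypothetical labeled examples for the router above.
dataset = client.create_dataset(dataset_name="routing-regressions")
client.create_examples(
    inputs=[{"input_text": "possible fraud on this card"},
            {"input_text": "please update my address"}],
    outputs=[{"decision": "manual_review"}, {"decision": "auto_approve"}],
    dataset_id=dataset.id,
)

def correct_decision(run, example):
    # Exact-match check between the app's decision and the labeled one.
    return {
        "key": "correct_decision",
        "score": run.outputs["decision"] == example.outputs["decision"],
    }

evaluate(
    lambda inputs: route_request(inputs["input_text"]),
    data="routing-regressions",
    evaluators=[correct_decision],
    experiment_prefix="routing-v2",
)
```

Run that before every release and “did quality drop?” becomes a measured answer instead of a debate.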
For Production AI Specifically
Use LangGraph as the system of record for execution and LangSmith as the system of record for quality. If you only pick one for a production build that actually has business rules attached to it, pick LangGraph—because runtime control beats pretty dashboards every time.
But do not ship serious AI systems without LangSmith attached somewhere in the loop. The winning stack is simple: LangGraph runs the workflow; LangSmith validates it before and after release.
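Wiring the two together is mostly configuration: tracing is typically enabled via environment variables so LangGraph runs show up in LangSmith automatically. Treat this as a sketch, since the exact variable names depend on your SDK version, and the project name here is hypothetical:

```python
import os

# LANGCHAIN_TRACING_V2 / LANGCHAIN_API_KEY are the classic names;
# newer SDKs also accept LANGSMITH_TRACING / LANGSMITH_API_KEY.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "claims-triage-prod"  # hypothetical project name

# With tracing on, invoking the compiled graph from earlier emits a full
# run tree (nodes, inputs, outputs, latency) to LangSmith.
result = app.invoke({"query": "send payment to vendor"})
```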
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.