LangGraph vs LangSmith for Production AI: Which Should You Use?
LangGraph and LangSmith solve different problems, and mixing them up is how teams waste weeks. LangGraph is the orchestration layer for building stateful agent workflows with nodes, edges, checkpoints, and human-in-the-loop control. LangSmith is the observability and evaluation layer for tracing, debugging, datasets, and regression testing.
For production AI, use LangGraph to run the application and LangSmith to prove it works.
Quick Comparison
| Area | LangGraph | LangSmith |
|---|---|---|
| Learning curve | Higher. You need to think in graphs, state, reducers, and execution flow. | Lower. You can start with traces, datasets, and evals quickly. |
| Performance | Strong for deterministic orchestration and multi-step agent control. You own the runtime behavior. | Not a runtime. It adds visibility and testing, not request execution logic. |
| Ecosystem | Built for agentic workflows in langgraph, often paired with langchain. | Built for observability across langchain, langgraph, and custom apps via tracing APIs. |
| Pricing | Open-source framework; infra cost depends on where you deploy it. | SaaS pricing for tracing, evals, datasets, monitoring features. |
| Best use cases | Stateful agents, approval flows, branching logic, retries, memory, tool routing. | Debugging chains/agents, offline evals, prompt regression tests, production monitoring. |
| Documentation | Good for implementation patterns like StateGraph, add_node, add_edge, compile(). | Good for tracing with @traceable, datasets, experiments, feedback loops. |
When LangGraph Wins
- **You need real control over execution.** If your agent must branch based on business rules, pause for approval, or recover from tool failures deterministically, LangGraph is the right tool. The StateGraph model gives you explicit nodes and edges instead of hoping an LLM “does the right thing.”
- **You are building a multi-step workflow with state.** Insurance claims triage, KYC review pipelines, underwriting assistants, and fraud investigation flows all need durable state across steps. LangGraph’s checkpointing and state reducers make this manageable instead of turning your code into a pile of callbacks (see the checkpointing sketch below).
- **You need human-in-the-loop gates.** Production systems in banking and insurance often require escalation before action: Send payment? Approve claim? Freeze account? LangGraph handles these approval points cleanly with graph transitions instead of ad hoc if-statements scattered across services; the checkpointing sketch below also shows an interrupt-based gate.
- **You want to separate orchestration from model choice.** If your workflow may call GPT-4o today and a smaller internal model tomorrow, keep the logic in LangGraph and swap models at the node level. That gives you portability without rewriting the control plane (see the node-factory sketch below).
Example pattern:
```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class AgentState(TypedDict):
    query: str
    risk_flag: bool
    answer: str

def classify(state: AgentState):
    # Flag anything mentioning a payment for manual review.
    return {"risk_flag": "payment" in state["query"].lower()}

def respond(state: AgentState):
    return {"answer": "Route to manual review" if state["risk_flag"] else "Proceed"}

graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("respond", respond)
graph.add_edge("classify", "respond")
graph.add_edge("respond", END)  # terminate the graph explicitly
graph.set_entry_point("classify")
app = graph.compile()
```
That is production-shaped code. It is explicit about flow and easy to reason about during incident response.
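One way to keep model choice out of the control plane is a node factory that takes the model client as an argument. This is a minimal sketch building on the AgentState graph above; call_gpt4o and call_internal_model are hypothetical stand-ins for real client wrappers:

```python
def call_gpt4o(prompt: str) -> str:
    ...  # hypothetical wrapper around a GPT-4o client

def call_internal_model(prompt: str) -> str:
    ...  # hypothetical wrapper around a smaller internal model

def make_respond_node(call_model):
    # The node only knows "call a model"; which model is an injection detail.
    def respond(state: AgentState):
        return {"answer": call_model(state["query"])}
    return respond

# Wire the factory-built node in place of the respond node above:
graph.add_node("respond", make_respond_node(call_gpt4o))
# Tomorrow: graph.add_node("respond", make_respond_node(call_internal_model))
```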
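And for durable state plus approval gates, here is a minimal sketch, assuming LangGraph’s MemorySaver checkpointer and the interrupt_before compile option. ReviewState, triage, and act are hypothetical names, and a production build would use a persistent checkpointer rather than the in-memory one:

```python
import operator
from typing import Annotated, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph

class ReviewState(TypedDict):
    query: str
    # Reducer: node outputs are appended to the list instead of overwriting it.
    notes: Annotated[list[str], operator.add]

def triage(state: ReviewState):
    return {"notes": ["triaged"]}

def act(state: ReviewState):
    return {"notes": ["action taken"]}

review_graph = StateGraph(ReviewState)
review_graph.add_node("triage", triage)
review_graph.add_node("act", act)
review_graph.add_edge("triage", "act")
review_graph.add_edge("act", END)
review_graph.set_entry_point("triage")

# Checkpointing makes every step resumable; interrupt_before pauses for approval.
review_app = review_graph.compile(checkpointer=MemorySaver(), interrupt_before=["act"])

config = {"configurable": {"thread_id": "claim-123"}}
review_app.invoke({"query": "send payment", "notes": []}, config)  # pauses before "act"
# Once a human approves, resume from the saved checkpoint:
review_app.invoke(None, config)
```

The thread_id keys the checkpoint, so the same claim can sit paused for hours and resume exactly where it stopped.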
When LangSmith Wins
- **You need to debug what happened in production.** When an agent fails on one customer but passes on another, traces matter more than opinions. LangSmith gives you end-to-end visibility into prompts, tool calls, latency, token usage, intermediate outputs, and errors.
- **You are running evals before rollout.** If you ship prompt or agent changes without regression tests, you are guessing. LangSmith’s datasets and evaluation workflows let you compare runs against labeled examples so you can catch quality drops before they hit customers (see the eval sketch after the tracing example below).
- **You want monitoring across many chains or agents.** For teams operating multiple assistants across support, underwriting, claims intake, or advisor copilots, centralized tracing becomes non-negotiable. LangSmith lets you inspect behavior consistently instead of spelunking through logs from five services.
- **Your team needs faster iteration on prompts.** Prompt engineering without trace data is blindfolded tuning. With LangSmith’s tracing plus experiments/evals workflow, teams can iterate on prompts and compare outputs against actual evidence.
Typical use:
```python
from langsmith import traceable

@traceable  # each call is recorded as a run in LangSmith
def route_request(input_text: str):
    # your app logic here
    return {"decision": "manual_review" if "fraud" in input_text else "auto_approve"}
```
That kind of instrumentation pays off fast when QA asks why a decision changed between releases.
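To make the regression-testing point concrete, here is a hedged sketch using the LangSmith SDK’s dataset and evaluate APIs against the route_request function above. The dataset name, examples, and evaluator are hypothetical, and the exact evaluate import path can vary by SDK version:

```python
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# Hypothetical labeled examples for the router above.
dataset = client.create_dataset(dataset_name="routing-regressions")
client.create_examples(
    inputs=[{"input_text": "possible fraud on this card"},
            {"input_text": "please update my address"}],
    outputs=[{"decision": "manual_review"}, {"decision": "auto_approve"}],
    dataset_id=dataset.id,
)

def correct_decision(run, example):
    # Exact-match check between the app's decision and the labeled one.
    return {
        "key": "correct_decision",
        "score": run.outputs["decision"] == example.outputs["decision"],
    }

evaluate(
    lambda inputs: route_request(inputs["input_text"]),
    data="routing-regressions",
    evaluators=[correct_decision],
    experiment_prefix="routing-v2",
)
```

Run that before every release and “did quality drop?” becomes a measured answer instead of a debate.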
For Production AI Specifically
Use LangGraph as the system of record for execution and LangSmith as the system of record for quality. If you only pick one for a production build that actually has business rules attached to it, pick LangGraph—because runtime control beats pretty dashboards every time.
But do not ship serious AI systems without LangSmith attached somewhere in the loop. The winning stack is simple: LangGraph runs the workflow; LangSmith validates it before and after release.
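Wiring the two together is mostly configuration: tracing is typically enabled via environment variables so LangGraph runs show up in LangSmith automatically. Treat this as a sketch, since the exact variable names depend on your SDK version, and the project name here is hypothetical:

```python
import os

# LANGCHAIN_TRACING_V2 / LANGCHAIN_API_KEY are the classic names;
# newer SDKs also accept LANGSMITH_TRACING / LANGSMITH_API_KEY.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "claims-triage-prod"  # hypothetical project name

# With tracing on, invoking the compiled graph from earlier emits a full
# run tree (nodes, inputs, outputs, latency) to LangSmith.
result = app.invoke({"query": "send payment to vendor"})
```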
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.