How to Integrate LangGraph for pension funds with LangSmith for production AI

By Cyprian Aarons | Updated 2026-04-22
Tags: langgraph-for-pension-funds, langsmith, production-ai

Overview

If you’re building AI agents for pension fund operations, you need more than a chat loop. LangGraph gives you stateful orchestration for workflows like benefit eligibility checks, contribution exceptions, and retirement case triage, while LangSmith gives you traceability, evaluation, and debugging in production.

Combined, they let you run agentic workflows with audit-friendly traces, failure analysis, and regression testing. That matters when your system touches regulated member data and every decision path needs to be inspectable.

Prerequisites

  • Python 3.10+
  • A LangChain/LangGraph-compatible environment
  • Access to LangSmith with an API key
  • A working OpenAI or other model provider key
  • langgraph, langsmith, and a model integration package such as langchain-openai installed
  • Environment variables configured:
    • LANGSMITH_API_KEY
    • LANGSMITH_TRACING=true
    • LANGSMITH_PROJECT=pension-fund-agent
    • OPENAI_API_KEY

Install the packages:

pip install langgraph langsmith langchain-openai

Integration Steps

  1. Set up LangSmith tracing before you build the graph.

LangSmith works best when tracing is enabled at process startup. This ensures every node execution in your graph gets captured.

import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "pension-fund-agent"
os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")
  2. Define the graph state and nodes in LangGraph.

For a pension fund workflow, keep the state explicit. Here we track the member query, the model response, and a decision flag for routing.

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class PensionState(TypedDict):
    query: str
    answer: str
    needs_human_review: bool

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def assess_query(state: PensionState) -> PensionState:
    prompt = (
        "You are a pension fund assistant. "
        "Classify whether this query needs human review due to policy ambiguity, "
        "benefit disputes, or missing data.\n\n"
        f"Query: {state['query']}"
    )
    result = llm.invoke(prompt)
    text = result.content.lower()

    # Flag the case for human review when the model's reply signals ambiguity or missing data.
    return {
        **state,
        "answer": result.content,
        "needs_human_review": any(
            phrase in text for phrase in ["human review", "ambiguous", "missing data", "cannot determine"]
        ),
    }
  3. Build the workflow graph and compile it.

This is where LangGraph becomes useful for pension operations. You can route low-risk questions to automation and send high-risk cases to review queues.

def route_case(state: PensionState) -> str:
    return "human_review" if state["needs_human_review"] else "done"

def human_review(state: PensionState) -> PensionState:
    return {
        **state,
        "answer": f"[HUMAN REVIEW REQUIRED] {state['answer']}",
    }

graph = StateGraph(PensionState)
graph.add_node("assess_query", assess_query)
graph.add_node("human_review", human_review)

graph.set_entry_point("assess_query")
graph.add_conditional_edges(
    "assess_query",
    route_case,
    {
        "human_review": "human_review",
        "done": END,
    },
)
graph.add_edge("human_review", END)

app = graph.compile()
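Before wiring the compiled graph into a service, it can help to confirm the topology matches the routing you intended. A minimal sketch, assuming a recent LangGraph version where compiled graphs expose get_graph():

# Optional: print Mermaid markup for the graph so you can confirm
# that assess_query routes to human_review or END as expected.
print(app.get_graph().draw_mermaid())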
  4. Attach LangSmith tracing to runtime execution.

If tracing is enabled via environment variables, every .invoke() call on the compiled graph is traced automatically through LangChain instrumentation. You can also log custom runs with the LangSmith Client when you want explicit records around business events like claim escalation or benefit recalculation.

from langsmith import Client

client = Client()

run_metadata = {
    "system": "pension-fund-agent",
    "workflow": "member-query-triage",
}

result = app.invoke(
    {"query": "Can I withdraw my pension early if I moved abroad?", 
     "answer": "",
     "needs_human_review": False},
    config={
        "metadata": run_metadata,
        "tags": ["pension", "triage", "production"],
    },
)

# Log a standalone run record for this business event alongside the auto-traced graph run.
client.create_run(
    name="pension-triage-request",
    run_type="chain",
    inputs={"query": result["query"]},
    outputs={"answer": result["answer"]},
)
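If you prefer decorator-style instrumentation for business events, the langsmith SDK also provides @traceable, which records a function call as its own run in the configured project. A minimal sketch; recalculate_benefit is a hypothetical helper, not part of the graph above:

from langsmith import traceable

@traceable(run_type="chain", name="benefit-recalculation", tags=["pension"])
def recalculate_benefit(member_id: str, reason: str) -> dict:
    # Hypothetical business event: the decorator captures inputs, outputs,
    # and timing as a run in the LANGSMITH_PROJECT set earlier.
    return {"member_id": member_id, "status": "queued", "reason": reason}

recalculate_benefit("M-1042", "disputed defined benefit calculation")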
  5. Add evaluation hooks for production regression testing.

LangSmith’s real value shows up when you test prompts and graph behavior against known pension scenarios. Store representative cases and compare outputs across versions before deploying changes.

test_cases = [
    {"query": "What happens to my pension if I retire at 60?", 
     "answer": "", 
     "needs_human_review": False},
    {"query": "My employer says contributions were deducted but not received.", 
     "answer": "", 
     "needs_human_review": False},
]

for case in test_cases:
    output = app.invoke(case, config={"tags": ["eval", "pension"]})
    print(output["answer"])

Testing the Integration

Run a simple smoke test with one low-risk query and one ambiguous query. You should see different routing behavior, and the run should appear in LangSmith under your project.

low_risk = app.invoke({
    "query": "How do I update my beneficiary details?",
    "answer": "",
    "needs_human_review": False,
}, config={"tags": ["smoke-test"]})

high_risk = app.invoke({
    "query": "I think my defined benefit calculation is wrong and I want compensation.",
    "answer": "",
    "needs_human_review": False,
}, config={"tags": ["smoke-test"]})

print("LOW RISK:", low_risk["answer"])
print("HIGH RISK:", high_risk["answer"])

Expected output:

LOW RISK: <model answer about beneficiary updates>
HIGH RISK: [HUMAN REVIEW REQUIRED] <model answer indicating ambiguity or escalation>

In LangSmith, verify:

  • A project named pension-fund-agent
  • Traces for each .invoke() call
  • Metadata tags like pension, triage, or smoke-test
  • Inputs/outputs visible per run
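You can also check these points programmatically with the LangSmith client. A minimal sketch using list_runs, assuming the project name from step 1:

# Fetch recent runs from the project and confirm names, types, and tags were recorded.
for run in client.list_runs(project_name="pension-fund-agent", limit=10):
    print(run.name, run.run_type, run.tags)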

Real-World Use Cases

  • Member query triage

    • Route routine questions like contribution dates or statement access to automation.
    • Escalate disputes, missing records, or policy exceptions to humans with full trace context.
  • Benefit eligibility workflows

    • Chain steps for age checks, vesting status, contribution history, and plan rules (see the node sketch after this list).
    • Use LangSmith traces to debug where eligibility decisions diverge from expected outcomes.
  • Compliance-safe agent operations

    • Record every branch taken by the agent for audit review.
    • Run regression tests on prompts whenever plan rules or policy language changes.
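As a rough illustration of the eligibility workflow above, here is a minimal sketch of how those steps could be chained as LangGraph nodes. The EligibilityState fields and node bodies are illustrative placeholders; real nodes would call plan-rule services or member databases, and each node execution would show up as a traced step in LangSmith.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class EligibilityState(TypedDict):
    member_id: str
    age_ok: bool
    vested: bool
    eligible: bool

# Placeholder checks; substitute real plan-rule and contribution-history lookups.
def check_age(state: EligibilityState) -> EligibilityState:
    return {**state, "age_ok": True}

def check_vesting(state: EligibilityState) -> EligibilityState:
    return {**state, "vested": True}

def decide(state: EligibilityState) -> EligibilityState:
    return {**state, "eligible": state["age_ok"] and state["vested"]}

eligibility = StateGraph(EligibilityState)
eligibility.add_node("check_age", check_age)
eligibility.add_node("check_vesting", check_vesting)
eligibility.add_node("decide", decide)
eligibility.set_entry_point("check_age")
eligibility.add_edge("check_age", "check_vesting")
eligibility.add_edge("check_vesting", "decide")
eligibility.add_edge("decide", END)

eligibility_app = eligibility.compile()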

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

Get the Starter Kit
