How to Integrate LangGraph for Insurance with Kubernetes for Production AI
Combining LangGraph for insurance with Kubernetes gives you a clean path from agent logic to production runtime. LangGraph handles the policy workflow, claim triage, and decision branching; Kubernetes gives you scaling, rollout control, and isolation for regulated workloads.
That combination is useful when you need insurance agents that can inspect claims, route cases, call underwriting rules, and stay available under load. You get stateful orchestration from LangGraph and containerized execution from Kubernetes without wiring everything by hand.
Prerequisites
- Python 3.10+
- A Kubernetes cluster:
  - local: `kind`, `minikube`, or `k3d`
  - production: EKS, GKE, or AKS
- `kubectl` configured and able to reach your cluster
- Docker installed for building images
- Access to an LLM provider used by your LangGraph workflow
- Python packages:
  - `langgraph`
  - `langchain-openai` (or your model provider package)
  - `kubernetes`
  - `fastapi`
  - `uvicorn`
- A namespace created in Kubernetes for the agent service
Install the Python dependencies:
pip install langgraph langchain-openai kubernetes fastapi uvicorn
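The examples below deploy into an insurance-ai namespace; create it now if it does not already exist:
kubectl create namespace insurance-ai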
Integration Steps
1) Build the LangGraph insurance workflow
Start with a graph that models a simple insurance claim review flow. The point is to keep business logic in the graph and keep deployment concerns outside it.
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, END

class ClaimState(TypedDict):
    claim_id: str
    amount: float
    notes: str
    decision: str

def assess_claim(state: ClaimState) -> ClaimState:
    if state["amount"] < 5000:
        state["decision"] = "auto_approve"
    else:
        state["decision"] = "manual_review"
    return state

def route_claim(state: ClaimState) -> Literal["approve", "review"]:
    return "approve" if state["decision"] == "auto_approve" else "review"

def approve_claim(state: ClaimState) -> ClaimState:
    state["notes"] = "Claim approved automatically."
    return state

def send_to_adjuster(state: ClaimState) -> ClaimState:
    state["notes"] = "Claim routed to adjuster queue."
    return state

graph = StateGraph(ClaimState)
graph.add_node("assess", assess_claim)
graph.add_node("approve", approve_claim)
graph.add_node("review", send_to_adjuster)
graph.set_entry_point("assess")
graph.add_conditional_edges("assess", route_claim, {
    "approve": "approve",
    "review": "review",
})
graph.add_edge("approve", END)
graph.add_edge("review", END)

app = graph.compile()
This gives you a deterministic insurance workflow that can be invoked from an API server running in Kubernetes.
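You can sanity-check the compiled graph before putting an API in front of it. A quick local run (the claim values here are placeholders); a claim under the 5000 threshold should come back auto-approved:
# A small claim should route through the auto-approval branch.
result = app.invoke({
    "claim_id": "CLM-0001",
    "amount": 1200.0,
    "notes": "",
    "decision": "",
})
print(result["decision"])  # auto_approve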
2) Wrap the graph in a service layer
Kubernetes should run a stateless service. Keep graph execution behind an HTTP endpoint so pods can scale horizontally.
from fastapi import FastAPI
from pydantic import BaseModel

class ClaimRequest(BaseModel):
    claim_id: str
    amount: float
    notes: str = ""

api = FastAPI()

@api.post("/claims/review")
def review_claim(req: ClaimRequest):
    result = app.invoke({
        "claim_id": req.claim_id,
        "amount": req.amount,
        "notes": req.notes,
        "decision": "",
    })
    return result
Run this locally first:
uvicorn main:api --host 0.0.0.0 --port 8000
The key method here is app.invoke(...), which executes the compiled LangGraph workflow with your claim state.
3) Create a Kubernetes deployment for the agent service
Now package the API into a container and deploy it. Use environment variables for model keys and config; do not bake secrets into the image.
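A minimal Dockerfile sketch, assuming the workflow and FastAPI code above live in main.py (matching the uvicorn main:api command) and dependencies are pinned in a requirements.txt; both file names are assumptions you should adapt to your repo layout:
FROM python:3.11-slim
WORKDIR /app
# Install pinned dependencies first so Docker caches this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:api", "--host", "0.0.0.0", "--port", "8000"]
Build and push it to the registry referenced by the deployment below, and create the secret the deployment expects (replace YOUR_KEY with a real key):
docker build -t your-registry/claim-agent:latest .
docker push your-registry/claim-agent:latest
kubectl create secret generic openai-secret --from-literal=api_key=YOUR_KEY -n insurance-ai
With the image pushed and the secret in place, create the Deployment with the Python client: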
from kubernetes import client, config

config.load_kube_config()
namespace = "insurance-ai"

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="claim-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "claim-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "claim-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="claim-agent",
                        image="your-registry/claim-agent:latest",
                        ports=[client.V1ContainerPort(container_port=8000)],
                        env=[
                            client.V1EnvVar(
                                name="OPENAI_API_KEY",
                                value_from=client.V1EnvVarSource(
                                    secret_key_ref=client.V1SecretKeySelector(
                                        name="openai-secret", key="api_key"
                                    )
                                ),
                            )
                        ],
                    )
                ]
            ),
        ),
    ),
)

apps_v1 = client.AppsV1Api()
apps_v1.create_namespaced_deployment(namespace=namespace, body=deployment)
This uses the official Kubernetes Python client methods like config.load_kube_config() and AppsV1Api.create_namespaced_deployment(...).
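Note that config.load_kube_config() reads your local kubeconfig, which is right when you run this provisioning script from a workstation. If the script itself ever runs inside the cluster, switch to the in-cluster loader:
from kubernetes import config

try:
    # Inside a pod: use the mounted service-account credentials.
    config.load_incluster_config()
except config.ConfigException:
    # Outside the cluster: fall back to the local kubeconfig.
    config.load_kube_config()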
4) Add a Service and call the workflow through Kubernetes
Expose the pods with a Service so internal callers or an ingress can reach the agent.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="claim-agent-svc"),
    spec=client.V1ServiceSpec(
        selector={"app": "claim-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)

core_v1 = client.CoreV1Api()
core_v1.create_namespaced_service(namespace=namespace, body=service)
If you want to validate connectivity from inside the cluster, create a temporary pod or use port-forwarding:
kubectl port-forward svc/claim-agent-svc 8080:80 -n insurance-ai
Then call your LangGraph-backed API:
curl -X POST http://localhost:8080/claims/review \
-H 'Content-Type: application/json' \
-d '{"claim_id":"CLM-10021","amount":3200,"notes":"rear bumper damage"}'
5) Scale and observe the workload
For production AI, you need autoscaling based on traffic or queue depth. Start with CPU-based scaling and add custom metrics later if you have long-running tool calls.
autoscaling = client.AutoscalingV2Api()

hpa_manifest = client.V2HorizontalPodAutoscaler(
    # ObjectMeta is always v1, even on autoscaling/v2 resources.
    metadata=client.V1ObjectMeta(name="claim-agent-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="claim-agent",
        ),
        min_replicas=2,
        max_replicas=10,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

autoscaling.create_namespaced_horizontal_pod_autoscaler(namespace=namespace, body=hpa_manifest)
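One caveat that is easy to miss: a CPU utilization target only works if metrics-server is installed in the cluster and the container declares a CPU request, which the deployment in step 3 omits. A sketch of the container spec with resources added; the request and limit values are assumptions to tune against your workload:
client.V1Container(
    name="claim-agent",
    image="your-registry/claim-agent:latest",
    ports=[client.V1ContainerPort(container_port=8000)],
    # The CPU request is the baseline the HPA uses for utilization math.
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},
        limits={"cpu": "1", "memory": "512Mi"},
    ),
)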
Keep LangGraph nodes short and idempotent where possible. In Kubernetes, retries happen; your graph should tolerate them.
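One way to get there, sketched against the assess node from step 1: short-circuit when the work is already done, so a retried invocation leaves the state unchanged.
def assess_claim(state: ClaimState) -> ClaimState:
    # Idempotent: a retry with a decision already set is a no-op.
    if state["decision"]:
        return state
    state["decision"] = "auto_approve" if state["amount"] < 5000 else "manual_review"
    return state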
Testing the Integration
Use one Python script to verify both sides: graph execution and Kubernetes deployment status.
import requests
from kubernetes import client, config

# Verify the API response through the service endpoint.
resp = requests.post(
    "http://localhost:8080/claims/review",
    json={"claim_id": "CLM-10021", "amount": 3200, "notes": "rear bumper damage"},
)
print(resp.json())

# Verify the deployment exists in Kubernetes.
config.load_kube_config()
apps_v1 = client.AppsV1Api()
deployments = apps_v1.list_namespaced_deployment(namespace="insurance-ai")
print([d.metadata.name for d in deployments.items])
Expected output:
{'claim_id': 'CLM-10021', 'amount': 3200.0, 'notes': 'Claim approved automatically.', 'decision': 'auto_approve'}
['claim-agent']
If you see that response plus the deployment name, your LangGraph workflow is running behind Kubernetes correctly.
Real-World Use Cases
- Claims intake agents that classify submissions, check thresholds, and route complex cases to human adjusters.
- Underwriting assistants that gather policy data, run rule checks, and produce approval recommendations.
- Fraud triage pipelines that fan out suspicious claims into separate review paths while keeping the system horizontally scalable.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit