How to Integrate LangGraph for Insurance with Kubernetes for Production AI
Combining LangGraph for insurance with Kubernetes gives you a clean path from agent logic to production runtime. LangGraph handles the policy workflow, claim triage, and decision branching; Kubernetes gives you scaling, rollout control, and isolation for regulated workloads.
That combination is useful when you need insurance agents that can inspect claims, route cases, call underwriting rules, and stay available under load. You get stateful orchestration from LangGraph and containerized execution from Kubernetes without wiring everything by hand.
Prerequisites
- Python 3.10+
- A Kubernetes cluster:
  - local: `kind`, `minikube`, or `k3d`
  - production: EKS, GKE, or AKS
- `kubectl` configured and able to reach your cluster
- Docker installed for building images
- Access to an LLM provider used by your LangGraph workflow
- Python packages:
  - `langgraph`
  - `langchain-openai` (or your model provider package)
  - `kubernetes`
  - `fastapi`
  - `uvicorn`
- A namespace created in Kubernetes for the agent service
Install the Python dependencies:
pip install langgraph langchain-openai kubernetes fastapi uvicorn
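The examples below deploy into an insurance-ai namespace; create it now if it does not already exist:
kubectl create namespace insurance-ai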
Integration Steps
1) Build the LangGraph insurance workflow
Start with a graph that models a simple insurance claim review flow. The point is to keep business logic in the graph and keep deployment concerns outside it.
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, END

class ClaimState(TypedDict):
    claim_id: str
    amount: float
    notes: str
    decision: str

def assess_claim(state: ClaimState) -> ClaimState:
    if state["amount"] < 5000:
        state["decision"] = "auto_approve"
    else:
        state["decision"] = "manual_review"
    return state

def route_claim(state: ClaimState) -> Literal["approve", "review"]:
    return "approve" if state["decision"] == "auto_approve" else "review"

def approve_claim(state: ClaimState) -> ClaimState:
    state["notes"] = "Claim approved automatically."
    return state

def send_to_adjuster(state: ClaimState) -> ClaimState:
    state["notes"] = "Claim routed to adjuster queue."
    return state

graph = StateGraph(ClaimState)
graph.add_node("assess", assess_claim)
graph.add_node("approve", approve_claim)
graph.add_node("review", send_to_adjuster)
graph.set_entry_point("assess")
graph.add_conditional_edges("assess", route_claim, {
    "approve": "approve",
    "review": "review",
})
graph.add_edge("approve", END)
graph.add_edge("review", END)

app = graph.compile()
This gives you a deterministic insurance workflow that can be invoked from an API server running in Kubernetes.
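You can sanity-check the compiled graph before putting an API in front of it. A quick local run (the claim values here are placeholders); a claim under the 5000 threshold should come back auto-approved:
# A small claim should route through the auto-approval branch.
result = app.invoke({
    "claim_id": "CLM-0001",
    "amount": 1200.0,
    "notes": "",
    "decision": "",
})
print(result["decision"])  # auto_approve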
2) Wrap the graph in a service layer
Kubernetes should run a stateless service. Keep graph execution behind an HTTP endpoint so pods can scale horizontally.
from fastapi import FastAPI
from pydantic import BaseModel

class ClaimRequest(BaseModel):
    claim_id: str
    amount: float
    notes: str = ""

api = FastAPI()

@api.post("/claims/review")
def review_claim(req: ClaimRequest):
    result = app.invoke({
        "claim_id": req.claim_id,
        "amount": req.amount,
        "notes": req.notes,
        "decision": "",
    })
    return result
Run this locally first:
uvicorn main:api --host 0.0.0.0 --port 8000
The key method here is app.invoke(...), which executes the compiled LangGraph workflow with your claim state.
3) Create a Kubernetes deployment for the agent service
Now package the API into a container and deploy it. Use environment variables for model keys and config; do not bake secrets into the image.
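A minimal Dockerfile sketch, assuming the workflow and FastAPI code above live in main.py (matching the uvicorn main:api command) and dependencies are pinned in a requirements.txt; both file names are assumptions you should adapt to your repo layout:
FROM python:3.11-slim
WORKDIR /app
# Install pinned dependencies first so Docker caches this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:api", "--host", "0.0.0.0", "--port", "8000"]
Build and push it to the registry referenced by the deployment below, and create the secret the deployment expects (replace YOUR_KEY with a real key):
docker build -t your-registry/claim-agent:latest .
docker push your-registry/claim-agent:latest
kubectl create secret generic openai-secret --from-literal=api_key=YOUR_KEY -n insurance-ai
With the image pushed and the secret in place, create the Deployment with the Python client: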
from kubernetes import client, config

config.load_kube_config()
namespace = "insurance-ai"

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="claim-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "claim-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "claim-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="claim-agent",
                        image="your-registry/claim-agent:latest",
                        ports=[client.V1ContainerPort(container_port=8000)],
                        env=[
                            client.V1EnvVar(
                                name="OPENAI_API_KEY",
                                value_from=client.V1EnvVarSource(
                                    secret_key_ref=client.V1SecretKeySelector(
                                        name="openai-secret", key="api_key"
                                    )
                                ),
                            )
                        ],
                    )
                ]
            ),
        ),
    ),
)

apps_v1 = client.AppsV1Api()
apps_v1.create_namespaced_deployment(namespace=namespace, body=deployment)
This uses the official Kubernetes Python client methods like config.load_kube_config() and AppsV1Api.create_namespaced_deployment(...).
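Note that config.load_kube_config() reads your local kubeconfig, which is right when you run this provisioning script from a workstation. If the script itself ever runs inside the cluster, switch to the in-cluster loader:
from kubernetes import config

try:
    # Inside a pod: use the mounted service-account credentials.
    config.load_incluster_config()
except config.ConfigException:
    # Outside the cluster: fall back to the local kubeconfig.
    config.load_kube_config()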
4) Add a Service and call the workflow through Kubernetes
Expose the pods with a Service so internal callers or an ingress can reach the agent.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="claim-agent-svc"),
    spec=client.V1ServiceSpec(
        selector={"app": "claim-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)

core_v1 = client.CoreV1Api()
core_v1.create_namespaced_service(namespace=namespace, body=service)
If you want to validate connectivity from inside the cluster, create a temporary pod or use port-forwarding:
kubectl port-forward svc/claim-agent-svc 8080:80 -n insurance-ai
Then call your LangGraph-backed API:
curl -X POST http://localhost:8080/claims/review \
-H 'Content-Type: application/json' \
-d '{"claim_id":"CLM-10021","amount":3200,"notes":"rear bumper damage"}'
5) Scale and observe the workload
For production AI, you need autoscaling based on traffic or queue depth. Start with CPU-based scaling and add custom metrics later if you have long-running tool calls.
autoscaling = client.AutoscalingV2Api()

hpa_manifest = client.V2HorizontalPodAutoscaler(
    # ObjectMeta is always v1, even on autoscaling/v2 resources.
    metadata=client.V1ObjectMeta(name="claim-agent-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="claim-agent",
        ),
        min_replicas=2,
        max_replicas=10,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

autoscaling.create_namespaced_horizontal_pod_autoscaler(namespace=namespace, body=hpa_manifest)
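One caveat that is easy to miss: a CPU utilization target only works if metrics-server is installed in the cluster and the container declares a CPU request, which the deployment in step 3 omits. A sketch of the container spec with resources added; the request and limit values are assumptions to tune against your workload:
client.V1Container(
    name="claim-agent",
    image="your-registry/claim-agent:latest",
    ports=[client.V1ContainerPort(container_port=8000)],
    # The CPU request is the baseline the HPA uses for utilization math.
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},
        limits={"cpu": "1", "memory": "512Mi"},
    ),
)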
Keep LangGraph nodes short and idempotent where possible. In Kubernetes, retries happen; your graph should tolerate them.
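One way to get there, sketched against the assess node from step 1: short-circuit when the work is already done, so a retried invocation leaves the state unchanged.
def assess_claim(state: ClaimState) -> ClaimState:
    # Idempotent: a retry with a decision already set is a no-op.
    if state["decision"]:
        return state
    state["decision"] = "auto_approve" if state["amount"] < 5000 else "manual_review"
    return state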
Testing the Integration
Use one Python script to verify both sides: graph execution and Kubernetes deployment status.
import requests
from kubernetes import client, config

# Verify the API response through the service endpoint.
resp = requests.post(
    "http://localhost:8080/claims/review",
    json={"claim_id": "CLM-10021", "amount": 3200, "notes": "rear bumper damage"},
)
print(resp.json())

# Verify the deployment exists in Kubernetes.
config.load_kube_config()
apps_v1 = client.AppsV1Api()
deployments = apps_v1.list_namespaced_deployment(namespace="insurance-ai")
print([d.metadata.name for d in deployments.items])
Expected output:
{'claim_id': 'CLM-10021', 'amount': 3200.0, 'notes': 'Claim approved automatically.', 'decision': 'auto_approve'}
['claim-agent']
If you see that response plus the deployment name, your LangGraph workflow is running behind Kubernetes correctly.
Real-World Use Cases
- Claims intake agents that classify submissions, check thresholds, and route complex cases to human adjusters.
- Underwriting assistants that gather policy data, run rule checks, and produce approval recommendations.
- Fraud triage pipelines that fan out suspicious claims into separate review paths while keeping the system horizontally scalable.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit