How to Integrate LangGraph with Kubernetes for Multi-Agent Insurance Systems
Combining LangGraph with Kubernetes gives you a clean way to run multi-agent insurance workflows as real services, not just local scripts. LangGraph handles agent orchestration and stateful decision flow; Kubernetes provides scheduling, scaling, and isolation for claims, underwriting, fraud, and policy servicing agents.
This setup is useful when one request needs multiple specialist agents to cooperate, with retries, observability, and deployment control across environments.
Prerequisites
- Python 3.10+
- A running Kubernetes cluster
  - Local: `kind`, `minikube`, or `k3d`
  - Remote: EKS, GKE, or AKS
- `kubectl` configured against your cluster
- Docker installed for building images
- Access to your LLM provider credentials
- Python packages:
  - `langgraph`
  - `langchain-openai` (or your model provider package)
  - `kubernetes`
  - `pydantic`
- Basic familiarity with:
  - LangGraph state graphs
  - Kubernetes Deployments, Services, ConfigMaps, and Secrets
Install the Python dependencies:
pip install langgraph langchain-openai kubernetes pydantic
Integration Steps
1. Define the insurance workflow in LangGraph
Start by modeling the multi-agent flow. For insurance use cases, a common pattern is:
- intake agent
- policy lookup agent
- claims assessment agent
- escalation agent
Use LangGraph's `StateGraph` to route between these nodes. The simplified example below wires up intake, assessment, and decision nodes; policy lookup and escalation follow the same pattern.
from typing import TypedDict, Annotated
import operator

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI


class InsuranceState(TypedDict):
    messages: Annotated[list, operator.add]
    claim_type: str
    risk_score: float
    decision: str


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def intake_agent(state: InsuranceState):
    prompt = f"Classify this insurance request: {state['messages'][-1]}"
    response = llm.invoke(prompt)
    # Simplified: a real intake agent would parse the claim type out of the model response
    return {"messages": [response.content], "claim_type": "auto"}


def assessment_agent(state: InsuranceState):
    prompt = f"Assess risk for claim type {state['claim_type']}"
    response = llm.invoke(prompt)
    # Simplified: a real assessor would derive the score from the model output
    return {"messages": [response.content], "risk_score": 0.72}


def decision_agent(state: InsuranceState):
    if state["risk_score"] > 0.7:
        return {"decision": "escalate"}
    return {"decision": "approve"}


graph = StateGraph(InsuranceState)
graph.add_node("intake", intake_agent)
graph.add_node("assess", assessment_agent)
graph.add_node("decide", decision_agent)

graph.set_entry_point("intake")
graph.add_edge("intake", "assess")
graph.add_edge("assess", "decide")
graph.add_edge("decide", END)

app = graph.compile()
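Before containerizing anything, a quick local sanity check of the compiled graph is worth doing (this assumes your OpenAI credentials are already set in the environment):
result = app.invoke({
    "messages": ["Rear-end collision with moderate damage"],
    "claim_type": "",
    "risk_score": 0.0,
    "decision": ""
})
# With the stubbed 0.72 risk score, this prints "escalate"
print(result["decision"])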
2. Package the graph as a service for Kubernetes
In production, each graph worker should run in a container. Expose an HTTP endpoint so Kubernetes can scale it horizontally.
from fastapi import FastAPI
from pydantic import BaseModel

app_api = FastAPI()


class ClaimRequest(BaseModel):
    message: str


@app_api.post("/run")
def run_claim(req: ClaimRequest):
    result = app.invoke({
        "messages": [req.message],
        "claim_type": "",
        "risk_score": 0.0,
        "decision": ""
    })
    return result
Build a Docker image with this app and deploy it to Kubernetes behind a Service.
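A minimal Dockerfile might look like the sketch below; it assumes the API code above lives in main.py and that fastapi and uvicorn are listed in requirements.txt alongside the packages from the prerequisites:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app_api", "--host", "0.0.0.0", "--port", "8000"]
With the image pushed to your registry, a Deployment like this runs two replicas of the service: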
apiVersion: apps/v1
kind: Deployment
metadata:
  name: insurance-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: insurance-agent
  template:
    metadata:
      labels:
        app: insurance-agent
    spec:
      containers:
        - name: app
          image: your-registry/insurance-agent:latest
          ports:
            - containerPort: 8000
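The Deployment needs a Service in front of it so other workloads can reach it by a stable DNS name. A minimal sketch with illustrative names, mapping port 80 on the Service to the container's port 8000:
apiVersion: v1
kind: Service
metadata:
  name: insurance-agent
spec:
  selector:
    app: insurance-agent
  ports:
    - port: 80
      targetPort: 8000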
3. Use the Kubernetes Python client to manage agent workers
If you want dynamic multi-agent capacity, use the Kubernetes API to inspect pods or trigger jobs for specific workloads like fraud checks or document extraction.
from kubernetes import client, config
config.load_incluster_config() # use config.load_kube_config() locally
v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(namespace="insurance-ai", label_selector="app=insurance-agent")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
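Listing pods is read-only; to actually add capacity you can patch the Deployment's replica count through the same client. A sketch, assuming the insurance-agent Deployment from step 2 runs in the insurance-ai namespace:
apps_v1 = client.AppsV1Api()

# Bump the Deployment to 5 replicas during a busy period
apps_v1.patch_namespaced_deployment_scale(
    name="insurance-agent",
    namespace="insurance-ai",
    body={"spec": {"replicas": 5}},
)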
You can also create a Job for burst traffic when claim volume spikes.
batch_v1 = client.BatchV1Api()

job_manifest = client.V1Job(
    metadata=client.V1ObjectMeta(name="fraud-check-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"job": "fraud-check"}),
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="fraud-check",
                        image="your-registry/fraud-check:latest",
                        # command overrides the image entrypoint so the worker runs directly
                        command=["python", "worker.py"],
                    )
                ],
            ),
        ),
        backoff_limit=2,
    ),
)

batch_v1.create_namespaced_job(namespace="insurance-ai", body=job_manifest)
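Once the Job is submitted, you can poll its conditions to know when the burst work finishes. A minimal sketch:
import time

# Wait for the Job to report a terminal condition
while True:
    job = batch_v1.read_namespaced_job_status(
        name="fraud-check-job", namespace="insurance-ai"
    )
    conditions = job.status.conditions or []
    if any(c.type == "Complete" and c.status == "True" for c in conditions):
        print("fraud check finished")
        break
    if any(c.type == "Failed" and c.status == "True" for c in conditions):
        print("fraud check failed")
        break
    time.sleep(5)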
4. Wire multi-agent routing through Kubernetes service discovery
In a real system, one LangGraph node can call another agent over HTTP using its Kubernetes Service DNS name. That keeps each specialist isolated and independently deployable.
import requests

FRAUD_SERVICE_URL = "http://fraud-agent.insurance-ai.svc.cluster.local/run"


def fraud_agent(state: InsuranceState):
    payload = {
        "claim_id": state["messages"][-1],
        "risk_context": state["claim_type"],
    }
    resp = requests.post(FRAUD_SERVICE_URL, json=payload, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return {
        "messages": [f"fraud_score={data['score']}"],
        "risk_score": data["score"],
    }
This pattern works well when one team owns underwriting logic and another owns fraud logic. Kubernetes gives you stable service names; LangGraph gives you deterministic orchestration.
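To pull the fraud specialist into the main graph, register it as a node and route to it conditionally. A sketch, assuming you replace the static assess-to-decide edge from step 1 and only want fraud review on mid-to-high-risk claims:
def route_after_assessment(state: InsuranceState) -> str:
    # Send riskier claims through the fraud service before deciding
    return "fraud" if state["risk_score"] > 0.5 else "decide"

graph.add_node("fraud", fraud_agent)
graph.add_conditional_edges(
    "assess",
    route_after_assessment,
    {"fraud": "fraud", "decide": "decide"},
)
graph.add_edge("fraud", "decide")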
5. Add config and secrets from Kubernetes into your graph runtime
Keep model keys and environment-specific settings in Secrets and ConfigMaps. Load them into the pod as environment variables so your graph code stays portable.
import os
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
KUBE_NAMESPACE = os.environ.get("KUBE_NAMESPACE", "insurance-ai")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
llm = ChatOpenAI(model=MODEL_NAME, api_key=OPENAI_API_KEY)
A typical pod spec injects those values like this:
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-secrets
        key: openai_api_key
  - name: MODEL_NAME
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: model_name
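You can create both objects from the CLI; for example, with names matching the manifest above:
kubectl create secret generic llm-secrets \
  --namespace insurance-ai \
  --from-literal=openai_api_key=YOUR_KEY
kubectl create configmap app-config \
  --namespace insurance-ai \
  --from-literal=model_name=gpt-4o-mini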
Testing the Integration
Run the service locally or inside the cluster, then send a claim request through the API.
import requests
resp = requests.post(
    "http://localhost:8000/run",
    json={"message": "Rear-end collision with moderate damage"},
)
print(resp.status_code)
print(resp.json())
Expected output:
200
{
  "messages": [
    "Rear-end collision with moderate damage",
    "...model output...",
    "...assessment output..."
  ],
  "claim_type": "auto",
  "risk_score": 0.72,
  "decision": "escalate"
}
If you want to validate Kubernetes connectivity from Python directly:
from kubernetes import client, config
config.load_kube_config()
v1 = client.CoreV1Api()
namespaces = v1.list_namespace()
print([ns.metadata.name for ns in namespaces.items][:5])
Real-World Use Cases
- Claims triage pipelines where one agent extracts details from documents, another scores risk, and a third decides whether to escalate to a human adjuster.
- Underwriting workflows that fan out across policy rules agents, external data lookup agents, and compliance agents deployed as separate Kubernetes services.
- Fraud detection systems that autoscale investigation agents during spikes in FNOL submissions or suspicious claim bursts.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.