How to Integrate LangGraph with Kubernetes for Multi-Agent Healthcare Systems
Healthcare agent systems fail in boring ways: one agent can draft a care summary, another can validate policy rules, but without orchestration you get race conditions, duplicated work, and no clean audit trail. LangGraph gives you the control flow for multi-agent healthcare workflows, and Kubernetes gives you the runtime to scale, isolate, and recover those workflows under real load.
Prerequisites
- Python 3.10+
- A running Kubernetes cluster
- kubectl configured to point at that cluster
- Access to a healthcare data source or a mock FHIR endpoint
- LangGraph installed: pip install langgraph langchain langchain-openai
- Kubernetes Python client installed: pip install kubernetes
- Environment variables set for your model provider and cluster access: OPENAI_API_KEY, plus KUBECONFIG or in-cluster service account access
- Basic familiarity with:
  - Python async code
  - Kubernetes Deployments and Services
  - Agent routing and state passing
Integration Steps
1) Define the healthcare workflow in LangGraph
Start by modeling the work as a state machine. For healthcare, a common pattern is intake → triage → compliance check → summary generation.
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

# Shared state passed between every node in the workflow.
class PatientState(TypedDict):
    patient_id: str
    symptoms: str
    risk_level: str
    compliance_ok: bool
    summary: str

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def triage_node(state: PatientState):
    prompt = f"Assess risk from symptoms: {state['symptoms']}. Return only low, medium, or high."
    result = llm.invoke(prompt).content.strip()
    return {"risk_level": result}

def compliance_node(state: PatientState):
    # Replace this with your PHI policy checks / rules engine
    ok = "chest pain" not in state["symptoms"].lower()
    return {"compliance_ok": ok}

def summary_node(state: PatientState):
    prompt = (
        f"Create a short clinical summary for patient {state['patient_id']} "
        f"with risk={state['risk_level']} and compliance_ok={state['compliance_ok']}."
    )
    result = llm.invoke(prompt).content.strip()
    return {"summary": result}

graph = StateGraph(PatientState)
graph.add_node("triage", triage_node)
graph.add_node("compliance", compliance_node)
graph.add_node("summary", summary_node)
graph.set_entry_point("triage")
graph.add_edge("triage", "compliance")
graph.add_edge("compliance", "summary")
graph.add_edge("summary", END)
app = graph.compile()
This gives you deterministic control flow. In healthcare systems, that matters more than fancy prompting because you need predictable execution paths and auditability.
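Before wiring anything to Kubernetes, it's worth a quick local smoke test. A minimal sketch, assuming OPENAI_API_KEY is set in your environment; the patient ID and symptoms are placeholder values:

# Local smoke test: one pass through triage -> compliance -> summary.
final_state = app.invoke({
    "patient_id": "P-0001",
    "symptoms": "mild cough",
    "risk_level": "",
    "compliance_ok": False,
    "summary": "",
})
print(final_state["risk_level"], final_state["compliance_ok"])
print(final_state["summary"])

Every run walks the same node order, which is exactly the predictability you want before adding branching.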
2) Add branching for multi-agent routing
A multi-agent system should route based on state, not hardcoded if/else scattered across services. LangGraph supports conditional edges so you can send high-risk cases to escalation while keeping routine cases cheap.
def route_by_risk(state: PatientState):
    if state["risk_level"] == "high":
        return "escalate"
    return "summary"

def escalate_node(state: PatientState):
    return {
        "summary": f"Escalated patient {state['patient_id']} for urgent clinical review."
    }

graph = StateGraph(PatientState)
graph.add_node("triage", triage_node)
graph.add_node("compliance", compliance_node)
graph.add_node("summary", summary_node)
graph.add_node("escalate", escalate_node)
graph.set_entry_point("triage")
graph.add_edge("triage", "compliance")
graph.add_conditional_edges(
    "compliance",
    route_by_risk,
    {
        "escalate": "escalate",
        "summary": "summary",
    },
)
graph.add_edge("escalate", END)
graph.add_edge("summary", END)
app = graph.compile()
This is where LangGraph fits healthcare well. You can keep one workflow definition while still handling different operational paths for nurse review, physician escalation, or automated discharge summaries.
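For audit trails, you can also stream a run node by node instead of invoking it once, and log each transition as it happens. A minimal sketch using LangGraph's stream API; where you send the log lines is up to you:

# Stream per-node updates; each chunk is a dict of {node_name: state_update}.
inputs = {
    "patient_id": "P-0002",
    "symptoms": "severe chest pain",
    "risk_level": "",
    "compliance_ok": False,
    "summary": "",
}
for update in app.stream(inputs, stream_mode="updates"):
    for node_name, node_output in update.items():
        print(f"audit: node={node_name} output={node_output}")

Because routing decisions show up as concrete node transitions, this gives you a per-run record of which path each patient took.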
3) Package the graph as a service for Kubernetes
Expose the graph behind an API so Kubernetes can run it as a stateless workload. The app below uses FastAPI and calls app.invoke() from LangGraph.
from fastapi import FastAPI
from pydantic import BaseModel

api = FastAPI()

class IntakeRequest(BaseModel):
    patient_id: str
    symptoms: str

@api.post("/run")
def run_workflow(req: IntakeRequest):
    # "app" is the compiled LangGraph workflow from the previous step;
    # "api" is just the HTTP layer around it.
    result = app.invoke({
        "patient_id": req.patient_id,
        "symptoms": req.symptoms,
        "risk_level": "",
        "compliance_ok": False,
        "summary": "",
    })
    return result
Run this container in Kubernetes with a Deployment and Service. Keep the graph stateless; persist long-running state in Redis, Postgres, or your EHR integration layer.
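If a workflow has to pause and resume (say, waiting on human review after escalation), LangGraph's checkpointer API is one way to push that state into a store instead of the pod. A minimal in-memory sketch; swap in a persistent checkpointer (LangGraph ships a Postgres-backed one) for anything real. The thread ID value here is a placeholder:

from langgraph.checkpoint.memory import MemorySaver

# In-memory checkpointer: fine for local testing, lost on pod restart.
app = graph.compile(checkpointer=MemorySaver())

# thread_id ties checkpoints to one workflow run (here, one patient intake).
config = {"configurable": {"thread_id": "intake-P-1029"}}
app.invoke(
    {
        "patient_id": "P-1029",
        "symptoms": "headache and fatigue",
        "risk_level": "",
        "compliance_ok": False,
        "summary": "",
    },
    config=config,
)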
4) Deploy to Kubernetes with the Python client
Use the Kubernetes SDK when you want your orchestration layer to create jobs dynamically for batch reviews or isolated agent runs.
from kubernetes import client, config

# Use config.load_incluster_config() instead if this runs inside the cluster.
config.load_kube_config()
batch_api = client.BatchV1Api()

job_manifest = client.V1Job(
    metadata=client.V1ObjectMeta(name="healthcare-agent-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "healthcare-agent"}),
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="agent",
                        image="your-registry/healthcare-agent:latest",
                        env=[
                            # Pull the API key from a Secret instead of baking it in.
                            client.V1EnvVar(
                                name="OPENAI_API_KEY",
                                value_from=client.V1EnvVarSource(
                                    secret_key_ref=client.V1SecretKeySelector(
                                        name="llm-secrets",
                                        key="openai_api_key",
                                    )
                                ),
                            ),
                        ],
                    )
                ],
            ),
        ),
        backoff_limit=2,
    ),
)

batch_api.create_namespaced_job(namespace="default", body=job_manifest)
For real systems, use this pattern when you need per-patient isolation or batch processing of queued claims, referrals, or chart review tasks.
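For per-patient isolation, parameterize the Job instead of hardcoding one manifest. A sketch, assuming your agent image reads a PATIENT_ID environment variable; that variable name is an invention here, so use whatever your container actually expects:

def create_patient_job(patient_id: str) -> None:
    # Job names must be unique and DNS-safe.
    name = f"healthcare-agent-{patient_id.lower()}"
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=name, labels={"app": "healthcare-agent"}),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="agent",
                            image="your-registry/healthcare-agent:latest",
                            # PATIENT_ID is a hypothetical env var the agent image would read.
                            env=[client.V1EnvVar(name="PATIENT_ID", value=patient_id)],
                        )
                    ],
                ),
            ),
            backoff_limit=2,
            ttl_seconds_after_finished=3600,  # garbage-collect finished Jobs after an hour
        ),
    )
    batch_api.create_namespaced_job(namespace="default", body=job)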
5) Use Kubernetes primitives for scaling and resilience
Once the API is deployed, scale it like any other service. HPA handles traffic spikes; readiness probes keep bad pods out of rotation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthcare-agent-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: healthcare-agent-api
  template:
    metadata:
      labels:
        app: healthcare-agent-api
    spec:
      containers:
        - name: api
          image: your-registry/healthcare-agent-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-secrets
                  key: openai_api_key
---
apiVersion: v1
kind: Service
metadata:
  name: healthcare-agent-api
spec:
  selector:
    app: healthcare-agent-api
  ports:
    - port: 80
      targetPort: 8000
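To make the HPA mentioned above concrete, here is a minimal autoscaling/v2 manifest. The CPU target is a placeholder; LLM-bound services often scale better on request-based custom metrics. Pair it with a readinessProbe on the Deployment, such as an HTTP check against a health route you add to the FastAPI app:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: healthcare-agent-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: healthcare-agent-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70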
Kubernetes handles runtime concerns; LangGraph handles workflow logic. That separation keeps your agent code clean and your platform team happy.
Testing the Integration
Hit the API after deploying it to Kubernetes and confirm the workflow returns structured output.
import requests

resp = requests.post(
    "http://healthcare-agent-api.default.svc.cluster.local/run",
    json={
        "patient_id": "P-1029",
        "symptoms": "headache and fatigue",
    },
)
print(resp.status_code)
print(resp.json())
Expected output:
200
{
    'patient_id': 'P-1029',
    'symptoms': 'headache and fatigue',
    'risk_level': 'low',
    'compliance_ok': True,
    'summary': '...'
}
If you test locally first with kubectl port-forward, make sure the same payload works before moving to cluster DNS.
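For that local check, forward the Service port and hit localhost; 8080 is an arbitrary local port:

kubectl port-forward svc/healthcare-agent-api 8080:80

Then point the same requests.post call at http://localhost:8080/run before switching to the cluster DNS name.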
Real-World Use Cases
- Clinical intake routing
  - One agent extracts symptoms.
  - Another checks policy constraints.
  - A third escalates high-risk cases to human review.
- Claims and prior authorization
  - LangGraph coordinates document extraction, medical necessity checks, and exception handling.
  - Kubernetes runs each claim workflow in isolated pods for traceability.
- Population health batch jobs
  - Run nightly agent workflows over thousands of records.
  - Use Kubernetes Jobs for parallelism and retries without changing graph logic.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit: architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit