How to Integrate LangGraph for healthcare with Kubernetes for production AI
Healthcare agents are only useful when they can hold state, route work safely, and survive restarts. LangGraph gives you the orchestration layer for clinical workflows, while Kubernetes gives you the runtime to run those workflows reliably across replicas, failures, and deployments.
This combo is what you want for triage assistants, prior-auth agents, discharge-summary pipelines, and care-coordination systems that need auditability plus production-grade scaling.
Prerequisites
- Python 3.10+
- A Kubernetes cluster:
  - local: `kind`, `minikube`, or `k3d`
  - production: EKS, GKE, AKS, or on-prem
- `kubectl` configured and pointing at your cluster
- Docker installed for building images
- LangGraph installed:
  - `langgraph`
  - your healthcare-specific graph nodes/tools
- Kubernetes Python client: `kubernetes`
- Access to a model provider or internal LLM endpoint
- A persistent store for graph state:
  - Postgres, Redis, or a managed vector/state backend
- Basic familiarity with:
  - `StateGraph`
  - Kubernetes Deployments and Services
Integration Steps
1. Define the healthcare workflow as a LangGraph state machine
Start by modeling the clinical process as explicit nodes. In healthcare, you want deterministic routing for intake, redaction, escalation, and human review.
```python
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class PatientState(TypedDict):
    messages: Annotated[list, add_messages]
    risk_level: str
    needs_human_review: bool


def intake_node(state: PatientState):
    text = state["messages"][-1].content.lower()
    if "chest pain" in text or "shortness of breath" in text:
        return {"risk_level": "high", "needs_human_review": True}
    return {"risk_level": "low", "needs_human_review": False}


def redact_node(state: PatientState):
    # Placeholder: strip PHI here before state leaves the workflow.
    return {"messages": state["messages"]}


def human_review_node(state: PatientState):
    # Placeholder: enqueue the case for clinician review.
    return {}


def review_router(state: PatientState):
    return "human_review" if state["needs_human_review"] else END


graph = StateGraph(PatientState)
graph.add_node("intake", intake_node)
graph.add_node("redact", redact_node)
graph.add_node("human_review", human_review_node)
graph.add_edge(START, "intake")
graph.add_edge("intake", "redact")
graph.add_conditional_edges("redact", review_router)
graph.add_edge("human_review", END)
app = graph.compile()
```

The key here is that the workflow is explicit. That makes it easier to audit clinical decisions and easier to scale each step independently in Kubernetes.
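The `redact_node` above is a pass-through stub. A minimal sketch of what real redaction might look like, using regex masking for a few obvious identifiers (the patterns here are purely illustrative and nowhere near exhaustive enough for HIPAA compliance, where you would use a vetted de-identification library or service):

```python
import re

# Illustrative patterns only -- production PHI redaction needs a vetted
# de-identification library or service, not a handful of regexes.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),               # SSN-like
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),   # phone-like
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),     # record number
]


def redact_phi(text: str) -> str:
    """Mask PHI-looking substrings before text leaves the workflow."""
    for pattern, replacement in PHI_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact_phi("Patient MRN: 884213, callback 555-867-5309"))
# -> Patient [MRN], callback [PHONE]
```

A function like this would slot into `redact_node`, mapping `redact_phi` over the message contents before they are persisted or sent to a model provider.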
2. Package the graph as a service with a health endpoint
Run the LangGraph app behind an HTTP API so Kubernetes can manage it like any other service. Keep the API thin and let the graph do the work.
```python
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.messages import HumanMessage

# "app" is the compiled LangGraph from step 1.
api = FastAPI()


class ChatRequest(BaseModel):
    text: str


@api.post("/triage")
async def triage(req: ChatRequest):
    result = app.invoke({
        "messages": [HumanMessage(content=req.text)],
        "risk_level": "",
        "needs_human_review": False,
    })
    return {
        "risk_level": result["risk_level"],
        "needs_human_review": result["needs_human_review"],
    }


@api.get("/healthz")
async def healthz():
    return {"status": "ok"}
```

This pattern keeps your deployment simple:
- one container per agent service
- one readiness probe
- one liveness probe
- one scaling target
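For teams that prefer declarative manifests over the Python client used in the next step, the probes map to a Deployment spec fragment like this (image name, port, and timing values are placeholders consistent with the examples in this guide):

```yaml
containers:
  - name: agent
    image: ghcr.io/your-org/langgraph-healthcare-agent:latest
    ports:
      - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8000
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 10
```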
3. Use the Kubernetes Python client to deploy the agent
If you want your application to create or update its own workloads, use the official client. The calls below are real methods from `kubernetes.client`.

```python
from kubernetes import client, config

# Use config.load_incluster_config() instead when running inside the cluster.
config.load_kube_config()
apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="langgraph-healthcare-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(
            match_labels={"app": "langgraph-healthcare-agent"}
        ),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(
                labels={"app": "langgraph-healthcare-agent"}
            ),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="ghcr.io/your-org/langgraph-healthcare-agent:latest",
                        ports=[client.V1ContainerPort(container_port=8000)],
                        readiness_probe=client.V1Probe(
                            http_get=client.V1HTTPGetAction(
                                path="/healthz", port=8000
                            )
                        ),
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
```
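One caveat: `create_namespaced_deployment` fails with a 409 Conflict if the Deployment already exists. A small create-or-patch helper makes the deploy idempotent; it is sketched below with injected callables so the fallback logic is testable without a cluster (with the real client, `is_conflict` would check for a `kubernetes.client.rest.ApiException` whose `status` is 409):

```python
def apply_resource(create_fn, patch_fn, body, is_conflict):
    """Create a resource; if it already exists, patch it instead."""
    try:
        return create_fn(body)
    except Exception as exc:
        if is_conflict(exc):
            return patch_fn(body)
        raise


# Stubbed usage: simulate the first create hitting a 409 Conflict.
class Conflict(Exception):
    status = 409


def fake_create(body):
    raise Conflict()


def fake_patch(body):
    return f"patched {body}"


result = apply_resource(
    fake_create,
    fake_patch,
    "langgraph-healthcare-agent",
    is_conflict=lambda e: getattr(e, "status", None) == 409,
)
print(result)  # -> patched langgraph-healthcare-agent
```

With the real client you would pass partials of `create_namespaced_deployment` and `patch_namespaced_deployment` as `create_fn` and `patch_fn`.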
4. Add a Service and scale based on workload
Expose the agent with a Service so internal consumers can call it. Then scale replicas when traffic increases or when you need more throughput for batch triage jobs.
```python
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="langgraph-healthcare-agent"),
    spec=client.V1ServiceSpec(
        selector={"app": "langgraph-healthcare-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)
core_v1.create_namespaced_service(namespace="default", body=service)

scale = client.V1Scale(spec=client.V1ScaleSpec(replicas=4))
apps_v1.patch_namespaced_deployment_scale(
    name="langgraph-healthcare-agent",
    namespace="default",
    body=scale,
)
```
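Patching the replica count by hand fits predictable batch windows. For traffic-driven scaling, a HorizontalPodAutoscaler is the usual fit; a sketch, assuming metrics-server is installed in the cluster and using the resource names from this guide:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langgraph-healthcare-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langgraph-healthcare-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```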
5. Wire persistence for safe retries and recovery
Healthcare workflows should not lose state when pods restart. Use a durable checkpoint store so LangGraph can resume conversations and partial clinical flows.
```python
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver

# Replace this with a durable backend (e.g. Postgres) in production.
checkpointer = MemorySaver()
app_with_state = graph.compile(checkpointer=checkpointer)

# With a checkpointer attached, every invocation needs a thread_id so the
# graph knows which conversation to checkpoint and resume.
config = {"configurable": {"thread_id": "patient-123"}}
result = app_with_state.invoke(
    {
        "messages": [HumanMessage(content="I have severe chest pain")],
        "risk_level": "",
        "needs_human_review": False,
    },
    config=config,
)
print(result)
```
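To see why the `thread_id` matters, here is a toy illustration of the checkpointing contract (a simplified stand-in, not the real LangGraph API): state lives in a store keyed by thread, so any replica that receives the same `thread_id` loads the same history and continues it.

```python
# Toy stand-in for a durable checkpoint store keyed by thread_id.
# In production this would be Postgres or Redis, which survives pod
# restarts; a module-level dict obviously does not.
checkpoints = {}


def run_step(thread_id: str, message: str) -> dict:
    """Load the thread's state, append the message, save, and return it."""
    state = checkpoints.get(thread_id, {"messages": []})
    state["messages"].append(message)
    checkpoints[thread_id] = state
    return state


run_step("patient-123", "I have severe chest pain")
# A second call with the same thread_id -- e.g. from a replacement pod --
# sees the earlier message because the store, not the process, holds state.
resumed = run_step("patient-123", "It started an hour ago")
print(resumed["messages"])
# -> ['I have severe chest pain', 'It started an hour ago']
```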
Testing the Integration
Run a simple request against the API after deploying it to Kubernetes.
```python
from fastapi.testclient import TestClient

client = TestClient(api)
response = client.post(
    "/triage", json={"text": "I have chest pain and shortness of breath"}
)
print(response.status_code)
print(response.json())
```
Expected output:
```
200
{'risk_level': 'high', 'needs_human_review': True}
```
If you want to verify Kubernetes wiring too:
```bash
kubectl get deploy langgraph-healthcare-agent
kubectl get svc langgraph-healthcare-agent
kubectl logs deploy/langgraph-healthcare-agent
```
Real-World Use Cases
Clinical triage assistant
- Routes patient messages into low-risk self-service flows or high-risk human escalation paths.
- Keeps every decision in a graph trace for audit review.

Prior authorization agent
- Pulls policy rules, checks documentation completeness, and escalates missing evidence.
- Scales horizontally during claims spikes.

Discharge coordination workflow
- Generates discharge instructions, verifies medication reconciliation steps, and triggers follow-up tasks.
- Runs reliably even if individual pods restart mid-workflow.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit