How to Integrate LangGraph for healthcare with Kubernetes for production AI
Healthcare agents are only useful when they can hold state, route work safely, and survive restarts. LangGraph gives you the orchestration layer for clinical workflows, while Kubernetes gives you the runtime to run those workflows reliably across replicas, failures, and deployments.
This combo is what you want for triage assistants, prior-auth agents, discharge-summary pipelines, and care-coordination systems that need auditability plus production-grade scaling.
Prerequisites
- Python 3.10+
- A Kubernetes cluster:
  - local: `kind`, `minikube`, or `k3d`
  - production: EKS, GKE, AKS, or on-prem
- `kubectl` configured and pointing at your cluster
- Docker installed for building images
- LangGraph installed:
  - `langgraph`
  - your healthcare-specific graph nodes/tools
- Kubernetes Python client: `kubernetes`
- Access to a model provider or internal LLM endpoint
- A persistent store for graph state:
  - Postgres, Redis, or a managed vector/state backend
- Basic familiarity with:
  - `StateGraph`
  - Kubernetes Deployments and Services
Integration Steps
1. Define the healthcare workflow as a LangGraph state machine
Start by modeling the clinical process as explicit nodes. In healthcare, you want deterministic routing for intake, redaction, escalation, and human review.
```python
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class PatientState(TypedDict):
    messages: Annotated[list, add_messages]
    risk_level: str
    needs_human_review: bool


def intake_node(state: PatientState):
    text = state["messages"][-1].content.lower()
    if "chest pain" in text or "shortness of breath" in text:
        return {"risk_level": "high", "needs_human_review": True}
    return {"risk_level": "low", "needs_human_review": False}


def redact_node(state: PatientState):
    # Placeholder: strip PHI here before state leaves the workflow.
    return {"messages": state["messages"]}


def human_review_node(state: PatientState):
    # Placeholder: enqueue the case for clinician review.
    return {}


def review_router(state: PatientState):
    return "human_review" if state["needs_human_review"] else END


graph = StateGraph(PatientState)
graph.add_node("intake", intake_node)
graph.add_node("redact", redact_node)
graph.add_node("human_review", human_review_node)
graph.add_edge(START, "intake")
graph.add_edge("intake", "redact")
graph.add_conditional_edges("redact", review_router)
graph.add_edge("human_review", END)
app = graph.compile()
```

The key here is that the workflow is explicit. That makes it easier to audit clinical decisions and easier to scale each step independently in Kubernetes.
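The `redact_node` above is a pass-through stub. A minimal sketch of what real redaction might look like, using regex masking for a few obvious identifiers (the patterns here are purely illustrative and nowhere near exhaustive enough for HIPAA compliance, where you would use a vetted de-identification library or service):

```python
import re

# Illustrative patterns only -- production PHI redaction needs a vetted
# de-identification library or service, not a handful of regexes.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),               # SSN-like
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),   # phone-like
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),     # record number
]


def redact_phi(text: str) -> str:
    """Mask PHI-looking substrings before text leaves the workflow."""
    for pattern, replacement in PHI_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact_phi("Patient MRN: 884213, callback 555-867-5309"))
# -> Patient [MRN], callback [PHONE]
```

A function like this would slot into `redact_node`, mapping `redact_phi` over the message contents before they are persisted or sent to a model provider.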
2. Package the graph as a service with a health endpoint
Run the LangGraph app behind an HTTP API so Kubernetes can manage it like any other service. Keep the API thin and let the graph do the work.
```python
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.messages import HumanMessage

# "app" is the compiled LangGraph from step 1.
api = FastAPI()


class ChatRequest(BaseModel):
    text: str


@api.post("/triage")
async def triage(req: ChatRequest):
    result = app.invoke({
        "messages": [HumanMessage(content=req.text)],
        "risk_level": "",
        "needs_human_review": False,
    })
    return {
        "risk_level": result["risk_level"],
        "needs_human_review": result["needs_human_review"],
    }


@api.get("/healthz")
async def healthz():
    return {"status": "ok"}
```

This pattern keeps your deployment simple:
- one container per agent service
- one readiness probe
- one liveness probe
- one scaling target
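For teams that prefer declarative manifests over the Python client used in the next step, the probes map to a Deployment spec fragment like this (image name, port, and timing values are placeholders consistent with the examples in this guide):

```yaml
containers:
  - name: agent
    image: ghcr.io/your-org/langgraph-healthcare-agent:latest
    ports:
      - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8000
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 10
```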
3. Use the Kubernetes Python client to deploy the agent
If you want your application to create or update its own workloads, use the official client. The calls below are real methods from `kubernetes.client`.

```python
from kubernetes import client, config

# Use config.load_incluster_config() instead when running inside the cluster.
config.load_kube_config()
apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="langgraph-healthcare-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(
            match_labels={"app": "langgraph-healthcare-agent"}
        ),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(
                labels={"app": "langgraph-healthcare-agent"}
            ),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="ghcr.io/your-org/langgraph-healthcare-agent:latest",
                        ports=[client.V1ContainerPort(container_port=8000)],
                        readiness_probe=client.V1Probe(
                            http_get=client.V1HTTPGetAction(
                                path="/healthz", port=8000
                            )
                        ),
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
```
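One caveat: `create_namespaced_deployment` fails with a 409 Conflict if the Deployment already exists. A small create-or-patch helper makes the deploy idempotent; it is sketched below with injected callables so the fallback logic is testable without a cluster (with the real client, `is_conflict` would check for a `kubernetes.client.rest.ApiException` whose `status` is 409):

```python
def apply_resource(create_fn, patch_fn, body, is_conflict):
    """Create a resource; if it already exists, patch it instead."""
    try:
        return create_fn(body)
    except Exception as exc:
        if is_conflict(exc):
            return patch_fn(body)
        raise


# Stubbed usage: simulate the first create hitting a 409 Conflict.
class Conflict(Exception):
    status = 409


def fake_create(body):
    raise Conflict()


def fake_patch(body):
    return f"patched {body}"


result = apply_resource(
    fake_create,
    fake_patch,
    "langgraph-healthcare-agent",
    is_conflict=lambda e: getattr(e, "status", None) == 409,
)
print(result)  # -> patched langgraph-healthcare-agent
```

With the real client you would pass partials of `create_namespaced_deployment` and `patch_namespaced_deployment` as `create_fn` and `patch_fn`.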
4. Add a Service and scale based on workload
Expose the agent with a Service so internal consumers can call it. Then scale replicas when traffic increases or when you need more throughput for batch triage jobs.
```python
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="langgraph-healthcare-agent"),
    spec=client.V1ServiceSpec(
        selector={"app": "langgraph-healthcare-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)
core_v1.create_namespaced_service(namespace="default", body=service)

scale = client.V1Scale(spec=client.V1ScaleSpec(replicas=4))
apps_v1.patch_namespaced_deployment_scale(
    name="langgraph-healthcare-agent",
    namespace="default",
    body=scale,
)
```
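Patching the replica count by hand fits predictable batch windows. For traffic-driven scaling, a HorizontalPodAutoscaler is the usual fit; a sketch, assuming metrics-server is installed in the cluster and using the resource names from this guide:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langgraph-healthcare-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langgraph-healthcare-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```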
5. Wire persistence for safe retries and recovery
Healthcare workflows should not lose state when pods restart. Use a durable checkpoint store so LangGraph can resume conversations and partial clinical flows.
```python
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver

# Replace this with a durable backend (e.g. Postgres) in production.
checkpointer = MemorySaver()
app_with_state = graph.compile(checkpointer=checkpointer)

# With a checkpointer attached, every invocation needs a thread_id so the
# graph knows which conversation to checkpoint and resume.
config = {"configurable": {"thread_id": "patient-123"}}
result = app_with_state.invoke(
    {
        "messages": [HumanMessage(content="I have severe chest pain")],
        "risk_level": "",
        "needs_human_review": False,
    },
    config=config,
)
print(result)
```
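To see why the `thread_id` matters, here is a toy illustration of the checkpointing contract (a simplified stand-in, not the real LangGraph API): state lives in a store keyed by thread, so any replica that receives the same `thread_id` loads the same history and continues it.

```python
# Toy stand-in for a durable checkpoint store keyed by thread_id.
# In production this would be Postgres or Redis, which survives pod
# restarts; a module-level dict obviously does not.
checkpoints = {}


def run_step(thread_id: str, message: str) -> dict:
    """Load the thread's state, append the message, save, and return it."""
    state = checkpoints.get(thread_id, {"messages": []})
    state["messages"].append(message)
    checkpoints[thread_id] = state
    return state


run_step("patient-123", "I have severe chest pain")
# A second call with the same thread_id -- e.g. from a replacement pod --
# sees the earlier message because the store, not the process, holds state.
resumed = run_step("patient-123", "It started an hour ago")
print(resumed["messages"])
# -> ['I have severe chest pain', 'It started an hour ago']
```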
Testing the Integration
Run a simple request against the API after deploying it to Kubernetes.
```python
from fastapi.testclient import TestClient

client = TestClient(api)
response = client.post(
    "/triage", json={"text": "I have chest pain and shortness of breath"}
)
print(response.status_code)
print(response.json())
```
Expected output:
```
200
{'risk_level': 'high', 'needs_human_review': True}
```
If you want to verify Kubernetes wiring too:
```bash
kubectl get deploy langgraph-healthcare-agent
kubectl get svc langgraph-healthcare-agent
kubectl logs deploy/langgraph-healthcare-agent
```
Real-World Use Cases
Clinical triage assistant
- Routes patient messages into low-risk self-service flows or high-risk human escalation paths.
- Keeps every decision in a graph trace for audit review.

Prior authorization agent
- Pulls policy rules, checks documentation completeness, and escalates missing evidence.
- Scales horizontally during claims spikes.

Discharge coordination workflow
- Generates discharge instructions, verifies medication reconciliation steps, and triggers follow-up tasks.
- Runs reliably even if individual pods restart mid-workflow.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit