How to Integrate LangGraph with Kubernetes for Multi-Agent Healthcare Systems

By Cyprian Aarons · Updated 2026-04-21
Tags: langgraph-for-healthcare, kubernetes, multi-agent-systems

Healthcare agent systems fail in boring ways: one agent can draft a care summary, another can validate policy rules, but without orchestration you get race conditions, duplicated work, and no clean audit trail. LangGraph gives you the control flow for multi-agent healthcare workflows, and Kubernetes gives you the runtime to scale, isolate, and recover those workflows under real load.

Prerequisites

  • Python 3.10+
  • A running Kubernetes cluster
  • kubectl configured to point at that cluster
  • Access to a healthcare data source or a mock FHIR endpoint
  • LangGraph installed:
    • pip install langgraph langchain langchain-openai
  • Kubernetes Python client installed:
    • pip install kubernetes
  • Environment variables set for your model provider and cluster access:
    • OPENAI_API_KEY
    • KUBECONFIG or in-cluster service account access
  • Basic familiarity with:
    • Python async code
    • Kubernetes Deployments and Services
    • Agent routing and state passing

Integration Steps

1) Define the healthcare workflow in LangGraph

Start by modeling the work as a state machine. For healthcare, a common pattern is intake → triage → compliance check → summary generation.

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

class PatientState(TypedDict):
    patient_id: str
    symptoms: str
    risk_level: str
    compliance_ok: bool
    summary: str

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def triage_node(state: PatientState):
    prompt = f"Assess risk from symptoms: {state['symptoms']}. Return only low, medium, or high."
    result = llm.invoke(prompt).content.strip()
    return {"risk_level": result}

def compliance_node(state: PatientState):
    # Replace this with your PHI policy checks / rules engine
    ok = "chest pain" not in state["symptoms"].lower()
    return {"compliance_ok": ok}

def summary_node(state: PatientState):
    prompt = (
        f"Create a short clinical summary for patient {state['patient_id']} "
        f"with risk={state['risk_level']} and compliance_ok={state['compliance_ok']}."
    )
    result = llm.invoke(prompt).content.strip()
    return {"summary": result}

graph = StateGraph(PatientState)
graph.add_node("triage", triage_node)
graph.add_node("compliance", compliance_node)
graph.add_node("summary", summary_node)

graph.set_entry_point("triage")
graph.add_edge("triage", "compliance")
graph.add_edge("compliance", "summary")
graph.add_edge("summary", END)

app = graph.compile()

This gives you deterministic control flow. In healthcare systems, that matters more than fancy prompting because you need predictable execution paths and auditability.
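To see why that auditability claim holds, here is a plain-Python sketch of the same linear flow with stub nodes instead of LLM calls (no LangGraph required): because execution order is fixed, every transition can be recorded as it happens. The node bodies and the `run_pipeline` helper are illustrative stand-ins, not part of the graph above.

```python
# Plain-Python sketch of triage -> compliance -> summary with stub
# nodes, recording every transition for an audit trail.
def triage(state):
    return {"risk_level": "low"}  # stub; the real node calls the LLM

def compliance(state):
    return {"compliance_ok": "chest pain" not in state["symptoms"].lower()}

def summary(state):
    return {"summary": f"Patient {state['patient_id']}: risk={state['risk_level']}"}

def run_pipeline(state, nodes):
    audit = []  # ordered record of which node ran, when
    for name, node in nodes:
        state = {**state, **node(state)}  # merge each node's partial update
        audit.append(name)
    return state, audit

state, audit = run_pipeline(
    {"patient_id": "P-0001", "symptoms": "mild cough"},
    [("triage", triage), ("compliance", compliance), ("summary", summary)],
)
```

The `audit` list is exactly the kind of execution record a compliance reviewer asks for, and LangGraph's fixed edges give you the same property for free.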

2) Add branching for multi-agent routing

A multi-agent system should route based on state, not on hardcoded if/else branches scattered across services. LangGraph supports conditional edges so you can send high-risk cases to escalation while keeping routine cases cheap.

def route_by_risk(state: PatientState):
    if state["risk_level"] == "high":
        return "escalate"
    return "summary"

def escalate_node(state: PatientState):
    return {
        "summary": f"Escalated patient {state['patient_id']} for urgent clinical review."
    }

graph = StateGraph(PatientState)
graph.add_node("triage", triage_node)
graph.add_node("compliance", compliance_node)
graph.add_node("summary", summary_node)
graph.add_node("escalate", escalate_node)

graph.set_entry_point("triage")
graph.add_edge("triage", "compliance")
graph.add_conditional_edges(
    "compliance",
    route_by_risk,
    {
        "escalate": "escalate",
        "summary": "summary",
    },
)
graph.add_edge("escalate", END)
graph.add_edge("summary", END)

app = graph.compile()

This is where LangGraph fits healthcare well. You can keep one workflow definition while still handling different operational paths for nurse review, physician escalation, or automated discharge summaries.
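One thing worth noting: `route_by_risk` above only inspects `risk_level`, so a case that fails the compliance check still flows to the automated summary node. If that is not what you want, the router can branch on both fields. The function below is a hypothetical extension, not part of the graph above:

```python
def route_by_risk_and_compliance(state: dict) -> str:
    # Escalate on high risk OR a failed compliance check;
    # only clean, low/medium-risk cases get an automated summary.
    if state["risk_level"] == "high" or not state["compliance_ok"]:
        return "escalate"
    return "summary"
```

Because routers are pure functions of state, they are trivial to unit-test outside the graph before you wire them into `add_conditional_edges`.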

3) Package the graph as a service for Kubernetes

Expose the graph behind an API so Kubernetes can run it as a stateless workload. The service below uses FastAPI and calls app.invoke() on the compiled graph (note the naming: api is the FastAPI instance, app is the compiled LangGraph graph from the previous step).

from fastapi import FastAPI
from pydantic import BaseModel

api = FastAPI()

class IntakeRequest(BaseModel):
    patient_id: str
    symptoms: str

@api.post("/run")
def run_workflow(req: IntakeRequest):
    result = app.invoke({
        "patient_id": req.patient_id,
        "symptoms": req.symptoms,
        "risk_level": "",
        "compliance_ok": False,
        "summary": ""
    })
    return result

Run this container in Kubernetes with a Deployment and Service. Keep the graph stateless; persist long-running state in Redis, Postgres, or your EHR integration layer.
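What "persist long-running state" can look like in practice: serialize the state dict keyed by a run ID so a replacement pod can resume where the old one stopped. The sketch below uses a temporary directory of JSON files as a stand-in for Redis or Postgres (the run ID and storage layout are illustrative; LangGraph also ships checkpointer integrations if you prefer a built-in mechanism):

```python
import json
import tempfile
from pathlib import Path

STORE = Path(tempfile.mkdtemp())  # stand-in for Redis/Postgres/EHR layer

def save_state(run_id: str, state: dict) -> None:
    # Persist the full workflow state so a replacement pod can resume it.
    (STORE / f"{run_id}.json").write_text(json.dumps(state))

def load_state(run_id: str) -> dict:
    return json.loads((STORE / f"{run_id}.json").read_text())

save_state("run-123", {"patient_id": "P-1029", "risk_level": "low"})
restored = load_state("run-123")
```

The important property is that the pod itself holds nothing: kill it mid-workflow and a new replica can load the last checkpoint and continue.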

4) Deploy to Kubernetes with the Python client

Use the Kubernetes SDK when you want your orchestration layer to create jobs dynamically for batch reviews or isolated agent runs.

from kubernetes import client, config

try:
    config.load_incluster_config()  # running inside the cluster
except config.ConfigException:
    config.load_kube_config()  # local development via KUBECONFIG

batch_api = client.BatchV1Api()

job_manifest = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="healthcare-agent-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "healthcare-agent"}),
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="agent",
                        image="your-registry/healthcare-agent:latest",
                        env=[
                            client.V1EnvVar(
                                name="OPENAI_API_KEY",
                                value_from=client.V1EnvVarSource(
                                    secret_key_ref=client.V1SecretKeySelector(
                                        name="llm-secrets",
                                        key="openai_api_key",
                                    )
                                ),
                            ),
                        ],
                    )
                ],
            ),
        ),
        backoff_limit=2,
    ),
)

batch_api.create_namespaced_job(namespace="default", body=job_manifest)

For real systems, use this pattern when you need per-patient isolation or batch processing of queued claims, referrals, or chart review tasks.
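For per-patient isolation, each Job needs a unique name, and Kubernetes object names must be lowercase RFC 1123 labels (alphanumerics and '-', at most 63 characters). A hypothetical helper for deriving one from a patient ID:

```python
import re
import time

def job_name_for(patient_id: str) -> str:
    # Kubernetes object names: lowercase alphanumerics and '-', max 63
    # chars. A timestamp suffix keeps re-runs for the same patient
    # from colliding with an existing Job.
    safe = re.sub(r"[^a-z0-9-]", "-", patient_id.lower()).strip("-")
    return f"agent-{safe}-{int(time.time())}"[:63]

name = job_name_for("P-1029")
```

Feed the result into V1ObjectMeta(name=...) instead of the fixed "healthcare-agent-job" name when launching one Job per patient.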

5) Use Kubernetes primitives for scaling and resilience

Once the API is deployed, scale it like any other service. HPA handles traffic spikes; readiness probes keep bad pods out of rotation.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthcare-agent-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: healthcare-agent-api
  template:
    metadata:
      labels:
        app: healthcare-agent-api
    spec:
      containers:
        - name: api
          image: your-registry/healthcare-agent-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-secrets
                  key: openai_api_key
---
apiVersion: v1
kind: Service
metadata:
  name: healthcare-agent-api
spec:
  selector:
    app: healthcare-agent-api
  ports:
    - port: 80
      targetPort: 8000
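
The Deployment above does not yet include the readiness probe or HPA it mentions. A minimal sketch of both, assuming you add a /health route to the FastAPI app (not shown in step 3) and that metrics-server is installed in the cluster:

```yaml
# Readiness probe fragment for the 'api' container in the Deployment
# above; assumes a /health route exists on the FastAPI app.
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: healthcare-agent-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: healthcare-agent-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The thresholds here are starting points, not recommendations; LLM-bound workloads are often limited by provider rate limits rather than CPU, so pick the scaling metric that matches your actual bottleneck.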

Kubernetes handles runtime concerns; LangGraph handles workflow logic. That separation keeps your agent code clean and your platform team happy.

Testing the Integration

Hit the API after deploying it to Kubernetes and confirm the workflow returns structured output.

import requests

resp = requests.post(
    "http://healthcare-agent-api.default.svc.cluster.local/run",
    json={
        "patient_id": "P-1029",
        "symptoms": "headache and fatigue",
    },
)

print(resp.status_code)
print(resp.json())

Expected output:

200
{
  'patient_id': 'P-1029',
  'symptoms': 'headache and fatigue',
  'risk_level': 'low',
  'compliance_ok': True,
  'summary': '...'
}

If you test locally first with kubectl port-forward, make sure the same payload works before moving to cluster DNS.
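A local smoke test using port-forward might look like this (service name and namespace match the manifests above):

```shell
# Forward local port 8080 to the Service's port 80, then hit /run.
kubectl -n default port-forward svc/healthcare-agent-api 8080:80 &
sleep 2  # give the port-forward a moment to establish

curl -s -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{"patient_id": "P-1029", "symptoms": "headache and fatigue"}'
```

If this works but the cluster-DNS URL does not, the problem is service discovery or network policy, not your graph.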

Real-World Use Cases

  • Clinical intake routing

    • One agent extracts symptoms.
    • Another checks policy constraints.
    • A third escalates high-risk cases to human review.
  • Claims and prior authorization

    • LangGraph coordinates document extraction, medical necessity checks, and exception handling.
    • Kubernetes runs each claim workflow in isolated pods for traceability.
  • Population health batch jobs

    • Run nightly agent workflows over thousands of records.
    • Use Kubernetes Jobs for parallelism and retries without changing graph logic.
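
The batch pattern in that last bullet maps directly onto Job parallelism. A minimal sketch of a nightly chart-review Job (the image name is the hypothetical one from step 4; the counts are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-chart-review
spec:
  parallelism: 10   # pods running at once
  completions: 100  # total work items to finish
  backoffLimit: 3   # retries before the Job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: agent
          image: your-registry/healthcare-agent:latest
```

The graph code never changes; only the Job spec decides how wide and how resilient the batch run is.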

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
