How to Integrate LangGraph with Kubernetes for Insurance RAG

By Cyprian Aarons · Updated 2026-04-21

Combining LangGraph with Kubernetes gives you a clean way to run regulated insurance RAG workflows as durable, observable services. In practice, that means claims triage, policy Q&A, and underwriting assistants can retrieve from internal documents while staying deployable, scalable, and isolated inside your cluster.

Prerequisites

  • Python 3.10+
  • Access to a Kubernetes cluster
  • kubectl configured for the target cluster
  • A container registry for pushing images
  • LangGraph installed:
    • pip install langgraph langchain-openai
  • Kubernetes Python client installed:
    • pip install kubernetes
  • An LLM API key set in your environment
  • A document store or vector index for RAG data
  • A Kubernetes namespace for the agent workload
  • RBAC permissions to create:
    • Deployment
    • Service
    • ConfigMap
    • Secret

Integration Steps

  1. Build the LangGraph workflow for insurance RAG

    Start with a graph that routes insurance questions through retrieval before generating an answer. For production insurance use cases, keep the retrieval step explicit so you can inspect sources and enforce policy controls.

    from typing import TypedDict, List
    from langgraph.graph import StateGraph, END
    from langchain_openai import ChatOpenAI
    
    class GraphState(TypedDict):
        question: str
        context: List[str]
        answer: str
    
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    
    def retrieve_docs(state: GraphState) -> GraphState:
        # Replace with your vector DB lookup
        docs = [
            "Policy A covers fire damage with a 30-day reporting window.",
            "Policy B excludes flood damage unless rider X is active."
        ]
        return {**state, "context": docs}
    
    def generate_answer(state: GraphState) -> GraphState:
        prompt = f"""
        Answer the insurance question using only the context.
    
        Question: {state["question"]}
        Context:
        {chr(10).join(state["context"])}
        """
        response = llm.invoke(prompt)
        return {**state, "answer": response.content}
    
    graph = StateGraph(GraphState)
    graph.add_node("retrieve", retrieve_docs)
    graph.add_node("generate", generate_answer)
    graph.set_entry_point("retrieve")
    graph.add_edge("retrieve", "generate")
    graph.add_edge("generate", END)
    
    app = graph.compile()
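
    A quick way to validate the graph before containerizing it is to invoke it directly. A minimal smoke test, assuming the code above lives in one module:

    result = app.invoke({
        "question": "Does Policy B cover flood damage?",
        "context": [],
        "answer": ""
    })
    print(result["answer"])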
    
  2. Package the graph as a service-friendly Python app

    Your Kubernetes pod should expose a simple HTTP interface. Keep the graph execution inside the container so each replica can process requests independently.

    from fastapi import FastAPI
    from pydantic import BaseModel
    
    class QueryRequest(BaseModel):
        question: str
    
    api = FastAPI()
    
    @api.post("/ask")
    def ask(req: QueryRequest):
        # "app" here is the compiled LangGraph workflow from step 1
        result = app.invoke({"question": req.question, "context": [], "answer": ""})
        return {"answer": result["answer"], "sources": result["context"]}
    
  3. Create Kubernetes resources from Python

    Use the Kubernetes client to create a deployment and service for the agent. This is where LangGraph becomes a real workload on the cluster instead of a local script.

    from kubernetes import client, config
    
    # Use config.load_incluster_config() instead when this script runs inside the cluster
    config.load_kube_config()
    
    apps_v1 = client.AppsV1Api()
    core_v1 = client.CoreV1Api()
    
    namespace = "insurance-rag"
    
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="langgraph-insurance-agent"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(
                match_labels={"app": "langgraph-insurance-agent"}
            ),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "langgraph-insurance-agent"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="agent",
                            image="your-registry/langgraph-insurance-agent:latest",
                            ports=[client.V1ContainerPort(container_port=8000)],
                        )
                    ]
                ),
            ),
        ),
    )
    
    service = client.V1Service(
        api_version="v1",
        kind="Service",
        metadata=client.V1ObjectMeta(name="langgraph-insurance-agent"),
        spec=client.V1ServiceSpec(
            selector={"app": "langgraph-insurance-agent"},
            ports=[client.V1ServicePort(port=80, target_port=8000)],
            type="ClusterIP",
        ),
    )
    
    apps_v1.create_namespaced_deployment(namespace=namespace, body=deployment)
    core_v1.create_namespaced_service(namespace=namespace, body=service)
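
    The create calls return as soon as the API server accepts the objects, not when the pods are ready. A simple readiness poll, reusing the apps_v1 client from above:

    import time

    def wait_for_rollout(name: str, namespace: str, timeout: int = 120) -> bool:
        # Poll the deployment until every replica reports ready
        deadline = time.time() + timeout
        while time.time() < deadline:
            dep = apps_v1.read_namespaced_deployment(name=name, namespace=namespace)
            if (dep.status.ready_replicas or 0) == dep.spec.replicas:
                return True
            time.sleep(5)
        return False

    wait_for_rollout("langgraph-insurance-agent", namespace)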
    
  4. Wire secrets and config into the pod

    Keep model keys and retrieval settings out of code. Store them in Kubernetes Secrets and ConfigMaps, then inject them into the pod as environment variables.

    secret = client.V1Secret(
        metadata=client.V1ObjectMeta(name="llm-secrets"),
        string_data={
            # Placeholder only; supply the real key from a secure store at deploy time
            "OPENAI_API_KEY": "replace-me"
        },
        type="Opaque",
    )
    
    config_map = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name="rag-config"),
        data={
            "VECTOR_INDEX_NAME": "insurance-policies",
            "TOP_K": "4"
        }
    )
    
    core_v1.create_namespaced_secret(namespace=namespace, body=secret)
    core_v1.create_namespaced_config_map(namespace=namespace, body=config_map)
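
    On the application side, the retrieval code can read these values at startup instead of hardcoding them. A sketch, assuming the environment variables are wired into the pod as shown in the next step:

    import os

    # Defaults mirror the ConfigMap values so local runs still work
    VECTOR_INDEX_NAME = os.environ.get("VECTOR_INDEX_NAME", "insurance-policies")
    TOP_K = int(os.environ.get("TOP_K", "4"))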
    
  5. Update the deployment to consume those values

    Add environment references so every replica gets the same runtime configuration, then patch the deployment so the change rolls out to every pod. This keeps your LangGraph behavior stable across replicas.

    container_env = [
        client.V1EnvVar(
            name="OPENAI_API_KEY",
            value_from=client.V1EnvVarSource(
                secret_key_ref=client.V1SecretKeySelector(
                    name="llm-secrets",
                    key="OPENAI_API_KEY"
                )
            )
        ),
        client.V1EnvVar(
            name="VECTOR_INDEX_NAME",
            value_from=client.V1EnvVarSource(
                config_map_key_ref=client.V1ConfigMapKeySelector(
                    name="rag-config",
                    key="VECTOR_INDEX_NAME"
                )
            )
        )
        # TOP_K follows the same config_map_key_ref pattern
    ]
    
    # Attach the variables to the container and roll out the change
    deployment.spec.template.spec.containers[0].env = container_env
    apps_v1.patch_namespaced_deployment(
        name="langgraph-insurance-agent",
        namespace=namespace,
        body=deployment,
    )
    

Testing the Integration

Run a request against the service after it is deployed. If you are testing from outside the cluster, forward the service port first (for example, kubectl port-forward svc/langgraph-insurance-agent 8000:80 -n insurance-rag) or go through your ingress route, then hit /ask with a real insurance query.

import requests

response = requests.post(
    "http://localhost:8000/ask",
    json={"question": "Does Policy B cover flood damage?"}
)

print(response.status_code)
print(response.json())

Expected output:

200
{
  "answer": "...Policy B excludes flood damage unless rider X is active...",
  "sources": [
    "Policy A covers fire damage with a 30-day reporting window.",
    "Policy B excludes flood damage unless rider X is active."
  ]
}

Real-World Use Cases

  • Claims intake assistant: route claimant questions through retrieval over policy documents, then run the service on Kubernetes so multiple adjusters can query it concurrently.
  • Underwriting policy checker: use LangGraph to check applicant answers against underwriting rules, deployed as a namespaced service with strict resource limits (see the sketch after this list).
  • Broker support bot: serve broker-facing RAG over coverage guides and endorsements inside your cluster so data access stays inside your network boundary.
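
The strict resource limits mentioned above map to V1ResourceRequirements on the container from step 3. A sketch with illustrative values; tune them to your model-call latency and traffic:

resources = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "512Mi"},
    limits={"cpu": "1", "memory": "1Gi"},
)

# Pass resources=resources to the V1Container in the step 3 deployment spec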

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
