How to Integrate LangGraph with Kubernetes for Insurance RAG

By Cyprian Aarons · Updated 2026-04-21

Combining LangGraph with Kubernetes gives you a clean way to run regulated insurance RAG workflows as durable, observable services. In practice, that means claims triage, policy Q&A, and underwriting assistants can retrieve from internal documents while staying deployable, scalable, and isolated inside your cluster.

Prerequisites

  • Python 3.10+
  • Access to a Kubernetes cluster
  • kubectl configured for the target cluster
  • A container registry for pushing images
  • LangGraph installed:
    • pip install langgraph langchain-openai
  • Kubernetes Python client installed:
    • pip install kubernetes
  • An LLM API key set in your environment
  • A document store or vector index for RAG data
  • A Kubernetes namespace for the agent workload
  • RBAC permissions to create:
    • Deployment
    • Service
    • ConfigMap
    • Secret

Integration Steps

  1. Build the LangGraph workflow for insurance RAG

    Start with a graph that routes insurance questions through retrieval before generating an answer. For production insurance use cases, keep the retrieval step explicit so you can inspect sources and enforce policy controls.

    from typing import TypedDict, List
    from langgraph.graph import StateGraph, END
    from langchain_openai import ChatOpenAI
    
    class GraphState(TypedDict):
        question: str
        context: List[str]
        answer: str
    
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    
    def retrieve_docs(state: GraphState) -> GraphState:
        # Replace with your vector DB lookup
        docs = [
            "Policy A covers fire damage with a 30-day reporting window.",
            "Policy B excludes flood damage unless rider X is active."
        ]
        return {**state, "context": docs}
    
    def generate_answer(state: GraphState) -> GraphState:
        prompt = f"""
        Answer the insurance question using only the context.
    
        Question: {state["question"]}
        Context:
        {chr(10).join(state["context"])}
        """
        response = llm.invoke(prompt)
        return {**state, "answer": response.content}
    
    graph = StateGraph(GraphState)
    graph.add_node("retrieve", retrieve_docs)
    graph.add_node("generate", generate_answer)
    graph.set_entry_point("retrieve")
    graph.add_edge("retrieve", "generate")
    graph.add_edge("generate", END)
    
    app = graph.compile()
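
    A quick way to validate the graph before containerizing it is to invoke it directly. A minimal smoke test, assuming the code above lives in one module:

    result = app.invoke({
        "question": "Does Policy B cover flood damage?",
        "context": [],
        "answer": ""
    })
    print(result["answer"])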
    
  2. Package the graph as a service-friendly Python app

    Your Kubernetes pod should expose a simple HTTP interface. Keep the graph execution inside the container so each replica can process requests independently.

    from fastapi import FastAPI
    from pydantic import BaseModel
    
    class QueryRequest(BaseModel):
        question: str
    
    api = FastAPI()
    
    @api.post("/ask")
    def ask(req: QueryRequest):
        # "app" here is the compiled LangGraph workflow from step 1
        result = app.invoke({"question": req.question, "context": [], "answer": ""})
        return {"answer": result["answer"], "sources": result["context"]}
    
  3. Create Kubernetes resources from Python

    Use the Kubernetes client to create a deployment and service for the agent. This is where LangGraph becomes a real workload on the cluster instead of a local script.

    from kubernetes import client, config
    
    # Use config.load_incluster_config() instead when this script runs inside the cluster
    config.load_kube_config()
    
    apps_v1 = client.AppsV1Api()
    core_v1 = client.CoreV1Api()
    
    namespace = "insurance-rag"
    
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="langgraph-insurance-agent"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(
                match_labels={"app": "langgraph-insurance-agent"}
            ),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "langgraph-insurance-agent"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="agent",
                            image="your-registry/langgraph-insurance-agent:latest",
                            ports=[client.V1ContainerPort(container_port=8000)],
                        )
                    ]
                ),
            ),
        ),
    )
    
    service = client.V1Service(
        api_version="v1",
        kind="Service",
        metadata=client.V1ObjectMeta(name="langgraph-insurance-agent"),
        spec=client.V1ServiceSpec(
            selector={"app": "langgraph-insurance-agent"},
            ports=[client.V1ServicePort(port=80, target_port=8000)],
            type="ClusterIP",
        ),
    )
    
    apps_v1.create_namespaced_deployment(namespace=namespace, body=deployment)
    core_v1.create_namespaced_service(namespace=namespace, body=service)
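
    The create calls return as soon as the API server accepts the objects, not when the pods are ready. A simple readiness poll, reusing the apps_v1 client from above:

    import time

    def wait_for_rollout(name: str, namespace: str, timeout: int = 120) -> bool:
        # Poll the deployment until every replica reports ready
        deadline = time.time() + timeout
        while time.time() < deadline:
            dep = apps_v1.read_namespaced_deployment(name=name, namespace=namespace)
            if (dep.status.ready_replicas or 0) == dep.spec.replicas:
                return True
            time.sleep(5)
        return False

    wait_for_rollout("langgraph-insurance-agent", namespace)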
    
  4. Wire secrets and config into the pod

    Keep model keys and retrieval settings out of code. Store them in Kubernetes Secrets and ConfigMaps, then inject them into the pod as environment variables.

    secret = client.V1Secret(
        metadata=client.V1ObjectMeta(name="llm-secrets"),
        string_data={
            # Placeholder only; supply the real key from a secure store at deploy time
            "OPENAI_API_KEY": "replace-me"
        },
        type="Opaque",
    )
    
    config_map = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name="rag-config"),
        data={
            "VECTOR_INDEX_NAME": "insurance-policies",
            "TOP_K": "4"
        }
    )
    
    core_v1.create_namespaced_secret(namespace=namespace, body=secret)
    core_v1.create_namespaced_config_map(namespace=namespace, body=config_map)
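
    On the application side, the retrieval code can read these values at startup instead of hardcoding them. A sketch, assuming the environment variables are wired into the pod as shown in the next step:

    import os

    # Defaults mirror the ConfigMap values so local runs still work
    VECTOR_INDEX_NAME = os.environ.get("VECTOR_INDEX_NAME", "insurance-policies")
    TOP_K = int(os.environ.get("TOP_K", "4"))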
    
  5. Update the deployment to consume those values

    Add environment references so every replica gets the same runtime configuration, then patch the deployment so the change rolls out to every pod. This keeps your LangGraph behavior stable across replicas.

    container_env = [
        client.V1EnvVar(
            name="OPENAI_API_KEY",
            value_from=client.V1EnvVarSource(
                secret_key_ref=client.V1SecretKeySelector(
                    name="llm-secrets",
                    key="OPENAI_API_KEY"
                )
            )
        ),
        client.V1EnvVar(
            name="VECTOR_INDEX_NAME",
            value_from=client.V1EnvVarSource(
                config_map_key_ref=client.V1ConfigMapKeySelector(
                    name="rag-config",
                    key="VECTOR_INDEX_NAME"
                )
            )
        )
        # TOP_K follows the same config_map_key_ref pattern
    ]
    
    # Attach the variables to the container and roll out the change
    deployment.spec.template.spec.containers[0].env = container_env
    apps_v1.patch_namespaced_deployment(
        name="langgraph-insurance-agent",
        namespace=namespace,
        body=deployment,
    )
    

Testing the Integration

Run a request against the service after it is deployed. If you are testing from outside the cluster, forward the service port first (for example, kubectl port-forward svc/langgraph-insurance-agent 8000:80 -n insurance-rag) or go through your ingress route, then hit /ask with a real insurance query.

import requests

response = requests.post(
    "http://localhost:8000/ask",
    json={"question": "Does Policy B cover flood damage?"}
)

print(response.status_code)
print(response.json())

Expected output:

200
{
  "answer": "...Policy B excludes flood damage unless rider X is active...",
  "sources": [
    "Policy A covers fire damage with a 30-day reporting window.",
    "Policy B excludes flood damage unless rider X is active."
  ]
}

Real-World Use Cases

  • Claims intake assistant: route claimant questions through retrieval over policy documents, then run the service on Kubernetes so multiple adjusters can query it concurrently.
  • Underwriting policy checker: use LangGraph to check applicant answers against underwriting rules, deployed as a namespaced service with strict resource limits (see the sketch after this list).
  • Broker support bot: serve broker-facing RAG over coverage guides and endorsements inside your cluster so data access stays inside your network boundary.
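
The strict resource limits mentioned above map to V1ResourceRequirements on the container from step 3. A sketch with illustrative values; tune them to your model-call latency and traffic:

resources = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "512Mi"},
    limits={"cpu": "1", "memory": "1Gi"},
)

# Pass resources=resources to the V1Container in the step 3 deployment spec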

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
