How to Integrate LangGraph for lending with Kubernetes for RAG

By Cyprian Aarons. Updated 2026-04-21


Combining LangGraph for lending with Kubernetes gives you a clean way to run loan-origination and servicing workflows as stateful agent graphs, while keeping retrieval workloads isolated, scalable, and observable. In practice, that means your RAG layer can fetch policy docs, credit rules, and customer context from a Kubernetes-backed service without turning the lending workflow into a pile of ad hoc API calls.

The pattern is simple: LangGraph handles the decision flow, and Kubernetes hosts the retrieval and supporting services. That split matters when you need auditability, retries, and predictable scaling under production banking traffic.

Prerequisites

•Python 3.10+
•A running Kubernetes cluster
  •local: kind, minikube, or k3d
  •remote: EKS, GKE, AKS
•kubectl configured against your cluster
•Access to a document store or vector service used by your RAG layer
•LangGraph installed in your app environment
•Kubernetes Python client installed
•Basic familiarity with:
  •graph-based agent workflows
  •deploying services to Kubernetes
  •REST or gRPC endpoints for retrieval

Install the core Python packages:

pip install langgraph kubernetes requests pydantic

Integration Steps

•Define the lending workflow in LangGraph

Start by modeling the lending process as a graph. The graph should call a retriever node for policy lookup before it makes any credit decision.

from typing import TypedDict, List
from langgraph.graph import StateGraph, END

class LendingState(TypedDict):
    applicant_id: str
    question: str
    retrieved_docs: List[str]
    decision: str

def retrieve_policy(state: LendingState) -> LendingState:
    # Placeholder for Kubernetes-hosted RAG service call
    return {
        **state,
        "retrieved_docs": ["Debt-to-income threshold is 43%", "Minimum employment history is 12 months"]
    }

def decide(state: LendingState) -> LendingState:
    docs = " ".join(state["retrieved_docs"])
    if "43%" in docs:
        decision = "approve_with_conditions"
    else:
        decision = "manual_review"
    return {**state, "decision": decision}

graph = StateGraph(LendingState)
graph.add_node("retrieve_policy", retrieve_policy)
graph.add_node("decide", decide)
graph.set_entry_point("retrieve_policy")
graph.add_edge("retrieve_policy", "decide")
graph.add_edge("decide", END)

app = graph.compile()

This gives you a deterministic workflow boundary. The retriever can change independently from the lending logic.

•Expose your RAG service inside Kubernetes

Run retrieval as a separate service in the cluster. Your LangGraph node should call this service over HTTP so you can scale it independently.

import requests

K8S_RAG_URL = "http://rag-service.default.svc.cluster.local/query"

def retrieve_from_k8s_rag(question: str) -> list[str]:
    resp = requests.post(
        K8S_RAG_URL,
        json={"query": question, "top_k": 3},
        timeout=5,
    )
    resp.raise_for_status()
    payload = resp.json()
    return payload["chunks"]

In production, this service usually fronts a vector database or search index. Keep the contract small: query text in, ranked chunks out.
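One way to keep that contract honest is to validate the response before it reaches the graph. The sketch below assumes the service returns a JSON object with a `chunks` list of strings, as in the snippet above. `parse_chunks` is a hypothetical helper introduced here for illustration, not part of any library.

```python
from typing import Any


def parse_chunks(payload: Any, top_k: int = 3) -> list[str]:
    """Validate a RAG response of the assumed shape {"chunks": [str, ...]}."""
    if not isinstance(payload, dict) or "chunks" not in payload:
        raise ValueError("RAG response is missing the 'chunks' field")
    chunks = payload["chunks"]
    if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
        raise ValueError("'chunks' must be a list of strings")
    # Drop blank entries and never hand the graph more than top_k chunks.
    cleaned = [c.strip() for c in chunks if c.strip()]
    return cleaned[:top_k]
```

With this in place, the retriever could return `parse_chunks(resp.json())` instead of indexing the payload directly, so a malformed response surfaces as a clear error rather than a `KeyError` deep in the workflow.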

•Wire the Kubernetes client into your deployment checks

Use the Kubernetes API to confirm that the RAG Service exists before your agent starts handling lending traffic. Note that this checks for the Service object itself, not the health of the pods behind it.

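from kubernetes import client, config
from kubernetes.client.rest import ApiException

def check_rag_service(namespace: str = "default", name: str = "rag-service") -> bool:
    # Use in-cluster credentials when running in a pod; fall back to the local kubeconfig.
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()
    v1 = client.CoreV1Api()
    try:
        svc = v1.read_namespaced_service(name=name, namespace=namespace)
    except ApiException as exc:
        if exc.status == 404:
            return False  # the Service does not exist in this namespace
        raise
    return svc.spec.cluster_ip is not None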

if __name__ == "__main__":
    print(check_rag_service())

This is useful in startup probes or admin jobs. If the service is missing or misconfigured, fail fast instead of letting loan decisions run without retrieval.
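A Service can exist while no pod behind it is ready, so a stricter check would look at the Service's endpoints. The sketch below is one way to do that, assuming an object shaped like the client's `V1Endpoints` (a `subsets` list whose items carry an `addresses` list). `has_ready_endpoints` is a hypothetical helper, and the stand-in objects exist only so the example runs without a cluster.

```python
from types import SimpleNamespace


def has_ready_endpoints(endpoints) -> bool:
    """Return True if any subset lists at least one ready address."""
    for subset in (endpoints.subsets or []):
        if subset.addresses:
            return True
    return False


# Stand-in objects shaped like V1Endpoints, for illustration only.
ready = SimpleNamespace(subsets=[SimpleNamespace(addresses=["10.0.0.12"])])
empty = SimpleNamespace(subsets=None)
print(has_ready_endpoints(ready), has_ready_endpoints(empty))  # True False
```

Inside the cluster, the argument would typically come from `v1.read_namespaced_endpoints(name=name, namespace=namespace)` on the same `CoreV1Api` client used above.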

•Call the Kubernetes-hosted retriever from LangGraph

Now replace the placeholder node with a real call to your cluster service.

from typing import TypedDict, List
from langgraph.graph import StateGraph, END

class LendingState(TypedDict):
    applicant_id: str
    question: str
    retrieved_docs: List[str]
    decision: str

def retrieve_policy(state: LendingState) -> LendingState:
    # retrieve_from_k8s_rag is the HTTP helper defined in the RAG service step above
    docs = retrieve_from_k8s_rag(state["question"])
    return {**state, "retrieved_docs": docs}

def decide(state: LendingState) -> LendingState:
    joined = " ".join(state["retrieved_docs"]).lower()
    if "employment history is 12 months" in joined and "43%" in joined:
        return {**state, "decision": "approve_with_conditions"}
    return {**state, "decision": "manual_review"}

graph = StateGraph(LendingState)
graph.add_node("retrieve_policy", retrieve_policy)
graph.add_node("decide", decide)
graph.set_entry_point("retrieve_policy")
graph.add_edge("retrieve_policy", "decide")
graph.add_edge("decide", END)

app = graph.compile()

result = app.invoke({
    "applicant_id": "A-10291",
    "question": "What are the underwriting rules for this loan?",
    "retrieved_docs": [],
    "decision": ""
})
print(result)

That’s the core integration. LangGraph owns state transitions; Kubernetes owns runtime placement and scaling of retrieval.
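If the retrieval call fails, the node above raises and the whole graph run aborts. One option, sketched here under assumptions, is to wrap the retriever so a failure yields empty evidence, which the `decide` node already maps to `manual_review`. `make_safe_retriever` is a hypothetical helper, and catching every `Exception` by default is a simplification you would likely narrow to your HTTP client's error types.

```python
from typing import Callable, List, Tuple, Type, TypedDict


class LendingState(TypedDict):
    applicant_id: str
    question: str
    retrieved_docs: List[str]
    decision: str


def make_safe_retriever(
    fetch: Callable[[str], List[str]],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
):
    """Build a retrieve_policy node that degrades to empty evidence on failure."""

    def retrieve_policy(state: LendingState) -> LendingState:
        try:
            docs = fetch(state["question"])
        except errors:
            # No policy evidence: leave the docs empty so decide picks manual_review.
            docs = []
        return {**state, "retrieved_docs": docs}

    return retrieve_policy
```

The returned function has the same signature as the original node, so it could be registered with `graph.add_node("retrieve_policy", make_safe_retriever(retrieve_from_k8s_rag))`. Whether to degrade to manual review or fail the run outright is a policy choice; for audit purposes you may also want to log the swallowed error.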

•Add deployment-time configuration for cluster access

Keep cluster details out of code and inject them through environment variables or ConfigMaps.

import os
import requests

RAG_SERVICE_URL = os.environ["RAG_SERVICE_URL"]

def retrieve_from_k8s_rag(question: str) -> list[str]:
    response = requests.post(
        RAG_SERVICE_URL,
        json={"query": question, "top_k": int(os.getenv("TOP_K", "3"))},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()["chunks"]

This keeps local dev and cluster deployments aligned. In CI/CD, point RAG_SERVICE_URL at an internal service DNS name or ingress endpoint.
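Reading the environment once at startup and validating it makes misconfiguration visible before any loan case is processed. The following is a minimal sketch: `RagConfig` and `load_rag_config` are hypothetical names, and `RAG_TIMEOUT_S` is an assumed variable that does not appear in the snippet above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RagConfig:
    url: str
    top_k: int
    timeout_s: float


def load_rag_config(env: Mapping[str, str] = os.environ) -> RagConfig:
    """Read and validate retriever settings from environment-style variables."""
    url = env.get("RAG_SERVICE_URL", "").strip()
    if not url:
        raise RuntimeError("RAG_SERVICE_URL is not set")
    top_k = int(env.get("TOP_K", "3"))
    if top_k < 1:
        raise ValueError("TOP_K must be at least 1")
    # RAG_TIMEOUT_S is an assumed variable; the default mirrors the 5 second timeout above.
    timeout_s = float(env.get("RAG_TIMEOUT_S", "5"))
    return RagConfig(url=url, top_k=top_k, timeout_s=timeout_s)
```

Passing the mapping as a parameter lets tests supply a plain dict, while a deployment would rely on the default `os.environ` populated from a ConfigMap or the pod spec.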

Testing the Integration

Run a smoke test that exercises both the graph and the retriever path.

test_state = {
    "applicant_id": "A-77881",
    "question": "Check lending policy for DTI and employment requirements",
    "retrieved_docs": [],
    "decision": ""
}

output = app.invoke(test_state)
print(output["decision"])
print(output["retrieved_docs"])

Expected output, assuming the RAG service returns the same two policy chunks used in the placeholder retriever:

approve_with_conditions
['Debt-to-income threshold is 43%', 'Minimum employment history is 12 months']

If you get manual_review instead, check these first:

•The RAG service returned empty chunks, or chunks whose wording does not match the substrings that decide looks for
•The query route inside Kubernetes is wrong
•Your LangGraph node isn’t passing state through correctly
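To separate a decision-logic problem from a cluster problem, it can help to run the same rule against canned chunks with no network and no graph runtime involved. The sketch below copies the `decide` rule from the integration step above; `run_offline` and the `TEST` applicant ID are hypothetical and exist only for this check.

```python
def decide(state: dict) -> dict:
    # Same rule as the decide node in the integration step above.
    joined = " ".join(state["retrieved_docs"]).lower()
    if "employment history is 12 months" in joined and "43%" in joined:
        return {**state, "decision": "approve_with_conditions"}
    return {**state, "decision": "manual_review"}


def run_offline(question: str, chunks: list[str]) -> dict:
    """Run retrieve -> decide in plain Python with canned chunks."""
    state = {"applicant_id": "TEST", "question": question, "retrieved_docs": [], "decision": ""}
    state = {**state, "retrieved_docs": chunks}  # stands in for the retriever node
    return decide(state)


policy = ["Debt-to-income threshold is 43%", "Minimum employment history is 12 months"]
print(run_offline("DTI and employment rules", policy)["decision"])  # approve_with_conditions
print(run_offline("DTI and employment rules", [])["decision"])      # manual_review
```

If this passes but the smoke test still returns manual_review, the cause is more likely in the service response or the route to it than in the graph logic.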

Real-World Use Cases

•Underwriting assistant
  •Pulls policy snippets from a Kubernetes-hosted vector service before making approve/review decisions.
•Loan ops copilot
  •Answers servicing questions from agents using document retrieval scoped to product type, region, or risk tier.
•Compliance review workflow
  •Uses LangGraph to route cases through human approval when retrieved policy evidence is incomplete or conflicting.
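For the compliance review case, the routing decision can be expressed as a small function that picks the next node from the current state. This is a sketch under assumptions: the `human_review` node name and the minimum of two evidence chunks are hypothetical, and it only covers incomplete evidence, since detecting conflicting policy text would need domain-specific logic.

```python
def route_after_retrieval(state: dict, min_docs: int = 2) -> str:
    """Return the name of the next node based on how much evidence came back."""
    docs = [d for d in state.get("retrieved_docs", []) if d.strip()]
    if len(docs) < min_docs:
        # Too little policy evidence: send the case to a person.
        return "human_review"
    return "decide"
```

In a LangGraph workflow, a function like this could be passed to `graph.add_conditional_edges("retrieve_policy", route_after_retrieval)` in place of the fixed edge to `decide`, with a `human_review` node added to the graph.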


By Cyprian Aarons, AI Consultant at Topiax.
