How to Integrate LangGraph for lending with Kubernetes for production AI

By Cyprian Aarons · Updated 2026-04-21
langgraph-for-lending · kubernetes · production-ai

Combining LangGraph for lending with Kubernetes gives you a clean path from agent logic to production runtime. LangGraph handles the lending workflow state machine; Kubernetes handles deployment, scaling, and isolation for the services that execute those workflows.

This is the setup you want when your lending agent needs to route applications, pull bureau data, trigger policy checks, and survive real traffic without turning into a single-node science project.

Prerequisites

  • Python 3.10+
  • Access to a Kubernetes cluster
    • Minikube, kind, EKS, GKE, or AKS
  • kubectl configured against your cluster
  • A container registry you can push to
  • LangGraph installed:
    • pip install langgraph langchain
  • Kubernetes Python client installed:
    • pip install kubernetes
  • A lending workflow design ready:
    • intake
    • document verification
    • risk scoring
    • decisioning
  • A namespace in Kubernetes for the agent services
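
The examples below assume the namespace is called ai-lending. A minimal way to create it with the Python client (kubectl create namespace ai-lending works just as well):

from kubernetes import client, config

config.load_kube_config()

# Create the namespace the deployment, service, and HPA below will live in
client.CoreV1Api().create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name="ai-lending"))
)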

Integration Steps

  1. Build the lending graph in LangGraph

Start with a simple state model for loan applications. In production, this state usually includes applicant data, document status, score outputs, and final decision.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class LendingState(TypedDict):
    applicant_id: str
    income_verified: bool
    credit_score: int
    decision: str

def verify_income(state: LendingState):
    # Replace with a real income-verification integration in production
    return {"income_verified": True}

def score_credit(state: LendingState):
    # Replace with bureau/API call in production
    return {"credit_score": 742}

def decide_loan(state: LendingState):
    if state["income_verified"] and state["credit_score"] >= 700:
        return {"decision": "approved"}
    return {"decision": "declined"}

graph = StateGraph(LendingState)
graph.add_node("verify_income", verify_income)
graph.add_node("score_credit", score_credit)
graph.add_node("decide_loan", decide_loan)

graph.set_entry_point("verify_income")
graph.add_edge("verify_income", "score_credit")
graph.add_edge("score_credit", "decide_loan")
graph.add_edge("decide_loan", END)

app = graph.compile()
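
Before wrapping the graph in a service, sanity-check it locally. A quick check, reusing the state shape above:

# With the stubbed score of 742 and verified income, this prints "approved"
result = app.invoke({
    "applicant_id": "A12345",
    "income_verified": False,
    "credit_score": 0,
    "decision": "",
})
print(result["decision"])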

  2. Wrap the graph in an API service

Kubernetes should run a service that exposes your graph execution over HTTP. FastAPI is a common choice because it is easy to containerize and probe.

# app.py
from fastapi import FastAPI
from pydantic import BaseModel

# assume the graph code above lives in this module, compiled as `app`

class LoanRequest(BaseModel):
    applicant_id: str

service = FastAPI()

@service.post("/evaluate")
def evaluate_loan(req: LoanRequest):
    result = app.invoke(
        {
            "applicant_id": req.applicant_id,
            "income_verified": False,
            "credit_score": 0,
            "decision": ""
        }
    )
    return result
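
Kubernetes will probe this container, so it helps to expose a cheap health endpoint next to /evaluate. A minimal sketch (the /healthz path is a convention, not a requirement):

@service.get("/healthz")
def healthz():
    # Target for the liveness/readiness probes configured later
    return {"status": "ok"}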

  3. Containerize the service for Kubernetes

The graph runs inside your container. Keep the image small and deterministic so rollout behavior is predictable. The app.py from the previous step already contains everything the image needs: the compiled graph and the FastAPI wrapper, with the FastAPI instance named `service` so the uvicorn command in the Dockerfile can find it as app:service.

A minimal Dockerfile:

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["uvicorn", "app:service", "--host", "0.0.0.0", "--port", "8000"]

  4. Deploy to Kubernetes using the Python client or manifests

If you want programmatic deployment from CI/CD or an operator workflow, use the Kubernetes Python client.

from kubernetes import client, config

# Use config.load_incluster_config() instead when this code runs inside the cluster
config.load_kube_config()

apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="lending-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "lending-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "lending-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="lending-agent",
                        image="registry.example.com/lending-agent:1.0.0",
                        ports=[client.V1ContainerPort(container_port=8000)],
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="ai-lending", body=deployment)
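
Two production details are worth adding to the container spec above: CPU/memory requests, which the HPA in the next step needs to compute utilization, and probes pointing at the /healthz endpoint from step 2. A sketch with illustrative values:

client.V1Container(
    name="lending-agent",
    image="registry.example.com/lending-agent:1.0.0",
    ports=[client.V1ContainerPort(container_port=8000)],
    # The HPA's Utilization target is computed against these requests
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},
        limits={"cpu": "1", "memory": "512Mi"},
    ),
    readiness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8000),
        initial_delay_seconds=5,
        period_seconds=10,
    ),
    liveness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8000),
        initial_delay_seconds=15,
        period_seconds=20,
    ),
)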

Create a service so other systems can call it:

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="lending-agent"),
    spec=client.V1ServiceSpec(
        selector={"app": "lending-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)

core_v1.create_namespaced_service(namespace="ai-lending", body=service)

  5. Add autoscaling and operational controls

For lending workloads, traffic spikes happen during campaigns or batch underwriting windows. Use an HPA so your graph service scales on CPU or custom metrics. Note that a CPU Utilization target only works if the containers declare CPU requests (see the sketch in the previous step) and the cluster runs a metrics source such as metrics-server.

from kubernetes import client

autoscaling_v2 = client.AutoscalingV2Api()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="lending-agent-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="lending-agent",
        ),
        min_replicas=2,
        max_replicas=10,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

autoscaling_v2.create_namespaced_horizontal_pod_autoscaler(
    namespace="ai-lending",
    body=hpa,
)
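
To confirm the autoscaler registered, you can read its status back (a quick check, assuming the objects above were created successfully):

hpa_status = autoscaling_v2.read_namespaced_horizontal_pod_autoscaler(
    name="lending-agent-hpa",
    namespace="ai-lending",
)
print(hpa_status.status.current_replicas)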

Testing the Integration

Run a quick smoke test against the deployed service. The hostname below is the cluster-internal DNS name, so this must run from a pod inside the cluster; from your workstation, run kubectl port-forward svc/lending-agent 8080:80 -n ai-lending and point the request at http://localhost:8080/evaluate instead.

import requests

resp = requests.post(
    "http://lending-agent.ai-lending.svc.cluster.local/evaluate",
    json={"applicant_id": "A12345"},
    timeout=10,
)

print(resp.status_code)
print(resp.json())

Expected output:

200
{'applicant_id': 'A12345', 'income_verified': True, 'credit_score': 742, 'decision': 'approved'}

If that returns correctly, you know three things are working:

  • LangGraph is executing the lending workflow
  • The API wrapper is exposing it correctly
  • Kubernetes is running and routing traffic to the pods

Real-World Use Cases

  • Loan origination triage
    • Route applications through verification, scoring, and policy checks before sending them to underwriters.
  • Document-heavy underwriting agents
    • Run OCR extraction, fraud checks, and exception handling as separate graph nodes behind Kubernetes services.
  • Batch portfolio review
    • Scale out nightly re-evaluation of existing loans when rates change or risk models are updated.

The pattern here is straightforward: keep business logic in LangGraph and keep runtime concerns in Kubernetes. That separation makes lending agents easier to test, easier to scale, and much less painful to operate under production load.


Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
