How to Integrate LangGraph for Retail Banking with Kubernetes for Production AI

By Cyprian Aarons · Updated 2026-04-21
Tags: langgraph-for-retail-banking, kubernetes, production-ai

Retail banking agents need two things that usually fight each other: business logic you can audit, and infrastructure you can scale. LangGraph gives you the stateful orchestration layer for banking workflows like KYC checks, fraud triage, and loan pre-screening, while Kubernetes gives you the deployment, isolation, rollout, and autoscaling model you need to run those workflows in production.

Prerequisites

  • Python 3.10+
  • A Kubernetes cluster with kubectl access
  • pip or uv
  • Access to a LangGraph app or graph definition
  • A container registry for your image
  • Basic familiarity with:
    • langgraph.graph.StateGraph
    • Kubernetes Deployments and Services
    • environment variables for secrets/config
  • Installed Python packages:
    • langgraph
    • kubernetes
    • fastapi
    • uvicorn

Install the core dependencies:

pip install langgraph kubernetes fastapi uvicorn

Integration Steps

1) Define the retail banking workflow as a LangGraph state machine

Start with a graph that models a common banking flow: collect customer request, enrich with account data, run policy checks, then route to approval or manual review.

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END

def merge_lists(left: list, right: list) -> list:
    return left + right

class BankingState(TypedDict):
    customer_id: str
    request_type: str
    risk_flags: Annotated[list[str], merge_lists]
    decision: str

def fetch_customer_profile(state: BankingState):
    # Replace with real core banking / CRM lookup
    return {"risk_flags": ["kyc_verified"]}

def assess_policy(state: BankingState):
    flags = state.get("risk_flags", [])
    if "kyc_verified" in flags:
        return {"decision": "approve"}
    return {"decision": "manual_review"}

graph = StateGraph(BankingState)
graph.add_node("fetch_customer_profile", fetch_customer_profile)
graph.add_node("assess_policy", assess_policy)

graph.add_edge(START, "fetch_customer_profile")
graph.add_edge("fetch_customer_profile", "assess_policy")
graph.add_edge("assess_policy", END)

app = graph.compile()

This is the part you want under version control. The graph is your business process; Kubernetes will handle how it runs.

2) Wrap the graph in an API service for Kubernetes to manage

Kubernetes works best when your agent runtime is exposed as a stateless service. Use FastAPI so your cluster can health-check and route traffic to it.

from fastapi import FastAPI
from pydantic import BaseModel

app_api = FastAPI()

class BankRequest(BaseModel):
    customer_id: str
    request_type: str

@app_api.post("/invoke")
def invoke_graph(payload: BankRequest):
    result = app.invoke({
        "customer_id": payload.customer_id,
        "request_type": payload.request_type,
        "risk_flags": [],
        "decision": ""
    })
    return result

@app_api.get("/healthz")
def healthz():
    return {"status": "ok"}

Run it locally first:

uvicorn main:app_api --host 0.0.0.0 --port 8000

For production AI systems in retail banking, this API boundary matters. It gives you a clean place for auth, rate limiting, audit logging, and request validation before the graph executes.
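One lightweight way to make that boundary auditable is to emit a structured record before the graph runs. The sketch below is illustrative, not part of the original service: `build_audit_record` is a hypothetical helper, and it hashes the customer ID so raw PII never lands in log storage.

```python
import hashlib
import json
import time

def build_audit_record(customer_id: str, request_type: str) -> dict:
    """Build a structured audit entry; the customer ID is hashed so raw PII stays out of logs."""
    return {
        "event": "graph_invocation_requested",
        "customer_ref": hashlib.sha256(customer_id.encode()).hexdigest()[:16],
        "request_type": request_type,
        "ts": int(time.time() * 1000),
    }

# Emit one line of JSON per request, ready for log aggregation
record = build_audit_record("CUST-10021", "loan_precheck")
print(json.dumps(record))
```

In the FastAPI handler you would call this before `app.invoke` and ship the record to whatever log sink your audit policy requires.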

3) Build a container image and deploy it to Kubernetes

Package the service into an image so Kubernetes can schedule replicas consistently.

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY main.py .

EXPOSE 8000
CMD ["uvicorn", "main:app_api", "--host", "0.0.0.0", "--port", "8000"]

Example requirements.txt:

langgraph
kubernetes
fastapi
uvicorn
pydantic
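With the Dockerfile and requirements in place, build and push the image. The registry host below is the placeholder used in the Deployment spec later in this guide; substitute your own.

```shell
docker build -t registry.example.com/retail-banking-agent:v1 .
docker push registry.example.com/retail-banking-agent:v1
```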

Now create a Deployment and Service. This is standard Kubernetes API usage; your app becomes horizontally scalable.

from kubernetes import client, config

# Loads your local kubeconfig; when running this script inside a pod,
# use config.load_incluster_config() instead.
config.load_kube_config()

apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="retail-banking-agent"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(
            match_labels={"app": "retail-banking-agent"}
        ),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "retail-banking-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="registry.example.com/retail-banking-agent:v1",
                        ports=[client.V1ContainerPort(container_port=8000)],
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="banking-ai", body=deployment)

And the Service:

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="retail-banking-agent"),
    spec=client.V1ServiceSpec(
        selector={"app": "retail-banking-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)

core_v1.create_namespaced_service(namespace="banking-ai", body=service)
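Since the service already exposes `/healthz`, it is worth wiring that endpoint into the container spec so Kubernetes can gate traffic and restart unhealthy pods. This is a sketch of the probe configuration; the delay and period values are illustrative and should be tuned to your cluster's SLOs.

```python
from kubernetes import client

# Probe /healthz on the app port; readiness gates traffic, liveness restarts stuck pods
probe = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/healthz", port=8000),
    initial_delay_seconds=5,
    period_seconds=10,
)

# Same container as in the Deployment above, with probes attached
container = client.V1Container(
    name="agent",
    image="registry.example.com/retail-banking-agent:v1",
    ports=[client.V1ContainerPort(container_port=8000)],
    readiness_probe=probe,
    liveness_probe=probe,
)
```

Drop this container definition into the `V1PodSpec` in place of the bare one shown earlier.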

4) Add config and secrets for bank-grade runtime controls

Don’t hardcode model keys, policy thresholds, or downstream endpoints. Use ConfigMaps and Secrets so Kubernetes owns runtime configuration.

from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="agent-secrets"),
    string_data={
        "OPENAI_API_KEY": "replace-me",
        "CORE_BANKING_URL": "https://core-banking.internal"
    }
)

core_v1.create_namespaced_secret(namespace="banking-ai", body=secret)

Then mount them into the pod spec through environment variables:

# Attach this list to the container via env=env on the V1Container
# in the Deployment spec above.
env = [
    client.V1EnvVar(
        name="OPENAI_API_KEY",
        value_from=client.V1EnvVarSource(
            secret_key_ref=client.V1SecretKeySelector(name="agent-secrets", key="OPENAI_API_KEY")
        )
    ),
    client.V1EnvVar(
        name="CORE_BANKING_URL",
        value_from=client.V1EnvVarSource(
            secret_key_ref=client.V1SecretKeySelector(name="agent-secrets", key="CORE_BANKING_URL")
        )
    ),
]

That keeps operational settings separate from graph logic. In regulated environments, that separation is not optional.
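On the application side, the service reads those injected values from the environment. A minimal sketch, assuming the variable names from the Secret above; the localhost fallback and the `core_banking_endpoint` helper are illustrative additions for local development, not part of the original service.

```python
import os

# Values injected by Kubernetes from the agent-secrets Secret;
# the fallback is for local development only.
CORE_BANKING_URL = os.environ.get("CORE_BANKING_URL", "http://localhost:9000")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

def core_banking_endpoint(path: str) -> str:
    """Join the configured base URL with an API path, avoiding duplicate slashes."""
    return CORE_BANKING_URL.rstrip("/") + "/" + path.lstrip("/")
```

Because the graph code only ever sees environment variables, rotating a key or repointing the core banking endpoint is a Kubernetes operation, not a code change.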

5) Wire observability around graph execution

You need traceability for every decision path. At minimum, log inputs, outputs, latency, and node transitions.

import time
import logging

logging.basicConfig(level=logging.INFO)

def timed_invoke(payload):
    start = time.time()
    result = app.invoke(payload)
    elapsed_ms = round((time.time() - start) * 1000)
    logging.info("banking_graph_invoked customer_id=%s elapsed_ms=%s decision=%s",
                 payload["customer_id"], elapsed_ms, result["decision"])
    return result

In Kubernetes, this becomes useful when paired with pod logs and metrics scraping. You can trace whether delays come from LangGraph execution or from upstream services.
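To capture the node transitions themselves, one lightweight option is a plain decorator applied to each node function before registration (`logged_node` is a name introduced here, not a LangGraph API). Because LangGraph nodes are ordinary callables that take state and return a partial update, this works without touching the graph definition.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)

def logged_node(fn):
    """Wrap a graph node so each execution emits a log line with timing.

    Apply at registration time, e.g.:
        graph.add_node("assess_policy", logged_node(assess_policy))
    """
    @wraps(fn)
    def wrapper(state):
        start = time.time()
        update = fn(state)  # the node's partial state update
        elapsed_ms = round((time.time() - start) * 1000)
        logging.info("node_executed name=%s elapsed_ms=%s updated_keys=%s",
                     fn.__name__, elapsed_ms, sorted(update))
        return update
    return wrapper
```

Paired with pod logs, this tells you which node a slow request spent its time in.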

Testing the Integration

Use either a direct Python call or hit the API endpoint once the pod is running.

result = app.invoke({
    "customer_id": "CUST-10021",
    "request_type": "loan_precheck",
    "risk_flags": [],
    "decision": ""
})
print(result)

Expected output:

{'customer_id': 'CUST-10021', 'request_type': 'loan_precheck', 'risk_flags': ['kyc_verified'], 'decision': 'approve'}

If you’re testing the Kubernetes service from inside the cluster:

curl -X POST http://retail-banking-agent.banking-ai.svc.cluster.local/invoke \
  -H 'Content-Type: application/json' \
  -d '{"customer_id":"CUST-10021","request_type":"loan_precheck"}'

Real-World Use Cases

  • Retail loan pre-screening

    • Run eligibility checks through LangGraph nodes.
    • Scale inference pods on demand with Kubernetes during peak application windows.
  • Fraud triage assistant

    • Route suspicious transactions through policy steps before escalating to human review.
    • Use Kubernetes replicas to isolate workloads by region or product line.
  • KYC remediation workflow

    • Orchestrate document collection, verification calls, and exception handling in LangGraph.
    • Roll out updated compliance rules safely with Kubernetes deployments and canary releases.
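The on-demand scaling mentioned above can be wired up with a HorizontalPodAutoscaler through the autoscaling/v2 API. This sketch targets the Deployment created earlier; the replica bounds and the 70% CPU threshold are illustrative values, not recommendations.

```python
from kubernetes import client, config

config.load_kube_config()

# Scale the retail-banking-agent Deployment between 3 and 12 replicas,
# targeting 70% average CPU utilization
hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="retail-banking-agent"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="retail-banking-agent"
        ),
        min_replicas=3,
        max_replicas=12,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="banking-ai", body=hpa
)
```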

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
