How to Integrate LangGraph with Kubernetes for Production AI in Retail Banking
Retail banking agents need two things that usually fight each other: business logic you can audit, and infrastructure you can scale. LangGraph gives you the stateful orchestration layer for banking workflows like KYC checks, fraud triage, and loan pre-screening, while Kubernetes gives you the deployment, isolation, rollout, and autoscaling model you need to run those workflows in production.
Prerequisites

- Python 3.10+
- A Kubernetes cluster with `kubectl` access
- `pip` or `uv`
- Access to a LangGraph app or graph definition
- A container registry for your image
- Basic familiarity with:
  - `langgraph.graph.StateGraph`
  - Kubernetes Deployments and Services
  - Environment variables for secrets/config
- Installed Python packages: `langgraph`, `kubernetes`, `fastapi`, `uvicorn`

Install the core dependencies:

```shell
pip install langgraph kubernetes fastapi uvicorn
```
Integration Steps
1) Define the retail banking workflow as a LangGraph state machine
Start with a graph that models a common banking flow: collect customer request, enrich with account data, run policy checks, then route to approval or manual review.
```python
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END


def merge_lists(left: list, right: list) -> list:
    return left + right


class BankingState(TypedDict):
    customer_id: str
    request_type: str
    risk_flags: Annotated[list[str], merge_lists]
    decision: str


def fetch_customer_profile(state: BankingState):
    # Replace with real core banking / CRM lookup
    return {"risk_flags": ["kyc_verified"]}


def assess_policy(state: BankingState):
    flags = state.get("risk_flags", [])
    if "kyc_verified" in flags:
        return {"decision": "approve"}
    return {"decision": "manual_review"}


graph = StateGraph(BankingState)
graph.add_node("fetch_customer_profile", fetch_customer_profile)
graph.add_node("assess_policy", assess_policy)
graph.add_edge(START, "fetch_customer_profile")
graph.add_edge("fetch_customer_profile", "assess_policy")
graph.add_edge("assess_policy", END)
app = graph.compile()
```
This is the part you want under version control. The graph is your business process; Kubernetes will handle how it runs.
2) Wrap the graph in an API service for Kubernetes to manage
Kubernetes works best when your agent runtime is exposed as a stateless service. Use FastAPI so your cluster can health-check and route traffic to it.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app_api = FastAPI()


class BankRequest(BaseModel):
    customer_id: str
    request_type: str


@app_api.post("/invoke")
def invoke_graph(payload: BankRequest):
    result = app.invoke({
        "customer_id": payload.customer_id,
        "request_type": payload.request_type,
        "risk_flags": [],
        "decision": ""
    })
    return result


@app_api.get("/healthz")
def healthz():
    return {"status": "ok"}
```
Run it locally first:
```shell
uvicorn main:app_api --host 0.0.0.0 --port 8000
```
For production AI systems in retail banking, this API boundary matters. It gives you a clean place for auth, rate limiting, audit logging, and request validation before the graph executes.
3) Build a container image and deploy it to Kubernetes
Package the service into an image so Kubernetes can schedule replicas consistently.
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app_api", "--host", "0.0.0.0", "--port", "8000"]
```
Example `requirements.txt`:

```text
langgraph
kubernetes
fastapi
uvicorn
pydantic
```
Now create a Deployment and Service. This is standard Kubernetes API usage; your app becomes horizontally scalable.
```python
from kubernetes import client, config

config.load_kube_config()
apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="retail-banking-agent"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(
            match_labels={"app": "retail-banking-agent"}
        ),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "retail-banking-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="registry.example.com/retail-banking-agent:v1",
                        ports=[client.V1ContainerPort(container_port=8000)],
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="banking-ai", body=deployment)
```
And the Service:
```python
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="retail-banking-agent"),
    spec=client.V1ServiceSpec(
        selector={"app": "retail-banking-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)

core_v1.create_namespaced_service(namespace="banking-ai", body=service)
```
4) Add config and secrets for bank-grade runtime controls
Don’t hardcode model keys, policy thresholds, or downstream endpoints. Use ConfigMaps and Secrets so Kubernetes owns runtime configuration.
```python
from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="agent-secrets"),
    string_data={
        "OPENAI_API_KEY": "replace-me",
        "CORE_BANKING_URL": "https://core-banking.internal"
    }
)

core_v1.create_namespaced_secret(namespace="banking-ai", body=secret)
```
Then mount them into the pod spec through environment variables:
```python
env = [
    client.V1EnvVar(
        name="OPENAI_API_KEY",
        value_from=client.V1EnvVarSource(
            secret_key_ref=client.V1SecretKeySelector(
                name="agent-secrets", key="OPENAI_API_KEY")
        )
    ),
    client.V1EnvVar(
        name="CORE_BANKING_URL",
        value_from=client.V1EnvVarSource(
            secret_key_ref=client.V1SecretKeySelector(
                name="agent-secrets", key="CORE_BANKING_URL")
        )
    ),
]
```
That keeps operational settings separate from graph logic. In regulated environments, that separation is not optional.
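One way to make that separation explicit in application code is a small loader that reads only from the environment Kubernetes injects. This is a sketch; the `RISK_THRESHOLD` variable and both default values are illustrative, not from the manifests above:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    core_banking_url: str
    risk_threshold: float


def load_config(env=os.environ) -> AgentConfig:
    # Read runtime settings injected via Secrets/ConfigMaps;
    # defaults here are illustrative fallbacks for local runs
    return AgentConfig(
        core_banking_url=env.get("CORE_BANKING_URL",
                                 "https://core-banking.internal"),
        risk_threshold=float(env.get("RISK_THRESHOLD", "0.75")),
    )
```

Passing `env` as a parameter also makes the loader trivial to unit-test without touching process-wide environment state.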
5) Wire observability around graph execution
You need traceability for every decision path. At minimum, log inputs, outputs, latency, and node transitions.
```python
import time
import logging

logging.basicConfig(level=logging.INFO)


def timed_invoke(payload):
    start = time.time()
    result = app.invoke(payload)
    elapsed_ms = round((time.time() - start) * 1000)
    logging.info("banking_graph_invoked customer_id=%s elapsed_ms=%s decision=%s",
                 payload["customer_id"], elapsed_ms, result["decision"])
    return result
```
In Kubernetes, this becomes useful when paired with pod logs and metrics scraping. You can trace whether delays come from LangGraph execution or from upstream services.
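For the metrics side, a sketch using the `prometheus_client` library is one option; it assumes `prometheus_client` is added to the image's dependencies, and the metric and label names are illustrative:

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; align them with your own conventions
GRAPH_LATENCY = Histogram(
    "graph_invoke_latency_seconds",
    "Latency of LangGraph invocations",
    ["graph"],
)
GRAPH_DECISIONS = Counter(
    "graph_decisions",
    "Graph decisions by outcome",
    ["graph", "decision"],
)


def observed_invoke(graph, payload, graph_name="banking"):
    # Wrap invoke() so every call records latency and the decision taken
    start = time.time()
    result = graph.invoke(payload)
    GRAPH_LATENCY.labels(graph=graph_name).observe(time.time() - start)
    GRAPH_DECISIONS.labels(
        graph=graph_name,
        decision=result.get("decision", "unknown"),
    ).inc()
    return result


# Expose /metrics for the cluster's Prometheus scraper, e.g.:
# start_http_server(9100)
```

With a scrape annotation or ServiceMonitor pointing at that port, you can separate graph latency from upstream latency on a dashboard.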
Testing the Integration
Use either a direct Python call or hit the API endpoint once the pod is running.
```python
result = app.invoke({
    "customer_id": "CUST-10021",
    "request_type": "loan_precheck",
    "risk_flags": [],
    "decision": ""
})
print(result)
```
Expected output:
```text
{'customer_id': 'CUST-10021', 'request_type': 'loan_precheck', 'risk_flags': ['kyc_verified'], 'decision': 'approve'}
```
If you’re testing the Kubernetes service from inside the cluster:
```shell
curl -X POST http://retail-banking-agent.banking-ai.svc.cluster.local/invoke \
  -H 'Content-Type: application/json' \
  -d '{"customer_id":"CUST-10021","request_type":"loan_precheck"}'
```
Real-World Use Cases

- Retail loan pre-screening
  - Run eligibility checks through LangGraph nodes.
  - Scale inference pods on demand with Kubernetes during peak application windows.
- Fraud triage assistant
  - Route suspicious transactions through policy steps before escalating to human review.
  - Use Kubernetes replicas to isolate workloads by region or product line.
- KYC remediation workflow
  - Orchestrate document collection, verification calls, and exception handling in LangGraph.
  - Roll out updated compliance rules safely with Kubernetes deployments and canary releases.
Keep learning

- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.