# How to Integrate LangGraph for Retail Banking with Kubernetes for RAG
Combining LangGraph for retail banking with Kubernetes gives you a clean way to run regulated, stateful AI workflows at scale. LangGraph handles the agent logic and branching for banking use cases like customer support, KYC checks, and transaction lookups, while Kubernetes gives you the deployment, isolation, and autoscaling needed for production RAG systems.
## Prerequisites
- Python 3.10+
- A Kubernetes cluster:
  - local: kind, minikube, or k3d
  - cloud: EKS, GKE, or AKS
- `kubectl` configured against your cluster
- Access to a vector store for RAG: pgvector, Pinecone, Weaviate, or OpenSearch
- LangGraph installed: `langgraph`, `langchain`, and the provider SDKs for your model
- The Kubernetes Python client: `kubernetes`
- A bank-safe document corpus: product FAQs, policy docs, account servicing guides
- Secrets configured in Kubernetes: model API keys, vector DB credentials
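If you have not created those Secrets yet, one way to do it is shown below. The namespace matches the deployment examples later in this guide, but the secret names and keys are illustrative placeholders, not values this integration requires:

```shell
# Namespace used by the deployment examples in this guide
kubectl create namespace banking-ai

# Model API key (secret name and key are illustrative)
kubectl create secret generic model-api-keys \
  --namespace banking-ai \
  --from-literal=openai-api-key='<your-key>'

# Vector DB credentials (adjust keys for pgvector, Pinecone, etc.)
kubectl create secret generic vector-db-creds \
  --namespace banking-ai \
  --from-literal=url='<your-connection-string>'
```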
Install the core packages:
```shell
pip install langgraph langchain langchain-openai fastapi "uvicorn[standard]" kubernetes pydantic openai
```
## Integration Steps
### Step 1: Build the LangGraph workflow for the retail banking RAG agent

This graph should route user questions into retrieval first, then answer generation. For banking, keep the state explicit so you can audit what was retrieved and what was answered.
```python
from typing import TypedDict, List

from langgraph.graph import StateGraph, END


class BankRAGState(TypedDict):
    question: str
    retrieved_docs: List[str]
    answer: str


def retrieve(state: BankRAGState):
    # Replace with a real retriever call
    docs = [
        "Retail checking accounts support instant debit card freeze.",
        "Savings accounts allow up to six transfers per month.",
    ]
    return {"retrieved_docs": docs}


def generate(state: BankRAGState):
    context = "\n".join(state["retrieved_docs"])
    answer = f"Based on bank policy:\n{context}\n\nAnswer: {state['question']}"
    return {"answer": answer}


graph = StateGraph(BankRAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()
```
### Step 2: Connect LangGraph to your actual retriever and model

For production RAG, swap the mock retrieval with your vector store client. If you are using a bank knowledge base indexed in pgvector or OpenSearch, keep the query scoped to approved documents only.
```python
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def retrieve(state: BankRAGState):
    query = state["question"]
    # Example placeholder for your vector search call
    docs = [
        Document(page_content="Wire transfers over $10k require additional verification."),
        Document(page_content="Retail customers can reset online banking passwords via MFA."),
    ]
    return {"retrieved_docs": [d.page_content for d in docs]}


def generate(state: BankRAGState):
    prompt = f"""
You are a retail banking assistant.
Use only the following context:
{chr(10).join(state['retrieved_docs'])}

Question: {state['question']}
"""
    response = llm.invoke(prompt)
    return {"answer": response.content}
```
### Step 3: Package the graph into a service that Kubernetes can run

Expose the graph behind FastAPI so it can be deployed as a container. This is where Kubernetes becomes useful: you can scale replicas independently from your model gateway or retriever service.
```python
from fastapi import FastAPI
from pydantic import BaseModel


class QueryRequest(BaseModel):
    question: str


api = FastAPI()


@api.post("/rag")
def rag_endpoint(req: QueryRequest):
    # "app" is the compiled LangGraph graph from the previous step
    result = app.invoke({"question": req.question, "retrieved_docs": [], "answer": ""})
    return result


# Run with: uvicorn main:api --host 0.0.0.0 --port 8000
```
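Before building the container image, you can smoke-test the endpoint locally. Assuming the service is running via the `uvicorn` command above:

```shell
curl -s -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I freeze my debit card?"}'
```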
### Step 4: Deploy the service to Kubernetes using the Python client

If you want to automate deployment from CI/CD or an operator job, use the official Kubernetes client. The key methods here are `client.AppsV1Api().create_namespaced_deployment()` and `client.CoreV1Api().create_namespaced_service()`.
```python
from kubernetes import client, config

config.load_kube_config()

apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="bank-rag-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "bank-rag-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "bank-rag-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="your-registry/bank-rag-agent:latest",
                        ports=[client.V1ContainerPort(container_port=8000)],
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="banking-ai", body=deployment)
```
### Step 5: Add a Service so other internal systems can call the agent

Your CRM, contact center, or middleware layer should hit a stable Kubernetes Service instead of talking directly to pods.
```python
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="bank-rag-agent-svc"),
    spec=client.V1ServiceSpec(
        selector={"app": "bank-rag-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="ClusterIP",
    ),
)

core_v1.create_namespaced_service(namespace="banking-ai", body=service)
```
## Testing the Integration
Run a direct graph invocation first, then verify the service is reachable inside the cluster.
```python
test_input = {
    "question": "How do I freeze my debit card?",
    "retrieved_docs": [],
    "answer": "",
}

result = app.invoke(test_input)
print(result["answer"])
```
Expected output with the mock retriever from the first step (the production version from the second step returns live model output instead):

```
Based on bank policy:
Retail checking accounts support instant debit card freeze.
Savings accounts allow up to six transfers per month.

Answer: How do I freeze my debit card?
```
For the Kubernetes side, confirm the resources exist:

```shell
kubectl get deploy bank-rag-agent -n banking-ai
kubectl get svc bank-rag-agent-svc -n banking-ai
```
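To exercise the deployed service without an in-cluster caller, port-forward the Service and hit the endpoint from your workstation:

```shell
kubectl port-forward svc/bank-rag-agent-svc 8080:80 -n banking-ai &
curl -s -X POST http://localhost:8080/rag \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I freeze my debit card?"}'
```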
## Real-World Use Cases

- **Retail banking customer support.** Answer questions about overdrafts, debit card freezes, transfer limits, and account access using approved policy documents.
- **Branch advisor copilot.** Let relationship managers query product rules and eligibility criteria during customer conversations without leaving their workflow.
- **Operations triage.** Route internal requests like “why was this wire held?” through a LangGraph flow that retrieves policy context before generating an explanation.
## Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit