How to Integrate LangGraph with Kubernetes for Banking RAG
Combining LangGraph with Kubernetes gives you a clean way to run regulated, stateful banking RAG workflows at scale. LangGraph handles the agent orchestration and decision graph; Kubernetes gives you the deployment, isolation, and autoscaling you need when the retrieval layer starts serving real traffic.
This setup is useful when your banking assistant needs to answer policy questions, summarize account documents, or route fraud-related queries through controlled steps. You get deterministic workflow control from LangGraph and operational reliability from Kubernetes.
Prerequisites
- Python 3.10+
- A Kubernetes cluster with `kubectl` configured
- Access to a vector store or document index used by your RAG layer
- LangGraph installed: `pip install langgraph langchain langchain-openai`
- Kubernetes Python client installed: `pip install kubernetes`
- A working LLM provider key set in environment variables
- Bank data access already approved through your internal security controls
- A container registry for pushing the agent image
Integration Steps
1. Build the LangGraph workflow for banking RAG

Start by defining a graph that retrieves bank policy documents, formats context, and generates a response. Use StateGraph for explicit control over each step.
```python
from typing import TypedDict, List

from langgraph.graph import StateGraph, END
from langchain_core.documents import Document

class BankingRAGState(TypedDict):
    question: str
    docs: List[Document]
    answer: str

def retrieve_docs(state: BankingRAGState):
    # Replace with your real retriever
    docs = [
        Document(page_content="KYC escalation requires manual review for mismatched identity fields."),
        Document(page_content="Loan eligibility depends on income verification and credit policy."),
    ]
    return {"docs": docs}

def generate_answer(state: BankingRAGState):
    context = "\n".join(doc.page_content for doc in state["docs"])
    answer = f"Question: {state['question']}\nContext:\n{context}\nAnswer: Follow bank policy and escalate if needed."
    return {"answer": answer}

graph = StateGraph(BankingRAGState)
graph.add_node("retrieve_docs", retrieve_docs)
graph.add_node("generate_answer", generate_answer)
graph.set_entry_point("retrieve_docs")
graph.add_edge("retrieve_docs", "generate_answer")
graph.add_edge("generate_answer", END)

app = graph.compile()
```
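The generate_answer node above concatenates every retrieved document into the prompt. In production you would cap the context so it fits the model's window. A minimal sketch of that capping logic, using a character budget as a cheap stand-in for token counting (the function name and budget are illustrative):

```python
def build_context(texts, max_chars=2000):
    """Join document texts into one context string, dropping whole
    documents once the character budget is exhausted."""
    parts, used = [], 0
    for text in texts:
        cost = len(text) + 1  # +1 for the joining newline
        if used + cost > max_chars:
            break
        parts.append(text)
        used += cost
    return "\n".join(parts)

context = build_context(
    ["KYC escalation requires manual review...", "Loan eligibility depends on..."],
    max_chars=2000,
)
```

Dropping whole documents (rather than truncating mid-sentence) keeps policy statements intact, which matters when answers may be audited.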
2. Wrap the graph in an API service for Kubernetes

Expose the graph through FastAPI so Kubernetes can manage it as a stateless service. The service receives a query, invokes the LangGraph app, and returns the result.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app_api = FastAPI()

class QueryRequest(BaseModel):
    question: str

@app_api.post("/rag")
def rag_endpoint(req: QueryRequest):
    result = app.invoke({"question": req.question, "docs": [], "answer": ""})
    return {"answer": result["answer"]}
```
3. Containerize the service

Package the API into an image that Kubernetes can run. Keep the image minimal and pin dependencies.
```python
# main.py
import uvicorn

if __name__ == "__main__":
    uvicorn.run(app_api, host="0.0.0.0", port=8000)
```
Example Dockerfile:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "main.py"]
```
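The Dockerfile copies a requirements.txt, so the pins live there. An example shape (the version numbers below are illustrative; pin to whatever versions you have actually tested):

```text
fastapi==0.110.0
uvicorn==0.29.0
langgraph==0.2.0
langchain==0.2.0
langchain-openai==0.1.0
kubernetes==29.0.0
requests==2.31.0
```

Exact pins make image builds reproducible, which your change-management process will likely require anyway.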
4. Deploy to Kubernetes using the Python client

Use the Kubernetes Python client if you want deployment automation from your CI pipeline or admin tooling.
```python
from kubernetes import client, config

config.load_kube_config()
apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="banking-rag-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "banking-rag-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "banking-rag-agent"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="agent",
                    image="your-registry/banking-rag-agent:latest",
                    ports=[client.V1ContainerPort(container_port=8000)],
                )
            ]),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
```
Then expose it with a service:
```python
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="banking-rag-agent-svc"),
    spec=client.V1ServiceSpec(
        selector={"app": "banking-rag-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
    ),
)

core_v1.create_namespaced_service(namespace="default", body=service)
```
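If your team prefers manifests in version control applied with `kubectl apply`, the same Deployment can be built as a plain dict and dumped to JSON or YAML. A sketch of the equivalent structure (the registry path remains a placeholder):

```python
import json

def deployment_manifest(name, image, replicas=2, port=8000):
    """Build a Deployment manifest as a plain dict, mirroring the
    typed-client version above."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "agent",
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }

manifest = deployment_manifest("banking-rag-agent", "your-registry/banking-rag-agent:latest")
print(json.dumps(manifest, indent=2))  # write this to a file and `kubectl apply -f` it
```

This keeps the deployment definition diffable and reviewable, which tends to suit banking change control better than imperative API calls.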
5. Connect LangGraph execution to cluster-hosted retrieval

In production, your retrieve node should call a retriever backed by infrastructure running in Kubernetes, such as a vector DB service or internal document API.
```python
import requests

def retrieve_docs(state: BankingRAGState):
    resp = requests.get(
        "http://vector-store.default.svc.cluster.local/search",
        params={"q": state["question"], "top_k": 3},
        timeout=5,
    )
    resp.raise_for_status()  # surface retrieval failures instead of parsing an error body
    payload = resp.json()
    docs = [Document(page_content=item["text"]) for item in payload["results"]]
    return {"docs": docs}
```
That keeps orchestration in LangGraph while retrieval stays inside your cluster boundary.
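Internal search services can also return partial or malformed payloads, so it is worth normalizing the response defensively before building Documents. A stdlib sketch of that normalization (the payload shape matches the hypothetical /search endpoint above):

```python
def extract_texts(payload, max_results=3):
    """Pull text fields out of a search response, skipping entries
    that are missing, empty, or the wrong type rather than raising."""
    results = payload.get("results") or []
    texts = []
    for item in results[:max_results]:
        text = item.get("text") if isinstance(item, dict) else None
        if text:
            texts.append(text)
    return texts

extract_texts({"results": [{"text": "KYC policy"}, {"score": 0.4}]})
# -> ["KYC policy"]
```

The retrieve node can then build Documents from `extract_texts(payload)` and fall back to an empty context when nothing usable comes back.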
Testing the Integration
Run a local request against the API once the pod is up:
```python
import requests

response = requests.post(
    "http://localhost:8000/rag",
    json={"question": "What happens when KYC fields do not match?"},
)
print(response.status_code)
print(response.json())
```
Expected output:
```text
200
{'answer': 'Question: What happens when KYC fields do not match?\nContext:\nKYC escalation requires manual review for mismatched identity fields.\nLoan eligibility depends on income verification and credit policy.\nAnswer: Follow bank policy and escalate if needed.'}
```
If you want to verify from inside the cluster, use `kubectl port-forward` to reach the service and rerun the same request against localhost.
Real-World Use Cases
- Policy Q&A for relationship managers: answer questions about onboarding rules, lending policies, AML escalation paths, and product eligibility using internal documents.
- Customer support triage: classify incoming banking tickets, retrieve relevant knowledge base content, and route high-risk cases to human review.
- Fraud and compliance assistants: combine retrieval over case notes with multi-step agent logic to summarize evidence before escalation.
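For the triage case, the routing decision can start as a simple keyword rule before graduating to a classifier node in the graph. A hedged sketch (the keyword list and route labels are illustrative, not part of LangGraph):

```python
# Illustrative high-risk vocabulary; a real system would use a trained classifier.
HIGH_RISK_TERMS = {"fraud", "chargeback", "unauthorized", "aml", "laundering"}

def triage(ticket_text):
    """Route a ticket: high-risk terms go to human review,
    everything else to the automated RAG flow."""
    words = {w.strip(".,!?").lower() for w in ticket_text.split()}
    if words & HIGH_RISK_TERMS:
        return "human_review"
    return "rag_answer"

triage("Customer reports an unauthorized transfer")  # -> "human_review"
```

A function like this plugs naturally into LangGraph as the condition for a conditional edge, so the high-risk path stays explicit and auditable.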
The pattern here is simple: LangGraph owns workflow control, Kubernetes owns runtime control. That separation keeps your RAG system maintainable when banking requirements move from prototype to production.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.