How to Integrate LangGraph for Banking with Kubernetes for RAG

By Cyprian Aarons · Updated 2026-04-21
Tags: langgraph-for-banking, kubernetes, rag

Combining LangGraph for banking with Kubernetes gives you a clean way to run regulated, stateful RAG workflows at scale. LangGraph handles the agent orchestration and decision graph; Kubernetes gives you the deployment, isolation, and autoscaling you need when the retrieval layer starts serving real traffic.

This setup is useful when your banking assistant needs to answer policy questions, summarize account documents, or route fraud-related queries through controlled steps. You get deterministic workflow control from LangGraph and operational reliability from Kubernetes.

Prerequisites

  • Python 3.10+
  • A Kubernetes cluster with kubectl configured
  • Access to a vector store or document index used by your RAG layer
  • LangGraph installed:
    • pip install langgraph langchain langchain-openai
  • Kubernetes Python client installed:
    • pip install kubernetes
  • A working LLM provider key set in environment variables
  • Bank data access already approved through your internal security controls
  • A container registry for pushing the agent image

Integration Steps

  1. Build the LangGraph workflow for banking RAG

Start by defining a graph that retrieves bank policy documents, formats context, and generates a response. Use StateGraph for explicit control over each step.

from typing import TypedDict, List
from langgraph.graph import StateGraph, END
from langchain_core.documents import Document

class BankingRAGState(TypedDict):
    question: str
    docs: List[Document]
    answer: str

def retrieve_docs(state: BankingRAGState):
    # Replace with your real retriever
    docs = [
        Document(page_content="KYC escalation requires manual review for mismatched identity fields."),
        Document(page_content="Loan eligibility depends on income verification and credit policy.")
    ]
    return {"docs": docs}

def generate_answer(state: BankingRAGState):
    context = "\n".join(doc.page_content for doc in state["docs"])
    # Stub response for illustration; swap in a real LLM call (e.g., via langchain-openai) in production
    answer = f"Question: {state['question']}\nContext:\n{context}\nAnswer: Follow bank policy and escalate if needed."
    return {"answer": answer}

graph = StateGraph(BankingRAGState)
graph.add_node("retrieve_docs", retrieve_docs)
graph.add_node("generate_answer", generate_answer)
graph.set_entry_point("retrieve_docs")
graph.add_edge("retrieve_docs", "generate_answer")
graph.add_edge("generate_answer", END)

app = graph.compile()
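
Before wrapping this in an API, you can sanity-check the compiled graph locally. A minimal invocation against the stub retriever above, using an example question:

result = app.invoke({"question": "When does KYC require manual review?", "docs": [], "answer": ""})
print(result["answer"])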

  2. Wrap the graph in an API service for Kubernetes

Expose the graph through FastAPI so Kubernetes can manage it as a stateless service. The service receives a query, invokes the LangGraph app, and returns the result.

from fastapi import FastAPI
from pydantic import BaseModel

# `app` is the compiled LangGraph from step 1; `app_api` is the HTTP layer around it
app_api = FastAPI()

class QueryRequest(BaseModel):
    question: str

@app_api.post("/rag")
def rag_endpoint(req: QueryRequest):
    result = app.invoke({"question": req.question, "docs": [], "answer": ""})
    return {"answer": result["answer"]}

  3. Containerize the service

Package the API into an image that Kubernetes can run. Keep the image minimal and pin dependencies.

# main.py
# Assumes the graph and FastAPI definitions above live in this module (or are imported here)
import uvicorn

if __name__ == "__main__":
    uvicorn.run(app_api, host="0.0.0.0", port=8000)

Example Dockerfile:

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8000
CMD ["python", "main.py"]

  4. Deploy to Kubernetes using the Python client

Use the Kubernetes Python client if you want deployment automation from your CI pipeline or admin tooling.

from kubernetes import client, config

# Uses your local kubeconfig; use config.load_incluster_config() if this runs inside the cluster
config.load_kube_config()

apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="banking-rag-agent"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "banking-rag-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "banking-rag-agent"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="agent",
                    image="your-registry/banking-rag-agent:latest",
                    ports=[client.V1ContainerPort(container_port=8000)]
                )
            ])
        )
    )
)

apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
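
The prerequisites call for an LLM provider key in environment variables. A minimal sketch of injecting it from a Kubernetes Secret rather than baking it into the image; the secret name llm-provider-key, its key api-key, and the OPENAI_API_KEY variable name are assumptions to adapt:

llm_key_env = client.V1EnvVar(
    name="OPENAI_API_KEY",  # assumed variable name; match your provider
    value_from=client.V1EnvVarSource(
        secret_key_ref=client.V1SecretKeySelector(name="llm-provider-key", key="api-key")
    ),
)

Pass env=[llm_key_env] to the V1Container above before creating the deployment.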

Then expose it with a service:

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="banking-rag-agent-svc"),
    spec=client.V1ServiceSpec(
        selector={"app": "banking-rag-agent"},
        ports=[client.V1ServicePort(port=80, target_port=8000)]
    )
)

core_v1.create_namespaced_service(namespace="default", body=service)

  5. Connect LangGraph execution to cluster-hosted retrieval

In production, the retrieve_docs node should call a retriever backed by infrastructure running in Kubernetes, such as a vector DB service or internal document API.

import requests

def retrieve_docs(state: BankingRAGState):
    # Query the retrieval service over its cluster-internal DNS name
    resp = requests.get(
        "http://vector-store.default.svc.cluster.local/search",
        params={"q": state["question"], "top_k": 3},
        timeout=5,
    )
    resp.raise_for_status()  # surface retrieval failures instead of parsing an error body
    payload = resp.json()
    docs = [Document(page_content=item["text"]) for item in payload["results"]]
    return {"docs": docs}

That keeps orchestration in LangGraph while retrieval stays inside your cluster boundary.

Testing the Integration

Run a local request against the API once the pod is up:

import requests

response = requests.post(
    "http://localhost:8000/rag",
    json={"question": "What happens when KYC fields do not match?"}
)

print(response.status_code)
print(response.json())

Expected output:

200
{'answer': 'Question: What happens when KYC fields do not match?\nContext:\nKYC escalation requires manual review for mismatched identity fields.\nLoan eligibility depends on income verification and credit policy.\nAnswer: Follow bank policy and escalate if needed.'}

If you want to verify from inside the cluster, use kubectl port-forward to reach the service and rerun the same request against localhost.
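
For example, with the service created above, forward a local port to it and reuse the same request:

kubectl port-forward svc/banking-rag-agent-svc 8000:80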

Real-World Use Cases

  • Policy Q&A for relationship managers
    • Answer questions about onboarding rules, lending policies, AML escalation paths, and product eligibility using internal documents.
  • Customer support triage
    • Classify incoming banking tickets, retrieve relevant knowledge base content, and route high-risk cases to human review.
  • Fraud and compliance assistants
    • Combine retrieval over case notes with multi-step agent logic to summarize evidence before escalation.

The pattern here is simple: LangGraph owns workflow control, Kubernetes owns runtime control. That separation keeps your RAG system maintainable when banking requirements move from prototype to production.


By Cyprian Aarons, AI Consultant at Topiax.