How to Build a Compliance-Checking Agent Using LangGraph in Python for Retail Banking
A compliance checking agent for retail banking reviews customer-facing actions, messages, and case notes against policy before they go out or get approved. It matters because a bad recommendation, an unapproved disclosure, or a data-handling mistake can create regulatory exposure, customer harm, and audit pain.
Architecture
- **Input normalizer**
  - Takes raw text from chat, email drafts, call summaries, or CRM notes.
  - Strips noise and standardizes fields like `jurisdiction`, `product_type`, and `customer_segment`.
- **Policy retrieval layer**
  - Pulls the relevant controls from a policy store or knowledge base.
  - Keeps checks scoped to the right region and product line, which matters for retail banking where rules differ by country and product.
- **Compliance evaluator**
  - Uses an LLM or rules engine to compare the input against policy.
  - Produces structured findings like `pass`, `needs_review`, or `block`.
- **Decision router**
  - Routes the workflow based on severity.
  - Low-risk items can proceed; high-risk items get escalated to a human reviewer.
- **Audit logger**
  - Stores every decision, prompt version, policy version, and output.
  - This is non-negotiable for bank auditability and model governance.
- **Human review handoff**
  - Packages the flagged item with reasons and evidence.
  - Lets compliance teams override or approve with traceability.
Implementation
- **Define the state and build the graph**

Use a typed state object so every node reads and writes predictable fields. In LangGraph, `StateGraph` is the right primitive for this pattern.
```python
from typing import TypedDict, Literal, List

from langgraph.graph import StateGraph, START, END


class ComplianceState(TypedDict):
    input_text: str
    jurisdiction: str
    product_type: str
    policy_context: str
    finding: Literal["pass", "needs_review", "block"]
    rationale: str
    audit_log: List[str]


def normalize_input(state: ComplianceState) -> ComplianceState:
    return {
        **state,
        "input_text": state["input_text"].strip(),
        "audit_log": state.get("audit_log", []) + ["normalized_input"],
    }


def load_policy(state: ComplianceState) -> ComplianceState:
    # Replace with real retrieval from a policy DB / vector store / document service.
    policy_context = (
        f"Retail banking policy for {state['jurisdiction']} "
        f"and product {state['product_type']}: no misleading claims, "
        f"no promises of approval, no sharing sensitive data."
    )
    return {
        **state,
        "policy_context": policy_context,
        "audit_log": state["audit_log"] + ["loaded_policy"],
    }


def evaluate_compliance(state: ComplianceState) -> ComplianceState:
    # Stub rules engine; swapped for an LLM-backed check later in this guide.
    text = state["input_text"].lower()
    if "guaranteed approval" in text or "share your password" in text:
        finding = "block"
        rationale = "Contains prohibited claim or unsafe request."
    elif "maybe" in text or "subject to review" in text:
        finding = "needs_review"
        rationale = "Potentially ambiguous language requires human review."
    else:
        finding = "pass"
        rationale = "No obvious compliance issues detected."
    return {
        **state,
        "finding": finding,
        "rationale": rationale,
        "audit_log": state["audit_log"] + [f"evaluated:{finding}"],
    }


def route_decision(state: ComplianceState) -> str:
    # Anything flagged goes to a human; only clean passes end the run.
    if state["finding"] in ("block", "needs_review"):
        return "human_review"
    return END


def human_review_stub(state: ComplianceState) -> ComplianceState:
    return {
        **state,
        "audit_log": state["audit_log"] + ["routed_to_human_review"],
    }


graph = StateGraph(ComplianceState)
graph.add_node("normalize_input", normalize_input)
graph.add_node("load_policy", load_policy)
graph.add_node("evaluate_compliance", evaluate_compliance)
graph.add_node("human_review", human_review_stub)

graph.add_edge(START, "normalize_input")
graph.add_edge("normalize_input", "load_policy")
graph.add_edge("load_policy", "evaluate_compliance")
graph.add_conditional_edges("evaluate_compliance", route_decision)
graph.add_edge("human_review", END)

app = graph.compile()
```
- **Add a real decision path**

The important part is not just “LLM says yes/no.” You want explicit routing based on risk. In retail banking, blocked content should never silently continue downstream.
```python
result = app.invoke(
    {
        "input_text": "We can offer guaranteed approval if you apply today.",
        "jurisdiction": "UK",
        "product_type": "personal_loan",
        "policy_context": "",
        "finding": "",
        "rationale": "",
        "audit_log": [],
    }
)
print(result["finding"])    # block
print(result["rationale"])  # Contains prohibited claim or unsafe request.
print(result["audit_log"])  # [..., "evaluated:block", "routed_to_human_review"]
```
This gives you a deterministic workflow:
- normalize input
- fetch relevant policy context
- evaluate against controls
- route to human review when needed
If you need richer branching, add more nodes for jurisdiction-specific checks like affordability language, fee disclosure checks, or PII leakage detection.
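For instance, a UK affordability-wording check can slot in as one extra node plus routing. Here is a minimal sketch, wired in place of the direct `load_policy` -> `evaluate_compliance` edge from the build above; the node name, trigger phrase, and heuristic are illustrative, not real regulatory requirements:

```python
def uk_affordability_check(state: ComplianceState) -> ComplianceState:
    # Illustrative heuristic only: UK loan copy missing status wording gets flagged.
    text = state["input_text"].lower()
    if state["product_type"] == "personal_loan" and "subject to status" not in text:
        return {
            **state,
            "finding": "needs_review",
            "rationale": "UK loan content is missing affordability/status wording.",
            "audit_log": state["audit_log"] + ["uk_affordability_flag"],
        }
    return {**state, "audit_log": state["audit_log"] + ["uk_affordability_ok"]}


def route_by_jurisdiction(state: ComplianceState) -> str:
    # Only UK traffic takes the extra check; everything else goes straight on.
    return "uk_affordability_check" if state["jurisdiction"] == "UK" else "evaluate_compliance"


def route_after_uk_check(state: ComplianceState) -> str:
    # A flag from the UK check escalates immediately; clean copy continues.
    return "human_review" if state["finding"] == "needs_review" else "evaluate_compliance"


# Replaces graph.add_edge("load_policy", "evaluate_compliance") from earlier.
graph.add_node("uk_affordability_check", uk_affordability_check)
graph.add_conditional_edges("load_policy", route_by_jurisdiction)
graph.add_conditional_edges("uk_affordability_check", route_after_uk_check)
```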
- **Replace the stub evaluator with an LLM-backed structured check**

For production use, keep the node interface the same but swap the internals for a structured model call. The key is that LangGraph manages orchestration; your node handles validation and formatting.
```python
from pydantic import BaseModel
from langchain_openai import ChatOpenAI


class ComplianceFinding(BaseModel):
    finding: Literal["pass", "needs_review", "block"]
    rationale: str


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def llm_evaluate_compliance(state: ComplianceState) -> ComplianceState:
    prompt = f"""
You are checking retail banking content for compliance.
Jurisdiction: {state['jurisdiction']}
Product: {state['product_type']}
Policy:
{state['policy_context']}
Content:
{state['input_text']}
Return only a compliance judgment.
"""
    # Structured output keeps the node's contract identical to the rule-based stub.
    structured_llm = llm.with_structured_output(ComplianceFinding)
    verdict = structured_llm.invoke(prompt)
    return {
        **state,
        "finding": verdict.finding,
        "rationale": verdict.rationale,
        "audit_log": state["audit_log"] + [f"llm_evaluated:{verdict.finding}"],
    }
```
If you plug this into the graph instead of the rule-based evaluator, keep fallback rules around critical red flags like fraud instructions, password collection, sanctions terms, or prohibited guarantees.
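One way to keep those deterministic gates is a thin wrapper that runs hard rules first and only defers to the model when nothing trips. A sketch, with a made-up `HARD_BLOCK_TERMS` list standing in for your compliance team's actual red-flag inventory:

```python
# Hypothetical red-flag list; source the real one from your compliance team.
HARD_BLOCK_TERMS = (
    "guaranteed approval",
    "share your password",
    "guaranteed returns",
)


def evaluate_with_fallback(state: ComplianceState) -> ComplianceState:
    # Deterministic gate first: hard red flags never reach the model.
    text = state["input_text"].lower()
    for term in HARD_BLOCK_TERMS:
        if term in text:
            return {
                **state,
                "finding": "block",
                "rationale": f"Hard rule matched prohibited term: {term!r}.",
                "audit_log": state["audit_log"] + [f"hard_blocked:{term}"],
            }
    # Nothing tripped, so defer to the structured LLM check defined above.
    return llm_evaluate_compliance(state)
```

Register `evaluate_with_fallback` as the `evaluate_compliance` node and the routing logic stays unchanged.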
- **Persist audit trails outside the graph**

The graph should emit an auditable result; your application should write it to durable storage. For banks, keep prompt version, policy version, model name, timestamp, user ID, case ID, and final disposition.
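A minimal sketch of that write, assuming an append-only JSONL file as a stand-in for whatever durable, immutable store your bank mandates; the version strings are placeholders you would thread through from configuration:

```python
import hashlib
import json
from datetime import datetime, timezone


def persist_audit_record(state: ComplianceState, *, case_id: str, user_id: str) -> None:
    # Hash the input rather than storing raw customer text where policy forbids it.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "case_id": case_id,
        "user_id": user_id,
        "input_hash": hashlib.sha256(state["input_text"].encode()).hexdigest(),
        "prompt_version": "v1",        # placeholder; track real versions
        "policy_version": "2024-06",   # placeholder
        "model_version": "gpt-4o-mini",
        "finding": state["finding"],
        "rationale": state["rationale"],
        "audit_log": state["audit_log"],
    }
    # Stand-in for a durable write (database, object store, SIEM, etc.).
    with open("audit_trail.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


# After each run:
# persist_audit_record(result, case_id="CASE-123", user_id="agent-42")
```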
Production Considerations
- **Data residency**
  - Keep customer data in-region.
  - If your bank operates across EU/UK/APAC boundaries, route requests to region-specific deployments and avoid sending raw PII to cross-border model endpoints.
- **Auditability**
  - Store every run with immutable logs.
  - Capture `input_hash`, `policy_version`, `model_version`, `finding`, `rationale`, and reviewer override status.
- **Guardrails**
  - Add hard blocks before any LLM call for obvious violations like credential requests or sanctions-related terms.
  - Use allowlists for approved response templates in customer-facing workflows.
- **Monitoring**
  - Track false positives by product line and jurisdiction.
  - Alert on spikes in `needs_review` volume because that usually means a policy drift issue or a bad prompt change (a minimal sketch follows this list).
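Here is one way that alert could look, using an in-memory window and a made-up threshold; in production this would feed your real metrics stack rather than a Python deque:

```python
from collections import Counter, deque

# Illustrative in-memory monitor over the last N findings.
RECENT_WINDOW = 500
recent_findings: deque = deque(maxlen=RECENT_WINDOW)


def record_finding(state: ComplianceState) -> None:
    recent_findings.append(
        (state["jurisdiction"], state["product_type"], state["finding"])
    )


def needs_review_rate(jurisdiction: str, product_type: str) -> float:
    scoped = [f for j, p, f in recent_findings if (j, p) == (jurisdiction, product_type)]
    return Counter(scoped)["needs_review"] / len(scoped) if scoped else 0.0


# Hypothetical threshold; tune against your own baseline.
if needs_review_rate("UK", "personal_loan") > 0.25:
    print("ALERT: needs_review spike; check for policy drift or a bad prompt change")
```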
Common Pitfalls
- **Using one global policy prompt for all jurisdictions**
  - Retail banking compliance is not one-size-fits-all.
  - Split policies by country and product so UK mortgage language does not get evaluated against US deposit account rules (see the sketch after this list).
- **Letting the LLM make final decisions without deterministic gates**
  - Don’t rely on model judgment alone for high-risk content.
  - Put rule-based blockers ahead of the model for things like credential collection, guaranteed returns, or explicit regulatory violations.
- **Skipping audit metadata**
  - If you cannot prove what was checked and why it passed or failed, you do not have a compliance system.
  - Log every graph run with versions of prompts, policies, models, and human overrides.
- **Ignoring the human review workflow**
  - Some cases will always be ambiguous.
  - Build a clean handoff from LangGraph into your case management system so reviewers see the exact text that triggered escalation plus the reason code.
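A sketch of that split, assuming a hypothetical in-memory `POLICY_REGISTRY` keyed by jurisdiction and product; the policy strings are placeholders for real controls from a governed policy store:

```python
# Placeholder policies; real controls live in a governed policy store.
POLICY_REGISTRY = {
    ("UK", "personal_loan"): "UK consumer credit rules: affordability wording, APR disclosure, ...",
    ("UK", "mortgage"): "UK mortgage rules: advice boundaries, stress-test wording, ...",
    ("US", "deposit_account"): "US deposit rules: insurance wording, fee schedule disclosure, ...",
}


def load_policy(state: ComplianceState) -> ComplianceState:
    # Drop-in replacement for the stubbed load_policy node from the build above.
    key = (state["jurisdiction"], state["product_type"])
    policy = POLICY_REGISTRY.get(key)
    if policy is None:
        # Fail closed: never borrow another region's rules for an unknown combination.
        raise ValueError(f"No policy registered for {key}; refusing to evaluate.")
    return {
        **state,
        "policy_context": policy,
        "audit_log": state["audit_log"] + [f"loaded_policy:{key}"],
    }
```

Failing closed on unknown jurisdiction/product pairs is deliberate: silently falling back to a global prompt is exactly this pitfall.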
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.