How to Build a KYC Verification Agent Using LangChain in Python for Banking

By Cyprian Aarons · Updated 2026-04-21

Tags: kyc-verification, langchain, python, banking

A KYC verification agent checks customer identity data, flags missing or inconsistent documents, and routes risky cases for human review. In banking, that matters because onboarding speed is useless if you cannot prove compliance, maintain an audit trail, and keep suspicious accounts out of the system.

Architecture

  • Input intake layer

    • Accepts structured customer data: name, DOB, address, ID number, country, document metadata.
    • Normalizes fields before the agent sees them.
  • Policy and rules layer

    • Encodes bank-specific KYC rules: required documents by jurisdiction, PEP/sanctions escalation triggers, age thresholds, residency constraints.
    • This should be deterministic, not left to the model.
  • LangChain agent

    • Uses an LLM to reason over incomplete or conflicting records.
    • Calls tools for validation instead of inventing answers.
  • Verification tools

    • Document completeness checker
    • Sanctions/PEP lookup tool
    • Address normalization / country code validation
    • Internal customer master lookup
  • Decision and audit layer

    • Produces one of: approve, reject, manual_review.
    • Stores the reasoning trace, tool outputs, timestamps, and policy version for audit.
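The policy and rules layer above should be plain code, not prompt text. A minimal sketch of what that can look like — the jurisdiction codes and document names here are illustrative, not real bank policy:

```python
# Hypothetical policy table: required evidence by jurisdiction.
# Document names and country sets are illustrative only.
REQUIRED_DOCS = {
    "US": {"passport_or_national_id", "proof_of_address"},
    "GB": {"passport_or_national_id", "proof_of_address"},
    "DE": {"passport_or_national_id", "proof_of_address", "tax_id"},
}
DEFAULT_DOCS = {"passport_or_national_id"}

def missing_documents(country: str, provided: set[str]) -> set[str]:
    """Deterministic check: which required documents are absent for a country."""
    required = REQUIRED_DOCS.get(country, DEFAULT_DOCS)
    return required - provided

print(missing_documents("DE", {"passport_or_national_id"}))
```

Because this is a lookup table plus set arithmetic, it is trivially testable and versionable — exactly what you want the agent to consume rather than reinvent.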

Implementation

1) Define the verification schema and tools

Keep the agent constrained. Use Pydantic models for input/output and expose only the tools it needs.

from typing import Literal
from pydantic import BaseModel, Field

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
from langchain_core.prompts import PromptTemplate


class KYCInput(BaseModel):
    full_name: str
    date_of_birth: str
    country_of_residence: str
    id_type: str
    id_number: str | None = None
    address: str | None = None


class KYCDecision(BaseModel):
    decision: Literal["approve", "reject", "manual_review"]
    reason: str
    missing_fields: list[str] = Field(default_factory=list)


@tool
def check_required_fields(full_name: str, date_of_birth: str, country_of_residence: str,
                          id_type: str, id_number: str | None = None,
                          address: str | None = None) -> dict:
    """Return which mandatory KYC fields are missing from the record."""
    missing = []
    if not full_name:
        missing.append("full_name")
    if not date_of_birth:
        missing.append("date_of_birth")
    if not country_of_residence:
        missing.append("country_of_residence")
    if not id_type:
        missing.append("id_type")
    if id_type and not id_number:
        missing.append("id_number")
    if country_of_residence in {"US", "GB", "DE"} and not address:
        missing.append("address")
    return {"missing_fields": missing}


@tool
def sanctions_screen(full_name: str) -> dict:
    """Screen a customer name against sanctions/PEP lists."""
    # Replace with a real screening API call.
    hit = full_name.lower() in {"john doe", "test sanction"}
    return {"sanctions_hit": hit}


tools = [check_required_fields, sanctions_screen]

2) Build a prompt that forces policy-driven behavior

Do not ask the model to “decide freely.” Tell it exactly how to use tools and when to escalate.

prompt = PromptTemplate.from_template("""
You are a KYC verification agent for a bank.

Rules:
- Never approve a case with missing mandatory fields.
- If sanctions screening returns a hit, return manual_review or reject based on policy context.
- Do not infer missing identity data.
- Use only tool outputs and provided input.
- Give a concise final answer with decision, reason, and missing fields.

Customer record:
{input}

Available tools:
{tools}

Use this format:
Thought: what to check next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the tool result
... (repeat Thought/Action/Action Input/Observation as needed)
Thought: I have enough information to decide
Final Answer: decision, reason, and missing fields

Begin!

Thought:{agent_scratchpad}
""")

Note that create_react_agent requires the {tools}, {tool_names}, and {agent_scratchpad} placeholders, and the ReAct output parser looks for a line starting with "Final Answer:" — omit any of these and the agent either fails to construct or fails to parse its own output.

3) Create the LangChain agent and execute it

This is the pattern you want in production-style code: the agent reasons over the record, calls tools, then returns a decision you can parse.

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=prompt,
)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
)

customer = {
    "full_name": "John Doe",
    "date_of_birth": "1988-04-21",
    "country_of_residence": "US",
    "id_type": "passport",
}

result = executor.invoke({"input": customer})
print(result["output"])

4) Parse the result into an auditable decision object

In banking you need a stable output contract. Convert the model output into your internal decision schema before writing anything downstream.

raw_output = result["output"]
lowered = raw_output.lower()

# In production, use a stricter parser or a structured output strategy.
if "manual_review" in lowered:
    decision = "manual_review"
elif "reject" in lowered:
    decision = "reject"
else:
    decision = "approve"

decision_payload = {
    "decision": decision,
    "reason": raw_output,
    "missing_fields": ["id_number"] if "missing" in lowered else [],
}

kyc_decision = KYCDecision(**decision_payload)
print(kyc_decision.model_dump())

If you want stronger guarantees than text parsing, move to structured output with with_structured_output() on the chat model and keep the same decision schema. That reduces downstream ambiguity when compliance teams review cases.
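A minimal sketch of that switch, reusing the KYCDecision schema from step 1. The model name is a placeholder, and build_structured_checker is a hypothetical helper — the point is that with_structured_output binds the schema so invoke returns a validated object instead of prose:

```python
from typing import Literal
from pydantic import BaseModel, Field

class KYCDecision(BaseModel):
    decision: Literal["approve", "reject", "manual_review"]
    reason: str
    missing_fields: list[str] = Field(default_factory=list)

def build_structured_checker():
    # Bind the schema to the chat model so invoke() returns a
    # KYCDecision instance instead of free text.
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    return llm.with_structured_output(KYCDecision)

# The structured model returns validated objects shaped like this:
sample = KYCDecision(
    decision="manual_review",
    reason="id_number missing for passport holder",
    missing_fields=["id_number"],
)
print(sample.model_dump())
```

Pydantic rejects anything outside the Literal set, so a malformed decision fails loudly at the boundary instead of propagating into the case workflow.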

Production Considerations

  • Auditability

    • Persist every tool call, input payload hash, model version, prompt version, and final decision.
    • Regulators will care more about reproducibility than model cleverness.
  • Data residency

    • Keep PII inside approved regions.
    • If your LLM endpoint crosses borders unintentionally, your KYC pipeline may violate local banking rules even if the logic is correct.
  • Guardrails

    • Hard-block approval when mandatory fields are absent.
    • Add deterministic checks before the LLM runs; do not let the model override policy.
  • Monitoring

    • Track manual review rate, sanctions hit rate, false positive rate, latency per case, and tool failure rate.
    • A sudden drop in manual reviews usually means your guardrails broke.
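To make the auditability point concrete, one way to shape a per-case audit record — the field names below are illustrative, not a regulatory standard:

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(payload: dict, decision: dict,
                       model_version: str, prompt_version: str) -> dict:
    """Illustrative audit record: hash the input so the exact case
    can be matched later without storing raw PII in the audit log."""
    return {
        "input_hash": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
        "decision": decision,
        "model_version": model_version,
        "prompt_version": prompt_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = build_audit_record(
    {"full_name": "John Doe", "id_type": "passport"},
    {"decision": "manual_review", "reason": "missing id_number"},
    model_version="gpt-4o-mini-2024-07-18",
    prompt_version="kyc-prompt-v3",
)
print(record["input_hash"][:12])
```

Sorting keys before hashing keeps the hash stable across dict orderings, which is what makes the record reproducible for a regulator.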

Common Pitfalls

  1. Letting the LLM make policy decisions

    • Mistake: asking the model to decide approve/reject from scratch.
    • Fix: encode mandatory rules in code first; let LangChain handle reasoning only where judgment is needed.
  2. Using free-form text as system output

    • Mistake: storing raw model prose as the final decision object.
    • Fix: map output into a strict schema like KYCDecision before persistence or workflow routing.
  3. Ignoring jurisdiction-specific requirements

    • Mistake: applying one global KYC checklist to every customer.
    • Fix: parameterize rules by country and product line. A retail deposit account in one jurisdiction does not have the same evidence requirements as business onboarding in another.
  4. Skipping human review escalation

    • Mistake: auto-approving borderline cases because the model sounds confident.
    • Fix: route any sanctions ambiguity, document mismatch, or high-risk geography to manual review with full traceability.

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

