How to Build a KYC Verification Agent for Payments Using LangChain in Python

By Cyprian Aarons · Updated 2026-04-21
Tags: kyc-verification, langchain, python, payments

A KYC verification agent for payments takes customer identity data, checks it against policy and external sources, and returns a decision: approve, reject, or escalate to a human. For payment flows, this matters because bad KYC leads to fraud exposure, regulatory issues, failed onboarding, and expensive manual reviews.

Architecture

  • Input normalization layer

    • Takes raw customer data from signup forms, CRM records, or uploaded documents.
    • Converts it into a consistent schema before any LLM call.
  • Policy engine

    • Encodes your KYC rules: required fields, country restrictions, PEP/sanctions escalation, document freshness.
    • Keeps deterministic checks outside the model.
  • LangChain reasoning layer

    • Uses an LLM through ChatOpenAI to classify risk signals from structured inputs and extracted text.
    • Produces a decision with explicit rationale.
  • Tooling layer

    • Wraps external systems like sanctions screening APIs, ID verification vendors, and internal customer databases.
    • Exposed to the agent through LangChain tools.
  • Audit log store

    • Persists every input, tool result, model output, and final decision.
    • Required for compliance review and dispute handling.
  • Human review queue

    • Captures borderline or high-risk cases.
    • Prevents the agent from making final calls in ambiguous cases.

Implementation

1) Define the KYC schema and deterministic policy checks

Start with a typed payload. Do not let the model infer structure from free text when you are dealing with payments onboarding.

from typing import Literal, Optional
from pydantic import BaseModel, Field

class KYCRequest(BaseModel):
    full_name: str
    date_of_birth: str
    country: str
    email: str
    government_id_type: Literal["passport", "national_id", "driver_license"]
    government_id_number: str
    pep_match: bool = False
    sanctions_match: bool = False
    document_expiry_date: Optional[str] = None

class KYCDecision(BaseModel):
    status: Literal["approve", "reject", "review"]
    risk_score: int = Field(ge=0, le=100)
    reasons: list[str]

Add basic policy gates before you call the model. This keeps obvious failures out of the LLM path.

def deterministic_checks(req: KYCRequest) -> list[str]:
    issues = []
    if req.sanctions_match:
        issues.append("Sanctions match present")
    if req.pep_match:
        issues.append("PEP match present")
    if req.country.upper() in {"IR", "KP", "SY"}:
        issues.append("Restricted jurisdiction")
    return issues
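A quick sanity check of the gate logic above helps before wiring in the model. This sketch re-declares a minimal stand-in for KYCRequest (a plain dataclass named FakeRequest, introduced here for illustration) so it runs on its own without Pydantic:

```python
from dataclasses import dataclass

# Minimal stand-in for KYCRequest, carrying only the fields the gates read.
@dataclass
class FakeRequest:
    country: str
    pep_match: bool = False
    sanctions_match: bool = False

def deterministic_checks(req) -> list[str]:
    issues = []
    if req.sanctions_match:
        issues.append("Sanctions match present")
    if req.pep_match:
        issues.append("PEP match present")
    if req.country.upper() in {"IR", "KP", "SY"}:
        issues.append("Restricted jurisdiction")
    return issues

print(deterministic_checks(FakeRequest(country="GB")))
# []
print(deterministic_checks(FakeRequest(country="ir", pep_match=True)))
# ['PEP match present', 'Restricted jurisdiction']
```

Note that the country check uppercases the input, so lowercase ISO codes from sloppy upstream forms still trip the gate.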

2) Build the LangChain chain with structured output

Use ChatOpenAI plus with_structured_output() so the model returns machine-readable decisions. This is the right pattern for compliance workflows because you want predictable output shapes.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a KYC analyst for a payments company. "
     "Return conservative decisions. If there is any compliance ambiguity, choose review."),
    ("human",
     "Customer record:\n{customer_json}\n\n"
     "Deterministic issues:\n{issues}\n\n"
     "Assess KYC risk for payment onboarding.")
])

structured_llm = llm.with_structured_output(KYCDecision)
kyc_chain = prompt | structured_llm

Now invoke it with real data. Keep the prompt grounded in structured inputs only.

import json

req = KYCRequest(
    full_name="Jane Doe",
    date_of_birth="1991-03-11",
    country="GB",
    email="jane@example.com",
    government_id_type="passport",
    government_id_number="X1234567",
)

issues = deterministic_checks(req)

result = kyc_chain.invoke({
    "customer_json": json.dumps(req.model_dump(), indent=2),
    "issues": "\n".join(f"- {i}" for i in issues) or "- none"
})

print(result.status)
print(result.risk_score)
print(result.reasons)

3) Add tools for sanctions lookup or internal customer history

For payments teams, the agent should not hallucinate about sanctions or prior fraud flags. Wrap real systems as tools and let the agent call them explicitly.

from langchain_core.tools import tool

@tool
def lookup_internal_risk(email: str) -> str:
    """Look up internal fraud/KYC history by email."""
    # Replace with real DB query.
    if email.endswith("@highrisk.com"):
        return "prior_manual_review=true; prior_chargeback=true"
    return "prior_manual_review=false; prior_chargeback=false"

If you want an agentic flow rather than a single chain call, use create_tool_calling_agent with an AgentExecutor. That gives you controlled tool use while keeping the final decision structured.

4) Enforce audit logging and human escalation

Every decision needs an audit trail. In payments, you need to explain why a user was approved or blocked months later during compliance review.

from datetime import datetime, timezone

def persist_audit(record: dict) -> None:
    # Replace with append-only storage such as Postgres + WORM bucket.
    print(json.dumps(record))

audit_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "input": req.model_dump(),
    "deterministic_issues": issues,
    "decision": result.model_dump(),
}

persist_audit(audit_record)

if result.status == "review":
    print("Send to human review queue")

Production Considerations

  • Keep PII out of logs

    • Mask government ID numbers and emails in application logs.
    • Store raw documents in encrypted object storage with strict access control.
  • Pin data residency

    • Route EU customer data to EU-hosted infrastructure.
    • Make sure your LLM provider supports region controls or use a private deployment for regulated markets.
  • Set hard guardrails

    • Never let the model override sanctions hits or restricted-country rules.
    • Use deterministic rejects for non-negotiable compliance conditions.
  • Monitor decision drift

    • Track approval rate, manual review rate, false positives on sanctions/PEP matches, and vendor latency.
    • Re-run evaluation sets whenever prompts or models change.
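The "hard guardrails" bullet above can be enforced with a small post-processing step: deterministic issues that are non-negotiable force a reject regardless of the model's verdict. This sketch uses plain dicts shaped like KYCDecision; the HARD_REJECT set and apply_guardrails helper are illustrative names, not part of any library:

```python
# Issues that must never be overridden by the model.
HARD_REJECT = {"Sanctions match present", "Restricted jurisdiction"}

def apply_guardrails(issues: list[str], decision: dict) -> dict:
    """Force a reject when any non-negotiable issue is present."""
    hard_hits = [i for i in issues if i in HARD_REJECT]
    if hard_hits:
        return {
            "status": "reject",
            "risk_score": 100,
            "reasons": hard_hits + ["Deterministic policy override"],
        }
    return decision

final = apply_guardrails(
    ["Sanctions match present"],
    {"status": "approve", "risk_score": 10, "reasons": []},
)
print(final["status"])
# reject
```

Running this after the chain means even a miscalibrated prompt or model upgrade cannot approve a sanctioned customer.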

Common Pitfalls

  1. Letting the LLM make final compliance decisions without rules

    • Fix it by running deterministic checks first.
    • The model should explain borderline cases, not bypass policy.
  2. Using free-form text instead of structured outputs

    • Fix it with Pydantic models and with_structured_output().
    • You want JSON-like output that downstream systems can trust.
  3. Ignoring auditability

    • Fix it by persisting input snapshots, tool outputs, model version, prompt version, and final status.
    • In payments, “the model said so” is not an acceptable record.
  4. Mixing environments across jurisdictions

    • Fix it by separating data paths by region and vendor contract.
    • A UK merchant onboarding flow should not silently send personal data to an unapproved region.

If you build this pattern correctly, your KYC agent stays useful without becoming a compliance liability. The winning setup is simple: deterministic policy first, LangChain for structured reasoning second, human review for anything that touches ambiguity.


By Cyprian Aarons, AI Consultant at Topiax.
