How to Build a KYC Verification Agent Using LangChain in Python for Payments
A KYC verification agent for payments takes customer identity data, checks it against policy and external sources, and returns a decision: approve, reject, or escalate to a human. For payment flows, this matters because bad KYC leads to fraud exposure, regulatory issues, failed onboarding, and expensive manual reviews.
Architecture
- **Input normalization layer**
  - Takes raw customer data from signup forms, CRM records, or uploaded documents.
  - Converts it into a consistent schema before any LLM call.
- **Policy engine**
  - Encodes your KYC rules: required fields, country restrictions, PEP/sanctions escalation, document freshness.
  - Keeps deterministic checks outside the model.
- **LangChain reasoning layer**
  - Uses an LLM through `ChatOpenAI` to classify risk signals from structured inputs and extracted text.
  - Produces a decision with explicit rationale.
- **Tooling layer**
  - Wraps external systems like sanctions screening APIs, ID verification vendors, and internal customer databases.
  - Exposed to the agent through LangChain tools.
- **Audit log store**
  - Persists every input, tool result, model output, and final decision.
  - Required for compliance review and dispute handling.
- **Human review queue**
  - Captures borderline or high-risk cases.
  - Prevents the agent from making final calls on ambiguous payment customers.
Implementation
1) Define the KYC schema and deterministic policy checks
Start with a typed payload. Do not let the model infer structure from free text when you are dealing with payments onboarding.
```python
from typing import Literal, Optional

from pydantic import BaseModel, Field


class KYCRequest(BaseModel):
    full_name: str
    date_of_birth: str
    country: str
    email: str
    government_id_type: Literal["passport", "national_id", "driver_license"]
    government_id_number: str
    pep_match: bool = False
    sanctions_match: bool = False
    document_expiry_date: Optional[str] = None


class KYCDecision(BaseModel):
    status: Literal["approve", "reject", "review"]
    risk_score: int = Field(ge=0, le=100)
    reasons: list[str]
```
Add basic policy gates before you call the model. This keeps obvious failures out of the LLM path.
```python
def deterministic_checks(req: KYCRequest) -> list[str]:
    issues = []
    if req.sanctions_match:
        issues.append("Sanctions match present")
    if req.pep_match:
        issues.append("PEP match present")
    if req.country.upper() in {"IR", "KP", "SY"}:
        issues.append("Restricted jurisdiction")
    return issues
```
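The policy engine above also lists document freshness, which the deterministic layer can cover without touching the model. A minimal sketch, assuming the `%Y-%m-%d` date format used in the example data (the function name is illustrative, not part of the original code):

```python
from datetime import date, datetime
from typing import Optional


def document_expired(expiry_date: Optional[str]) -> bool:
    """Conservative check: treat a missing expiry date as expired."""
    if expiry_date is None:
        return True
    return datetime.strptime(expiry_date, "%Y-%m-%d").date() < date.today()


# Expired or missing documents become deterministic issues like the other gates.
print(document_expired(None))          # True
print(document_expired("2099-01-01"))  # False
```

Wire its result into `deterministic_checks` the same way as the sanctions and jurisdiction gates, so a stale document never reaches the LLM as a clean record.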
2) Build the LangChain chain with structured output
Use `ChatOpenAI` with `with_structured_output()` so the model returns machine-readable decisions. This is the right pattern for compliance workflows because you want predictable output shapes.
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a KYC analyst for a payments company. "
     "Return conservative decisions. If there is any compliance ambiguity, choose review."),
    ("human",
     "Customer record:\n{customer_json}\n\n"
     "Deterministic issues:\n{issues}\n\n"
     "Assess KYC risk for payment onboarding."),
])

structured_llm = llm.with_structured_output(KYCDecision)
kyc_chain = prompt | structured_llm
```
Now invoke it with real data. Keep the prompt grounded in structured inputs only.
```python
import json

req = KYCRequest(
    full_name="Jane Doe",
    date_of_birth="1991-03-11",
    country="GB",
    email="jane@example.com",
    government_id_type="passport",
    government_id_number="X1234567",
)

issues = deterministic_checks(req)

result = kyc_chain.invoke({
    "customer_json": json.dumps(req.model_dump(), indent=2),
    "issues": "\n".join(f"- {i}" for i in issues) or "- none",
})

print(result.status)
print(result.risk_score)
print(result.reasons)
```
3) Add tools for sanctions lookup or internal customer history
For payments teams, the agent should not hallucinate about sanctions or prior fraud flags. Wrap real systems as tools and let the agent call them explicitly.
```python
from langchain_core.tools import tool


@tool
def lookup_internal_risk(email: str) -> str:
    """Look up internal fraud/KYC history by email."""
    # Replace with a real DB query.
    if email.endswith("@highrisk.com"):
        return "prior_manual_review=true; prior_chargeback=true"
    return "prior_manual_review=false; prior_chargeback=false"
```
If you want an agentic flow rather than a single chain call, use `create_tool_calling_agent` with an `AgentExecutor`. That gives you controlled tool use while keeping the final decision structured.
4) Enforce audit logging and human escalation
Every decision needs an audit trail. In payments, you need to explain why a user was approved or blocked months later during compliance review.
```python
from datetime import datetime, timezone


def persist_audit(record: dict) -> None:
    # Replace with append-only storage such as Postgres + a WORM bucket.
    print(json.dumps(record))


audit_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "input": req.model_dump(),
    "deterministic_issues": issues,
    "decision": result.model_dump(),
}
persist_audit(audit_record)

if result.status == "review":
    print("Send to human review queue")
```
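The review handoff above is just a print. A minimal sketch of a queue where higher-risk cases surface first (in production this would be a task queue or case-management system; the function names are illustrative):

```python
import heapq
import itertools

_tiebreak = itertools.count()  # keeps heap entries comparable when scores tie
review_queue: list = []


def enqueue_for_review(risk_score: int, record: dict) -> None:
    # heapq is a min-heap, so negate the score to pop the highest risk first.
    heapq.heappush(review_queue, (-risk_score, next(_tiebreak), record))


def next_case() -> dict:
    return heapq.heappop(review_queue)[2]


enqueue_for_review(40, {"email": "low@example.com"})
enqueue_for_review(85, {"email": "high@example.com"})
print(next_case()["email"])  # high@example.com
```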
Production Considerations
- **Keep PII out of logs**
  - Mask government ID numbers and emails in application logs.
  - Store raw documents in encrypted object storage with strict access control.
- **Pin data residency**
  - Route EU customer data to EU-hosted infrastructure.
  - Make sure your LLM provider supports region controls, or use a private deployment for regulated markets.
- **Set hard guardrails**
  - Never let the model override sanctions hits or restricted-country rules.
  - Use deterministic rejects for non-negotiable compliance conditions.
- **Monitor decision drift**
  - Track approval rate, manual review rate, false positives on sanctions/PEP matches, and vendor latency.
  - Re-run evaluation sets whenever prompts or models change.
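The hard-guardrail point deserves code: enforce it as a post-processing step the model output cannot bypass. A minimal sketch (function name is illustrative, and the restricted-country set mirrors the deterministic checks from step 1):

```python
RESTRICTED_COUNTRIES = {"IR", "KP", "SY"}


def enforce_guardrails(llm_status: str, sanctions_match: bool, country: str) -> str:
    """Deterministic policy wins: sanctions hits and restricted countries always reject."""
    if sanctions_match or country.upper() in RESTRICTED_COUNTRIES:
        return "reject"
    return llm_status


# Even if the model says approve, a sanctions hit is final.
print(enforce_guardrails("approve", sanctions_match=True, country="GB"))   # reject
print(enforce_guardrails("approve", sanctions_match=False, country="GB"))  # approve
```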
Common Pitfalls
- **Letting the LLM make final compliance decisions without rules**
  - Fix it by running deterministic checks first.
  - The model should explain borderline cases, not bypass policy.
- **Using free-form text instead of structured outputs**
  - Fix it with Pydantic models and `with_structured_output()`.
  - You want JSON-like output that downstream systems can trust.
- **Ignoring auditability**
  - Fix it by persisting input snapshots, tool outputs, model version, prompt version, and final status.
  - In payments, “the model said so” is not an acceptable record.
- **Mixing environments across jurisdictions**
  - Fix it by separating data paths by region and vendor contract.
  - A UK merchant onboarding flow should not silently send personal data to an unapproved region.
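One way to make the “model version, prompt version” fix concrete is to stamp every audit record with versions and a content hash so later tampering is detectable. A sketch with illustrative field names and version tags:

```python
import hashlib
import json
from datetime import datetime, timezone

MODEL_VERSION = "gpt-4o-mini"    # illustrative version tags; pin whatever you deploy
PROMPT_VERSION = "kyc-prompt-v1"


def build_audit_record(request: dict, tool_outputs: dict, decision: dict) -> dict:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL_VERSION,
        "prompt_version": PROMPT_VERSION,
        "input": request,
        "tool_outputs": tool_outputs,
        "decision": decision,
    }
    # Hash the canonical JSON so any later edit to the stored record is detectable.
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record


rec = build_audit_record({"email": "jane@example.com"}, {}, {"status": "review"})
```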
If you build this pattern correctly, your KYC agent stays useful without becoming a compliance liability. The winning setup is simple: deterministic policy first, LangChain for structured reasoning second, human review for anything that touches ambiguity.
Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.