How to Build a KYC Verification Agent Using LangChain in Python for Lending
A KYC verification agent for lending collects applicant data, checks it against policy, validates identity documents, and produces a decision-ready summary for underwriters or downstream workflows. In lending, this matters because bad KYC creates regulatory exposure, slows loan origination, and increases fraud risk before a single dollar is disbursed.
Architecture

- **Input intake layer**
  - Accepts borrower application data, uploaded documents, and metadata such as jurisdiction and product type.
  - Normalizes fields such as name, DOB, address, ID number, and beneficial ownership.
- **Document extraction layer**
  - Uses OCR or document parsers to extract text from passports, driver’s licenses, utility bills, bank statements, or incorporation docs.
  - Produces structured text that the agent can reason over.
- **Policy reasoning agent**
  - Uses LangChain `ChatPromptTemplate`, `RunnableLambda`, and an LLM-backed chain to compare extracted facts against KYC rules.
  - Applies lending-specific checks like address consistency, age eligibility, sanctions screening flags, and document freshness.
- **Tooling layer**
  - Wraps external systems: sanctions/PEP screening APIs, OCR service, CRM/customer profile store, and audit log writer.
  - Keeps the model from hallucinating verification results.
- **Decision and audit layer**
  - Outputs `pass`, `review`, or `fail` with reasons.
  - Persists every input, tool call, model output, and final decision for compliance review.
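The intake layer's normalization step can be sketched with plain standard-library code. The field names mirror the `Applicant` schema used later; the specific rules here (whitespace collapsing, coercing DOB to ISO format, uppercasing the country code) are illustrative assumptions, not a fixed specification:

```python
import re
from datetime import datetime


def normalize_applicant(raw: dict) -> dict:
    """Illustrative intake-layer normalization: trim and collapse whitespace,
    coerce the date of birth to ISO format, uppercase the country code."""
    name = re.sub(r"\s+", " ", raw.get("full_name", "")).strip()

    dob_raw = raw.get("date_of_birth", "").strip()
    dob = dob_raw  # fall back to the raw string if no format matches
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"):
        try:
            dob = datetime.strptime(dob_raw, fmt).strftime("%Y-%m-%d")
            break
        except ValueError:
            continue

    return {
        "full_name": name,
        "date_of_birth": dob,
        "address": raw.get("address", "").strip(),
        "country": raw.get("country", "").strip().upper(),
        "id_number": raw.get("id_number", "").strip(),
    }


print(normalize_applicant({
    "full_name": "  Jane   Smith ",
    "date_of_birth": "12/05/1990",
    "address": "12 King Street, London",
    "country": "gb",
    "id_number": "GB1234567",
}))
```

Doing this deterministically before the agent sees the data keeps the LLM out of low-level cleanup and makes downstream comparisons (e.g. name-vs-document matching) far more reliable.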
Implementation
1) Define the data model and KYC checks
Start with explicit schemas. In lending, you want the agent to reason over structured inputs instead of free-form chat.
```python
from typing import List, Literal

from pydantic import BaseModel, Field


class Applicant(BaseModel):
    full_name: str
    date_of_birth: str
    address: str
    country: str
    id_number: str


class KycDocument(BaseModel):
    doc_type: Literal["passport", "drivers_license", "utility_bill", "bank_statement"]
    extracted_text: str
    issue_country: str | None = None


class KycResult(BaseModel):
    decision: Literal["pass", "review", "fail"]
    reasons: List[str] = Field(default_factory=list)
    missing_fields: List[str] = Field(default_factory=list)
```
2) Build the LangChain prompt and structured output chain
Use `ChatPromptTemplate` plus `with_structured_output()` so the model returns machine-readable results. This is the pattern you want in production workflows.
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     """You are a KYC verification assistant for a lending platform.
Return only a structured result.
Rules:
- PASS only if identity fields are consistent and no obvious red flags exist.
- REVIEW if information is incomplete or ambiguous.
- FAIL if there are clear mismatches or suspicious signals.
Consider lending compliance requirements and produce concise reasons."""),
    ("human",
     "Applicant:\n{applicant}\n\nDocuments:\n{documents}\n\n"
     "Jurisdiction: {jurisdiction}\n"
     "Sanctions screening result: {sanctions_result}\n"
     "Address check result: {address_check}")
])

kyc_chain = prompt | llm.with_structured_output(KycResult)
```

Note that the human message includes the precomputed tool signals (`sanctions_result`, `address_check`); without them in the prompt, the model would never see the evidence the tools gathered.
3) Add real tools for screening and evidence gathering
The agent should not “decide” sanctions or PEP status from memory. Wrap those checks as tools so the workflow can call real systems.
```python
from langchain_core.tools import tool


@tool
def sanctions_screen(name: str) -> str:
    """Screen an applicant name against sanctions/PEP watchlists."""
    # Replace with your vendor API call
    if name.lower() in {"john doe", "test user"}:
        return "match_found"
    return "no_match"


@tool
def check_address_country(address: str) -> str:
    """Validate that the address is usable for residency checks."""
    # Replace with geocoding / residency validation logic
    return "valid" if address else "invalid"
```

(The `@tool` decorator requires a docstring on each function; it becomes the tool description the model sees.)
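Real screening vendors return fuzzy matches rather than exact string hits. As a stand-in for a vendor call, here is a minimal sketch of fuzzy name matching using the standard library's `difflib`; the in-memory watchlist and the 0.85 threshold are made-up assumptions, not vendor behavior:

```python
from difflib import SequenceMatcher

# Hypothetical in-memory watchlist; a production system queries a screening vendor.
WATCHLIST = ["John Doe", "Jon Dough", "Acme Front Holdings"]


def fuzzy_sanctions_screen(name: str, threshold: float = 0.85) -> str:
    """Return 'match_found' if the applicant name is close to any watchlist entry."""
    candidate = name.strip().lower()
    for entry in WATCHLIST:
        ratio = SequenceMatcher(None, candidate, entry.lower()).ratio()
        if ratio >= threshold:
            return "match_found"
    return "no_match"


print(fuzzy_sanctions_screen("John Doe"))    # match_found
print(fuzzy_sanctions_screen("Jane Smith"))  # no_match
```

Fuzzy matching is what makes the `review` decision path matter: near-misses should route to a human rather than silently pass.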
If you need multi-step orchestration, use `RunnableLambda` to precompute signals before invoking the LLM.
```python
from langchain_core.runnables import RunnableLambda


def enrich_input(payload: dict) -> dict:
    applicant = payload["applicant"]
    payload["sanctions_result"] = sanctions_screen.invoke(applicant.full_name)
    payload["address_check"] = check_address_country.invoke(applicant.address)
    return payload


enrichment = RunnableLambda(enrich_input)

# Compose: enrich the payload first, then run the structured decision chain.
enriched_chain = enrichment | kyc_chain
```
4) Run the end-to-end verification flow
This combines enrichment with the structured LLM decision. The output is suitable for downstream underwriting or manual review queues.
```python
from pprint import pprint


def verify_kyc(applicant: Applicant, documents: list[KycDocument], jurisdiction: str) -> KycResult:
    payload = {
        "applicant": applicant.model_dump(),
        "documents": [doc.model_dump() for doc in documents],
        "jurisdiction": jurisdiction,
        "sanctions_result": sanctions_screen.invoke(applicant.full_name),
        "address_check": check_address_country.invoke(applicant.address),
    }
    result = kyc_chain.invoke(payload)

    # Add deterministic overrides for lending compliance
    if payload["sanctions_result"] == "match_found":
        result.decision = "review"
        result.reasons.append("Sanctions screening returned a potential match")
    if not applicant.id_number:
        result.decision = "review"
        result.missing_fields.append("id_number")

    return result


applicant = Applicant(
    full_name="Jane Smith",
    date_of_birth="1990-05-12",
    address="12 King Street, London",
    country="GB",
    id_number="GB1234567",
)

documents = [
    KycDocument(
        doc_type="passport",
        extracted_text="Passport Jane Smith DOB 1990-05-12 Issued GB",
        issue_country="GB",
    ),
]

result = verify_kyc(applicant, documents, jurisdiction="UK")
pprint(result.model_dump())
```
Production Considerations

- **Keep PII in-region**
  - For lending workloads, route customer data to a model endpoint that satisfies data residency requirements.
  - If your bank operates in multiple jurisdictions, segment processing by region and avoid cross-border document movement unless policy allows it.
- **Log every decision path**
  - Persist the prompt version, input hashes, tool outputs, model response, and final override logic.
  - Auditors care about why a borrower was sent to manual review just as much as the final outcome.
- **Add deterministic guardrails**
  - Never let the LLM make final sanctions decisions without external screening tools.
  - Use hard rules for age thresholds, missing mandatory fields, expired IDs, and jurisdiction-specific compliance checks.
- **Monitor drift by segment**
  - Track false positives by country, document type, channel source, and loan product.
  - A spike in manual reviews for one region usually means either extraction quality dropped or your prompt/rules are too strict.
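The logging point can be made concrete with a minimal audit-record sketch. The append-only `audit_log` list is a stand-in for a durable store, and SHA-256 hashing of the input is one way to prove what was processed without duplicating raw PII in the log; both are assumptions for illustration:

```python
import hashlib
import json
from datetime import datetime, timezone

audit_log: list[dict] = []  # stand-in for a durable, append-only store


def record_decision(*, prompt_version: str, applicant: dict, tool_outputs: dict,
                    model_output: dict, final_decision: str, overrides: list[str]) -> dict:
    """Persist everything an auditor needs to reconstruct the decision path."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        # Hash the canonicalized input so the log proves what was processed
        # without storing the raw PII a second time.
        "input_hash": hashlib.sha256(
            json.dumps(applicant, sort_keys=True).encode()
        ).hexdigest(),
        "tool_outputs": tool_outputs,
        "model_output": model_output,
        "final_decision": final_decision,
        "overrides_applied": overrides,
    }
    audit_log.append(entry)
    return entry


entry = record_decision(
    prompt_version="kyc-v1",
    applicant={"full_name": "Jane Smith"},
    tool_outputs={"sanctions": "no_match", "address": "valid"},
    model_output={"decision": "pass", "reasons": []},
    final_decision="pass",
    overrides=[],
)
print(entry["input_hash"][:12])
```

Recording the prompt version alongside each decision is what lets you answer the auditor's question "which rules were in force when this borrower was approved?"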
Common Pitfalls

- **Letting the model infer facts that should come from tools**
  - Bad pattern: asking an LLM whether someone is on a sanctions list based on its internal knowledge.
  - Fix it by calling actual screening APIs through LangChain tools and using the model only to summarize evidence.
- **Using unstructured free-text outputs**
  - Bad pattern: parsing “looks good” responses from chat completions.
  - Fix it with `with_structured_output()` and Pydantic models so downstream systems can reliably consume decisions.
- **Skipping compliance overrides**
  - Bad pattern: accepting the model’s “pass” even when required fields are missing or a watchlist hit exists.
  - Fix it with deterministic post-processing rules that can force `review` or `fail` regardless of model confidence.
- **Ignoring auditability**
  - Bad pattern: storing only the final decision.
  - Fix it by recording inputs, extracted text snippets, tool outputs, prompt versions, and timestamps so investigators can reconstruct every step.
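The override pattern above can be isolated into a single pure function that runs after the model. This sketch uses plain dicts mirroring the `KycResult` fields; the 18-year age floor is an assumed lending rule, and real jurisdictions will add their own checks:

```python
from datetime import date


def apply_compliance_overrides(result: dict, applicant: dict, signals: dict) -> dict:
    """Deterministic post-processing: hard rules that can downgrade the
    model's decision to 'review' or 'fail' regardless of its confidence."""
    decision = result["decision"]
    reasons = list(result.get("reasons", []))

    # Watchlist hits always force at least a manual review.
    if signals.get("sanctions_result") == "match_found":
        decision = "review"
        reasons.append("Sanctions screening returned a potential match")

    # Missing mandatory identity fields block a pass.
    if not applicant.get("id_number"):
        decision = "review"
        reasons.append("Missing mandatory field: id_number")

    # Assumed age-eligibility floor of 18 years.
    dob = date.fromisoformat(applicant["date_of_birth"])
    today = date.today()
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if age < 18:
        decision = "fail"
        reasons.append("Applicant below minimum age for lending products")

    return {**result, "decision": decision, "reasons": reasons}


print(apply_compliance_overrides(
    {"decision": "pass", "reasons": []},
    {"full_name": "Jane Smith", "date_of_birth": "1990-05-12", "id_number": "GB1234567"},
    {"sanctions_result": "no_match"},
)["decision"])  # pass
```

Keeping the overrides in one pure function makes them unit-testable in isolation, which is exactly what you want to show an auditor.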
Keep learning

- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies

By Cyprian Aarons, AI Consultant at Topiax.