How to Build a Compliance-Checking Agent for Lending Using LangChain in Python
A compliance checking agent for lending reviews loan applications, supporting documents, and decision outputs against policy rules before anything is approved or sent downstream. It matters because lending decisions are regulated, auditable, and expensive to get wrong: a missed adverse-action notice, an unsupported debt-to-income threshold, or a residency violation can create regulatory exposure fast.
Architecture
- Document ingestion layer
  - Pulls application data, income docs, bank statements, and policy PDFs.
  - Normalizes text into a consistent schema before any LLM call.
- Policy retrieval layer
  - Uses a vector store or keyword retrieval to fetch the relevant lending policy clauses.
  - Keeps the agent grounded in current underwriting and compliance rules.
- Compliance reasoning chain
  - Compares the application facts against policy requirements.
  - Produces structured findings such as `pass`, `review`, or `reject`.
- Audit trail store
  - Persists inputs, retrieved policy snippets, model output, and final decision.
  - Needed for regulator review and internal model governance.
- Guardrail layer
  - Blocks unsafe actions like making final credit decisions without human review.
  - Enforces PII handling, data residency constraints, and output schema validation.
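To make the ingestion layer's "consistent schema" concrete, here is a minimal sketch using a plain dataclass. The field names are illustrative assumptions for this article, not part of LangChain or any lending standard:

```python
from dataclasses import dataclass, asdict

@dataclass
class ApplicationFacts:
    """Normalized view of a loan application (illustrative fields)."""
    applicant: str
    annual_income: float
    monthly_debt: float
    requested_amount: float
    state: str
    kyc_verified: bool
    employment_tenure_months: int

    def as_prompt_text(self) -> str:
        # Render one fact per line so every LLM call sees a stable layout.
        return "\n".join(f"{k}: {v}" for k, v in asdict(self).items())

facts = ApplicationFacts(
    applicant="Jane Doe",
    annual_income=92_000,
    monthly_debt=2_450,
    requested_amount=18_000,
    state="CA",
    kyc_verified=True,
    employment_tenure_months=8,
)
print(facts.as_prompt_text())
```

Normalizing before any model call means prompt templates, retrieval queries, and audit logs all consume the same shape, regardless of whether the facts came from a PDF, an API, or a form.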
Implementation
1) Load lending policy into a retriever
For production systems, keep policy text versioned and separate from application data. Here we use LangChain's `TextLoader`, `RecursiveCharacterTextSplitter`, and a Chroma-backed retriever to ground the agent in policy.
```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("lending_policy.txt", encoding="utf-8")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
chunks = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_lending_policy",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```
2) Define a structured compliance output
Do not let the model answer in free-form prose. Use a Pydantic schema with `PydanticOutputParser` so downstream systems can reliably route cases to approve, manual review, or reject.
```python
from typing import List

from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

class ComplianceFinding(BaseModel):
    status: str = Field(description="pass, review, or reject")
    reasons: List[str] = Field(description="Specific policy-based reasons")
    required_actions: List[str] = Field(description="What must happen next")
    risk_flags: List[str] = Field(description="Compliance or underwriting risks")

parser = PydanticOutputParser(pydantic_object=ComplianceFinding)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a lending compliance reviewer. "
     "Use only the provided policy context and application facts. "
     "If evidence is insufficient, return review."),
    ("human",
     "Application facts:\n{application}\n\n"
     "Policy context:\n{context}\n\n"
     "{format_instructions}"),
]).partial(format_instructions=parser.get_format_instructions())
```
3) Build the LangChain pipeline
This is the core agent flow: retrieve relevant policy snippets, feed them with the application facts into a chat model, then parse the result into a structured object.
```python
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

compliance_chain = (
    {
        "context": retriever | format_docs,
        "application": RunnablePassthrough(),
    }
    | prompt
    | llm
    | parser
)

application_facts = """
Applicant: Jane Doe
Income: $92,000 annual salary
Monthly debt payments: $2,450
Requested loan amount: $18,000
State: CA
KYC status: verified
Employment tenure: 8 months
"""

result = compliance_chain.invoke(application_facts)
print(result.model_dump())
```
That pattern gives you deterministic structure without pretending the model is your decision engine. In lending, the LLM should explain and classify; your policy engine should still own final approval logic.
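To make that separation concrete, here is a minimal sketch of a deterministic rules service that owns the final decision. The function names, the 0.43 DTI cap, and the queue labels are illustrative assumptions, not figures from any actual lending policy:

```python
def dti_ratio(annual_income: float, monthly_debt: float) -> float:
    """Debt-to-income: monthly obligations over gross monthly income."""
    return monthly_debt / (annual_income / 12)

def enforce_policy(annual_income: float, monthly_debt: float,
                   llm_status: str, dti_cap: float = 0.43) -> str:
    """The rules engine owns the final decision; the LLM only recommends."""
    if dti_ratio(annual_income, monthly_debt) > dti_cap:
        return "reject"        # hard policy floor, not overridable by the LLM
    if llm_status == "pass":
        return "approve"
    return "manual_review"     # anything the model flags goes to a human

# Jane Doe: 2450 / (92000 / 12) ≈ 0.32, under the illustrative 0.43 cap
print(enforce_policy(92_000, 2_450, "pass"))  # approve
```

Note the ordering: the deterministic check runs first, so even a confidently wrong `pass` from the model cannot push a file past a hard threshold.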
4) Add audit logging around every decision
Store what was reviewed and why. This is non-negotiable for lending because you need traceability across policy versioning, adverse-action support, and internal audit requests.
```python
import json
from datetime import datetime, timezone

def run_and_audit(application_text: str):
    retrieved_docs = retriever.invoke(application_text)
    context_text = format_docs(retrieved_docs)
    finding = compliance_chain.invoke(application_text)
    audit_record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "application": application_text,
        "retrieved_policy_ids": [d.metadata.get("source", "unknown") for d in retrieved_docs],
        "policy_context": context_text,
        "finding": finding.model_dump(),
    }
    with open("lending_compliance_audit.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(audit_record) + "\n")
    return finding

print(run_and_audit(application_facts))
```
Production Considerations
- Keep final credit decisions out of the LLM
  - The agent should flag violations and recommend routing.
  - A deterministic rules service should enforce approval thresholds like DTI caps or minimum documentation.
- Version policies and prompts
  - Store policy document hashes, prompt versions, and model versions together.
  - When regulators ask why a file was rejected six months ago, you need exact reproducibility.
- Control data residency
  - Lending files often contain PII and financial data subject to regional storage requirements.
  - Pin vector stores, logs, and model endpoints to approved jurisdictions.
- Monitor false positives and missed violations
  - Track how often cases go to manual review versus how often reviewers overturn them.
  - High override rates usually mean your retrieval quality or prompt grounding is weak.
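One way to make the versioning point concrete is to stamp each audit record with content hashes of the policy and prompt. A minimal stdlib sketch, with illustrative function names:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a policy document or prompt template."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def version_stamp(policy_text: str, prompt_template: str, model_name: str) -> dict:
    """Bundle everything needed to reproduce a decision later."""
    return {
        "policy_hash": content_hash(policy_text),
        "prompt_hash": content_hash(prompt_template),
        "model": model_name,
    }

stamp = version_stamp(
    "Maximum DTI is 43 percent...",           # the policy text actually used
    "You are a lending compliance reviewer...",  # the prompt actually sent
    "gpt-4o-mini",
)
print(stamp)
```

Merging this stamp into each audit record lets you prove, months later, exactly which policy wording and prompt version produced a given finding.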
Common Pitfalls
- Using the LLM as the source of truth
  - Mistake: letting the model invent lending rules from memory.
  - Fix: retrieve policies from an approved document set and require citations or evidence snippets in every output.
- Skipping structured outputs
  - Mistake: parsing free-text answers with regex.
  - Fix: use `PydanticOutputParser` so your workflow can reliably branch on `status`, `reasons`, and `required_actions`.
- Ignoring auditability
  - Mistake: only storing the final verdict.
  - Fix: log application facts, retrieved policy chunks, prompt version, model version, and timestamps for every run.
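The structured-outputs fix can be sketched as a small router that branches on the parsed `status` instead of regexing free text. The queue names here are illustrative assumptions:

```python
ALLOWED_STATUSES = {"pass", "review", "reject"}

def route_finding(finding: dict) -> str:
    """Map a parsed ComplianceFinding to a downstream queue."""
    status = finding.get("status")
    if status not in ALLOWED_STATUSES:
        # Schema violation: never guess, always send to a human.
        return "manual_review_queue"
    return {
        "pass": "approval_queue",
        "review": "manual_review_queue",
        "reject": "adverse_action_queue",
    }[status]

print(route_finding({"status": "review", "reasons": ["Tenure under 12 months"]}))
# manual_review_queue
```

The unknown-status fallback matters: when the model emits anything outside the schema, the safe default is a human, not an approval.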
A good lending compliance agent is not a chatbot with opinions. It is a controlled decision support component that grounds itself in current policy, produces structured findings, preserves an audit trail, and stops short of making regulated decisions on its own.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.