How to Build a Compliance-Checking Agent Using AutoGen in Python for Fintech

By Cyprian Aarons · Updated 2026-04-21

A compliance-checking agent for fintech reviews customer-facing text, transaction notes, onboarding responses, or internal workflows against policy rules before anything ships or is approved. It matters because one missed disclosure, one prohibited claim, or one data-handling mistake can trigger regulatory exposure, customer harm, and audit findings.

Architecture

  • Policy source
    • A versioned set of rules: KYC/AML language checks, marketing disclaimers, PII handling, retention requirements, and jurisdiction-specific constraints.
  • Analyzer agent
    • An AssistantAgent that inspects the input and produces a structured compliance assessment.
  • Reviewer / verifier agent
    • A second AssistantAgent that challenges the first pass and looks for missed violations or weak reasoning.
  • Orchestrator
    • A GroupChat plus GroupChatManager to coordinate the conversation and enforce the review flow.
  • Audit logger
    • Persistent storage for prompts, outputs, policy version, timestamps, and final disposition.
  • Human escalation path
    • A manual review queue for ambiguous cases, high-risk jurisdictions, or blocked decisions.

Implementation

1. Install and configure AutoGen

Use AutoGen’s Python package and keep your model config explicit. In fintech, you want deterministic deployment settings, not hidden defaults.

pip install pyautogen

import os
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
        }
    ],
    "temperature": 0,
}

2. Define the compliance roles

Use one agent to assess policy adherence and another to verify the assessment. Keep the system messages narrow and operational.

compliance_agent = AssistantAgent(
    name="compliance_agent",
    llm_config=llm_config,
    system_message=(
        "You are a fintech compliance checker. "
        "Review text for KYC/AML issues, misleading claims, prohibited promises, "
        "PII exposure, and missing required disclosures. "
        "Return concise findings with severity and rationale."
    ),
)

review_agent = AssistantAgent(
    name="review_agent",
    llm_config=llm_config,
    system_message=(
        "You are a strict compliance reviewer. "
        "Challenge the prior assessment. Look for missed risks, weak evidence, "
        "and jurisdictional issues. If no issue exists, say so explicitly."
    ),
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

3. Run a group chat for two-pass review

This is the core pattern: first pass identifies issues, second pass validates them. The manager coordinates both agents until they converge.

policy_text = """
Policy:
- Do not promise guaranteed returns.
- Do not request full card numbers or passwords.
- Flag any PII in free-text fields.
- Marketing claims must include risk disclosure.
- Escalate if the text mentions restricted jurisdictions.
"""

input_text = """
We can guarantee 12% monthly returns with no risk.
Send us your full account number and ID so we can verify you quickly.
"""

message = f"""
Policy:
{policy_text}

Text to review:
{input_text}

Task:
1) Identify compliance issues.
2) Classify severity as low/medium/high.
3) Recommend approve/reject/escalate.
4) Cite which rule was triggered.
"""

group_chat = GroupChat(
    agents=[user_proxy, compliance_agent, review_agent],
    messages=[],
    max_round=4,
)

manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

result = user_proxy.initiate_chat(manager, message=message)
print(result)
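Once the chat finishes, you still need a machine-readable disposition for downstream routing. A minimal sketch, using a hypothetical helper (`extract_disposition` is not part of AutoGen) that scans the final agent message for the recommendation keywords the task prompt asks for:

```python
def extract_disposition(final_message: str) -> str:
    """Map free-text agent output onto approve / reject / escalate."""
    text = final_message.lower()
    # Check escalate before reject before approve, mirroring risk-averse review.
    for disposition in ("escalate", "reject", "approve"):
        if disposition in text:
            return disposition
    return "escalate"  # default to human review when the output is ambiguous

print(extract_disposition("Severity: high. Recommendation: reject."))  # reject
```

Keyword scanning is deliberately crude; if your model config supports structured output, asking the reviewer agent for JSON and parsing that is the sturdier option.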

4. Add an auditable decision wrapper

In production you should persist the input, policy version, model output, and final decision. The agent is not the system of record; your audit store is.

from datetime import datetime, timezone
import json

def audit_record(input_text: str, result: str, policy_version: str) -> dict:
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "policy_version": policy_version,
        "input_text": input_text,
        "agent_result": str(result),
        "decision": "manual_review" if "escalate" in str(result).lower() else "auto_reviewed",
    }

record = audit_record(
    input_text=input_text,
    result=getattr(result, "summary", result),
    policy_version="2026-04-01",
)

with open("compliance_audit.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
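A version label like "2026-04-01" only proves which rule set was active if the policy text behind it never changes. A minimal sketch of a content hash you can store alongside the label (the helper name is illustrative, not from AutoGen):

```python
import hashlib

def policy_hash(policy_text: str) -> str:
    """Stable fingerprint of the exact policy text used for this run."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()[:16]

# Store both the human-readable version and the hash in the audit record:
# record["policy_hash"] = policy_hash(policy_text)
```

When an auditor asks why a text passed last month, the hash lets you prove the rules were byte-for-byte identical, not just labeled the same.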

Production Considerations

  • Deploy in-region
    • If you operate under data residency constraints, pin model endpoints and storage to the required geography. Do not ship customer PII across regions just because the LLM endpoint is convenient.
  • Log everything needed for audit
    • Store prompt inputs, policy version hash, model name, timestamps, decision outcome, and human override actions. Regulators care about traceability more than cleverness.
  • Add hard guardrails before the LLM
    • Redact PANs, account numbers, SSNs/NINs/NIFs before sending text to AutoGen. Use deterministic regex-based filters first; do not rely on the model to “notice” sensitive data reliably.
  • Force escalation on uncertainty
    • If confidence is low or the text touches high-risk areas like sanctions screening or cross-border transfers, route to human review instead of auto-approval.
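The pre-LLM redaction guardrail above can be sketched with two deterministic regex filters. The patterns are illustrative, assuming US-style SSNs and 13-16 digit card numbers; real deployments need locale-specific patterns plus Luhn validation for PANs:

```python
import re

PATTERNS = {
    "PAN": re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # card-number-shaped digit runs
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US Social Security numbers
}

def redact(text: str) -> str:
    """Mask sensitive patterns before any text reaches AutoGen."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```

Run `redact()` on every input before `initiate_chat`; the model then reasons about the redaction markers instead of the raw values.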

Common Pitfalls

  • Treating the agent output as a final legal decision
    • Avoid this by making the agent only recommend approve, reject, or escalate. Final approval should sit behind a human or rules engine for regulated flows.
  • Sending raw sensitive data into prompts
    • Avoid this with preprocessing: mask PII/PCI fields before calling initiate_chat. If you need exact values for validation, use secure internal services outside the LLM path.
  • No policy versioning
    • Avoid this by attaching a versioned policy document to every run. When auditors ask why something passed last month but fails today, you need to show which rule set was active.
  • Single-pass review only
    • A single agent misses edge cases. Use a second reviewer agent or deterministic checks for prohibited phrases and jurisdiction flags before release.
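The deterministic prohibited-phrase check from the last pitfall can be a few lines that run before (or alongside) the agents, so known-bad language never depends on the model noticing it. The phrase list here is illustrative only:

```python
PROHIBITED = ("guaranteed returns", "no risk", "risk-free")

def prohibited_hits(text: str) -> list[str]:
    """Return every prohibited phrase found in the text, case-insensitively."""
    lowered = text.lower()
    return [phrase for phrase in PROHIBITED if phrase in lowered]

print(prohibited_hits("We can guarantee 12% monthly returns with no risk."))
# ['no risk']
```

Any hit can short-circuit straight to reject or escalate; the agents then handle the subtler cases the phrase list cannot anticipate.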

By Cyprian Aarons, AI Consultant at Topiax.