How to Build a Transaction Monitoring Agent Using CrewAI in Python for Healthcare
A transaction monitoring agent in healthcare watches claims, payments, refunds, eligibility changes, and provider billing activity for patterns that look abnormal, non-compliant, or fraudulent. It matters because bad transactions are not just financial noise; they can trigger audit findings, violate HIPAA-related controls, expose protected health information, and create downstream denial or recoupment risk.
Architecture
- Ingestion layer
  - Pulls transaction events from claims systems, payment gateways, EHR billing exports, or Kafka topics.
  - Normalizes records into a shared schema with fields like member_id, provider_id, amount, service_code, timestamp, and region.
- Rules and anomaly context
  - Applies deterministic checks first: duplicate claim submission, unusual frequency, out-of-network billing, high-dollar spikes.
  - Keeps the agent grounded before it reasons over edge cases.
- CrewAI agent layer
  - Uses a small set of specialized agents: a triage agent, a compliance agent, and a fraud/risk analyst agent.
  - Each agent has a narrow role and an explicit output format.
- Evidence store
  - Persists raw events, intermediate reasoning artifacts, and final decisions.
  - Needed for auditability and later review by compliance teams.
- Case management output
  - Creates alerts with severity, explanation, evidence references, and recommended action.
  - Routes to SIU, compliance ops, or billing operations.
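The deterministic checks described above can be sketched in plain Python before any agent is involved. The field names mirror the shared schema, and the duplicate key and dollar threshold are illustrative assumptions, not fixed requirements:

```python
from collections import defaultdict

# Illustrative threshold; real values come from your compliance policy.
HIGH_DOLLAR_THRESHOLD = 10_000.00

def run_deterministic_checks(transactions):
    """Flag duplicates and high-dollar spikes before any LLM reasoning."""
    seen = defaultdict(list)
    flags = []
    for txn in transactions:
        # Duplicate claim: same member, provider, service code, and service date.
        key = (txn["member_id"], txn["provider_id"], txn["service_code"],
               txn["timestamp"][:10])
        if seen[key]:
            flags.append({"transaction_id": txn["transaction_id"],
                          "rule": "duplicate_claim"})
        seen[key].append(txn["transaction_id"])
        if txn["amount"] > HIGH_DOLLAR_THRESHOLD:
            flags.append({"transaction_id": txn["transaction_id"],
                          "rule": "high_dollar"})
    return flags

txns = [
    {"transaction_id": "t1", "member_id": "m1", "provider_id": "p1",
     "service_code": "99213", "timestamp": "2024-05-01T10:00:00", "amount": 140.0},
    {"transaction_id": "t2", "member_id": "m1", "provider_id": "p1",
     "service_code": "99213", "timestamp": "2024-05-01T14:30:00", "amount": 140.0},
    {"transaction_id": "t3", "member_id": "m2", "provider_id": "p2",
     "service_code": "27447", "timestamp": "2024-05-02T09:00:00", "amount": 25_000.0},
]
flags = run_deterministic_checks(txns)
```

Transactions that clear this layer with no flags can skip the agents entirely, which keeps LLM cost and PHI exposure down.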
Implementation
1) Install dependencies and define the transaction schema
Use CrewAI with a strict Python data model so every transaction is validated before an LLM sees it. In healthcare workflows, schema discipline matters because you do not want free-form notes drifting into PHI-heavy prompts.
```python
from pydantic import BaseModel, Field
from typing import Literal
from datetime import datetime

class HealthcareTransaction(BaseModel):
    transaction_id: str
    member_id: str
    provider_id: str
    amount: float = Field(gt=0)
    service_code: str
    timestamp: datetime
    region: str
    channel: Literal["claim", "refund", "eligibility", "payment"]
    status: Literal["approved", "pending", "reversed", "denied"]
```
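A quick check that the schema actually rejects bad records. The model is repeated here so the snippet stands alone; it assumes Pydantic v2:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal
from datetime import datetime

class HealthcareTransaction(BaseModel):
    transaction_id: str
    member_id: str
    provider_id: str
    amount: float = Field(gt=0)
    service_code: str
    timestamp: datetime
    region: str
    channel: Literal["claim", "refund", "eligibility", "payment"]
    status: Literal["approved", "pending", "reversed", "denied"]

record = {
    "transaction_id": "txn-001", "member_id": "m-123", "provider_id": "p-456",
    "amount": 240.0, "service_code": "99213", "timestamp": "2024-05-01T10:00:00",
    "region": "us-east", "channel": "claim", "status": "pending",
}
txn = HealthcareTransaction(**record)  # parses cleanly

try:
    HealthcareTransaction(**{**record, "amount": -5.0})  # violates gt=0
    rejected = False
except ValidationError:
    rejected = True
```

Anything that fails validation should be routed to a dead-letter queue, not silently dropped.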
2) Create specialized CrewAI agents
CrewAI’s Agent class is the right fit here because each role should have one job. Keep the prompts short and operational.
```python
from crewai import Agent

triage_agent = Agent(
    role="Transaction Triage Analyst",
    goal="Classify healthcare transactions into normal, suspicious, or urgent review",
    backstory=(
        "You review healthcare payment and claims activity for anomalies. "
        "You prioritize deterministic signals first and escalate only when evidence supports it."
    ),
    verbose=True,
)

compliance_agent = Agent(
    role="Healthcare Compliance Reviewer",
    goal="Check whether the transaction may violate healthcare billing or privacy controls",
    backstory=(
        "You understand HIPAA-adjacent operational controls, audit requirements, "
        "and data residency constraints for healthcare organizations."
    ),
    verbose=True,
)
```
3) Build tasks that force structured outputs
Use Task objects with explicit descriptions. For production systems, ask for JSON-like output so downstream services can parse it reliably.
```python
from crewai import Task

# `transaction` is a validated HealthcareTransaction instance from step 1.
triage_task = Task(
    description=(
        "Review this healthcare transaction and return a risk classification "
        "with concise reasons and evidence references.\n\n"
        f"Transaction:\n{transaction.model_dump_json(indent=2)}"
    ),
    expected_output=(
        "A structured assessment with fields: classification, risk_score, reasons, "
        "and evidence_references."
    ),
    agent=triage_agent,
)

compliance_task = Task(
    description=(
        "Review the same transaction for compliance concerns such as unusual billing patterns, "
        "possible duplicate submission, or privacy-sensitive handling issues."
    ),
    expected_output="A compliance note with findings and recommended action.",
    agent=compliance_agent,
)
```
4) Run the crew and persist the decision
Crew is the orchestration layer. In a real service you would wrap this in an API endpoint or queue worker and write both inputs and outputs to your audit store.
```python
from crewai import Crew, Process

crew = Crew(
    agents=[triage_agent, compliance_agent],
    tasks=[triage_task, compliance_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)
```
For a production pattern, keep the LLM out of raw PHI where possible. Pre-redact member names, addresses, full account numbers, and clinical notes before building the task payload.
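One way to pre-redact is to build the prompt payload from an explicit allowlist of fields and swap direct identifiers for surrogate keys. The field names match the schema above; the hash-based surrogate scheme and salt handling are assumptions for illustration:

```python
import hashlib

# Only these fields ever reach the LLM prompt.
PROMPT_ALLOWLIST = {"transaction_id", "amount", "service_code",
                    "timestamp", "region", "channel", "status"}

def surrogate(value: str, salt: str = "rotate-me") -> str:
    """Replace a direct identifier with a stable surrogate key."""
    return "surr-" + hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def redact_for_prompt(txn: dict) -> dict:
    """Drop non-allowlisted fields and keep only surrogate identifiers."""
    out = {k: v for k, v in txn.items() if k in PROMPT_ALLOWLIST}
    out["member_ref"] = surrogate(txn["member_id"])
    out["provider_ref"] = surrogate(txn["provider_id"])
    return out

raw = {"transaction_id": "t1", "member_id": "m-123", "provider_id": "p-456",
       "member_name": "Jane Doe", "amount": 140.0, "service_code": "99213",
       "timestamp": "2024-05-01T10:00:00", "region": "us-east",
       "channel": "claim", "status": "pending"}
safe = redact_for_prompt(raw)
```

The surrogate is stable, so the same member maps to the same reference across prompts, which keeps cross-transaction reasoning possible without exposing the real identifier.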
Production Considerations
- Deployment
  - Run the agent in a private VPC or private cluster with outbound network controls.
  - Keep model endpoints in approved regions if your healthcare org has data residency requirements.
  - Separate ingestion workers from LLM workers so you can scale them independently.
- Monitoring
  - Log every input transaction ID, model response ID if available, classification outcome, latency, and human override.
  - Track false positives by provider group and service code; healthcare fraud patterns vary by specialty.
  - Add alerting for prompt failures or empty outputs so transactions never disappear silently.
- Guardrails
  - Redact PHI before prompting unless there is a documented business need.
  - Enforce allowlisted output schemas; reject free-text decisions that cannot be parsed.
  - Add human review thresholds for high-value claims or cases involving sensitive categories like behavioral health.
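Allowlisted output schemas can be enforced with a strict parse step between the agent and the case system. This stdlib sketch assumes the triage output fields named in step 3; the classification labels and score range are assumptions:

```python
import json

ALLOWED_CLASSIFICATIONS = {"normal", "suspicious", "urgent_review"}
REQUIRED_FIELDS = {"classification", "risk_score", "reasons", "evidence_references"}

def parse_agent_output(raw: str) -> dict:
    """Reject anything that is not valid JSON matching the allowlisted schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"unparseable agent output: {exc}") from exc
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["classification"] not in ALLOWED_CLASSIFICATIONS:
        raise ValueError(f"classification not allowlisted: {data['classification']!r}")
    if not 0.0 <= float(data["risk_score"]) <= 1.0:
        raise ValueError("risk_score out of range")
    return data

good = parse_agent_output(json.dumps({
    "classification": "suspicious", "risk_score": 0.72,
    "reasons": ["duplicate submission"], "evidence_references": ["rule:dup-1"],
}))

try:
    parse_agent_output("I think this looks fine overall.")
    rejected_free_text = False
except ValueError:
    rejected_free_text = True
```

Rejected outputs should trigger a retry or fall back to human review, never a silent pass.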
- Auditability
  - Store immutable evidence bundles: source event hash, rule hits, agent outputs, reviewer actions.
  - Make sure investigators can reconstruct why an alert was raised six months later during an audit.
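Evidence bundles can be anchored on a content hash of the source event, so an investigator can later prove the stored event was not altered. A sketch using stdlib hashing, with the storage backend omitted; the bundle fields are illustrative:

```python
import hashlib
import json

def canonical_hash(event: dict) -> str:
    """Hash a canonical JSON form so the same event always yields the same digest."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def build_evidence_bundle(event: dict, rule_hits: list,
                          agent_outputs: dict, reviewer_action: str) -> dict:
    """Bundle everything an investigator needs to reconstruct the alert later."""
    return {
        "source_event_hash": canonical_hash(event),
        "rule_hits": rule_hits,
        "agent_outputs": agent_outputs,
        "reviewer_action": reviewer_action,
    }

event = {"transaction_id": "t1", "amount": 140.0}
bundle = build_evidence_bundle(event, ["duplicate_claim"],
                               {"triage": "suspicious"}, "escalated_to_SIU")
```

Sorting the keys before hashing means the digest is stable regardless of the order in which upstream systems emit the fields.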
Common Pitfalls
- Sending full PHI into the prompt
  - Avoid this by redacting identifiers and using surrogate keys. The agent needs enough context to reason; it does not need patient names or clinical narratives.
- Using one generic agent for everything
  - Don't collapse triage, compliance review, and fraud analysis into one prompt. Split responsibilities so each task stays narrow and easier to validate.
- Skipping deterministic rules
  - Pure LLM judgment is too loose for healthcare transaction monitoring. Run hard rules first for duplicates, threshold breaches, and out-of-network behavior, then let CrewAI handle the ambiguous cases.
- No audit trail
  - If you cannot explain to compliance or internal audit teams why an alert fired, the system is incomplete. Persist inputs, outputs, timestamps, and reviewer decisions from day one.
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit (PDF checklist + starter code)
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.