How to Build a Transaction Monitoring Agent for Healthcare Using LangGraph in Python
A transaction monitoring agent for healthcare watches claims, payments, refunds, and provider billing activity, then flags patterns that suggest policy violations, fraud, waste, abuse, or simple operational errors. It matters because healthcare money movement is high-volume, regulated, and expensive to investigate manually; if you miss anomalies, you pay in leakage and compliance risk.
Architecture
- Ingestion layer
  - Pulls transactions from claims systems, payment rails, EHR-adjacent billing feeds, and remittance files.
  - Normalizes records into a single schema: patient/account IDs, provider IDs, CPT/ICD codes, amount, timestamp, location.
- Rules + signal extraction node
  - Applies deterministic checks first: duplicate claim patterns, impossible service-location combinations, out-of-hours billing, repeated reversals.
  - Computes derived features like velocity counts and payer/provider historical baselines.
- LLM triage node
  - Summarizes why a transaction is suspicious in plain English.
  - Produces a structured risk assessment that an investigator can review.
- Human review gate
  - Routes high-risk cases to compliance or SIU analysts.
  - Keeps the final decision outside the model for auditability.
- Audit log store
  - Persists every state transition, model output, and reviewer action.
  - Needed for HIPAA-adjacent controls, internal audits, and post-incident reconstruction.
- Policy guardrails
  - Enforces PHI minimization, data residency constraints, and allowed action types.
  - Blocks unsafe outputs like diagnosis inference or unsupported denial recommendations.
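The ingestion layer's single schema can be sketched as a small dataclass plus a per-feed mapper. The field names and upstream keys below ("claim_id", "amt", and so on) are illustrative assumptions, not a standard; each real feed gets its own mapper targeting the same type.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedTx:
    """Minimal normalized transaction record (illustrative field names)."""
    transaction_id: str
    patient_id: str
    provider_id: str
    service_code: str   # CPT/ICD code as received
    amount: float
    timestamp: datetime
    location: str       # e.g. two-letter state code


def normalize(raw: dict) -> NormalizedTx:
    """Map one upstream payload into the shared schema.

    The source keys here are hypothetical; write one such mapper
    per feed so every downstream node sees the same shape.
    """
    return NormalizedTx(
        transaction_id=str(raw["claim_id"]),
        patient_id=str(raw["member_id"]),
        provider_id=str(raw["npi"]),
        service_code=str(raw["cpt"]).strip().upper(),
        amount=round(float(raw["amt"]), 2),
        timestamp=datetime.fromtimestamp(int(raw["ts"]), tz=timezone.utc),
        location=str(raw.get("state", "")).upper(),
    )
```

Keeping normalization in plain typed code (rather than inside prompts) means the rules node and the LLM node always see identical, validated fields.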
Implementation
1) Define the state and graph nodes
Use StateGraph from LangGraph to model the workflow as explicit steps. Keep the state small and typed so you can trace what changed at each node.
```python
from typing import TypedDict, List, Literal

from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI


class TxState(TypedDict):
    transaction_id: str
    amount: float
    provider_id: str
    patient_id: str
    service_code: str
    location: str
    risk_score: int
    findings: List[str]
    decision: Literal["review", "clear"]
    summary: str


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def extract_signals(state: TxState) -> TxState:
    """Deterministic rules run before the LLM sees anything."""
    findings = []
    if state["amount"] > 5000:
        findings.append("High-value transaction")
    if state["location"] not in ["NY", "NJ", "CT"]:
        findings.append("Out-of-network geography")
    if state["service_code"].startswith("992") and state["amount"] > 1000:
        findings.append("Potential coding/charge mismatch")
    risk = min(100, len(findings) * 30)
    return {**state, "findings": findings, "risk_score": risk}


def llm_triage(state: TxState) -> TxState:
    """Ask the model for a summary; keep the routing decision deterministic."""
    prompt = f"""
You are reviewing a healthcare transaction for compliance triage.
Transaction ID: {state['transaction_id']}
Amount: {state['amount']}
Provider: {state['provider_id']}
Service code: {state['service_code']}
Findings: {state['findings']}
Return a short audit-friendly summary only.
"""
    resp = llm.invoke([HumanMessage(content=prompt)])
    decision = "review" if state["risk_score"] >= 50 else "clear"
    return {**state, "summary": resp.content.strip(), "decision": decision}
```
2) Add routing logic with add_conditional_edges
This is where LangGraph becomes useful. The graph decides whether to stop after automated triage or send the case to a human review path.
```python
def route_case(state: TxState):
    return "human_review" if state["decision"] == "review" else END


def human_review(state: TxState) -> TxState:
    # In production this would write to a case management system.
    return {**state}


graph = StateGraph(TxState)
graph.add_node("extract_signals", extract_signals)
graph.add_node("llm_triage", llm_triage)
graph.add_node("human_review", human_review)
graph.set_entry_point("extract_signals")
graph.add_edge("extract_signals", "llm_triage")
graph.add_conditional_edges("llm_triage", route_case)
graph.add_edge("human_review", END)  # terminate explicitly after review
app = graph.compile()
```
3) Run the agent on a transaction payload
The compiled graph exposes .invoke(). That gives you a deterministic execution path and a single state object at the end.
```python
sample_tx = {
    "transaction_id": "TX-10001",
    "amount": 7200.0,
    "provider_id": "PRV-44",
    "patient_id": "PT-9001",
    "service_code": "99214",
    "location": "CA",
    "risk_score": 0,
    "findings": [],
    "decision": "clear",
    "summary": "",
}

result = app.invoke(sample_tx)
print(result["decision"])    # "review" -- all three rules fire
print(result["risk_score"])  # 90
print(result["summary"])
```
4) Add persistence-ready checkpoints for reviewability
For healthcare workflows you want replayable runs. LangGraph supports checkpointing through checkpointers; use one in production so investigators can inspect how the agent reached its decision.
```python
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
app_with_memory = graph.compile(checkpointer=checkpointer)

# Each case gets its own thread so its run can be replayed later.
thread_config = {"configurable": {"thread_id": "case-TX-10001"}}
result = app_with_memory.invoke(sample_tx, config=thread_config)
```
Production Considerations
- Protect PHI by design
  - Send only the minimum necessary fields into the graph.
  - Redact names, full addresses, notes text, and free-form clinical content unless there is a documented need.
- Keep data residency explicit
  - If your healthcare org requires US-only processing or region-specific storage, pin model endpoints and checkpoint stores accordingly.
  - Don't let case data drift into unmanaged logs or third-party observability tools.
- Make every decision auditable
  - Persist input payload hashes, node outputs, timestamps, reviewer actions, and model version IDs.
  - In investigations you need to prove what the system saw and why it routed a case.
- Add guardrails before automation
  - The agent should recommend review or clear; it should not auto-deny claims unless policy explicitly allows it.
  - Block any output that infers diagnosis or treatment necessity from payment behavior alone.
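Two of these controls, PHI minimization and payload hashing, need only the standard library. A minimal sketch follows; the field allowlist and the audit-record shape are assumptions to adapt to your own minimum-necessary policy, not a compliance standard.

```python
import hashlib
import json

# Transaction-level fields permitted to reach the graph.
# This allowlist is an assumption; your policy defines the real one.
ALLOWED_FIELDS = {"transaction_id", "amount", "provider_id",
                  "service_code", "location"}


def minimize_phi(payload: dict) -> dict:
    """Drop every field not explicitly allowlisted before prompting."""
    return {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}


def audit_record(payload: dict, model_version: str) -> dict:
    """Hash the exact input so audits can prove what the system saw."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {
        "input_sha256": hashlib.sha256(canonical.encode()).hexdigest(),
        "model_version": model_version,
    }
```

Hashing a canonical JSON serialization (sorted keys, fixed separators) makes the digest stable across runs, so the same input always produces the same audit fingerprint.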
Common Pitfalls
- Putting raw PHI into prompts
  - Don't pass chart notes or full identifiers just because the LLM can handle them.
  - Extract only transaction-level features unless an approved workflow requires more context.
- Skipping deterministic checks
  - If you rely only on the LLM for anomaly detection, your results will be inconsistent and hard to defend.
  - Use rules first for obvious patterns like duplicates and impossible combinations.
- No human escalation path
  - A healthcare monitoring agent must not be a black box that silently acts on sensitive transactions.
  - Always route borderline or high-risk cases into a review queue with clear evidence attached.
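The duplicate pattern called out above is a good example of a check that belongs in deterministic code, not a prompt. Here is one way to sketch it; the grouping key (provider, patient, service code, service date) is a plausible choice for illustration, while real systems usually also compare amounts and modifier codes.

```python
from collections import Counter


def find_duplicate_claims(txs: list[dict]) -> list[tuple]:
    """Return grouping keys that appear more than once in a batch.

    Keyed on (provider_id, patient_id, service_code, service_date).
    Anything returned here can be flagged before any model call.
    """
    keys = [(t["provider_id"], t["patient_id"],
             t["service_code"], t["service_date"]) for t in txs]
    return [k for k, count in Counter(keys).items() if count > 1]
```

Because the check is pure Python, its behavior is identical on every run, which is exactly what you want when defending a flag to an auditor.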
Keep learning
- The complete AI Agents Roadmap: my full 8-step breakdown
- Free: The AI Agent Starter Kit, a PDF checklist plus starter code
- Work with me: I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.