🔬 Healthcare Fraud Detection: Autoresearch Report

Applying Karpathy's autonomous research loop to Medicare claims data
Date: March 9, 2026  |  Data: 90M+ CMS Medicare records  |  Final AUC: 0.81
📊 Executive Summary

In one evening of AI-assisted iteration, we built a fraud detection model that identifies Medicare providers likely to be excluded by the OIG with 0.81 AUC-ROC — starting from a random baseline of 0.55. The key discovery: fraud shows up as being extreme on any single dimension, not just above average across all dimensions.
0.5561 → 0.8098 AUC improvement  |  18 iterations tested  |  90M+ CMS records  |  181 ground truth matches

1. The Approach: Karpathy's Autoresearch Pattern

Inspired by Andrej Karpathy's autoresearch project, we adapted the same pattern to healthcare fraud detection:

The Loop
  1. eval.py: Fixed oracle. Downloads the OIG LEIE exclusion list, measures AUC-ROC. Never edited.
  2. detector.py: Agent edits this. Scoring logic that outputs a fraud probability per provider.
  3. results.tsv: Experiment log. Tracks AUC progression across iterations.

Unlike Karpathy's ML experiments where the eval was validation loss, our ground truth is the OIG LEIE (List of Excluded Individuals and Entities) — the official federal database of providers excluded from Medicare due to fraud convictions, license revocations, or program violations.
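A minimal sketch of the oracle side of this loop, assuming scikit-learn for the metric; the function name `evaluate` and the label-matching details are simplified stand-ins for what eval.py actually does:

```python
import subprocess
import sys
from sklearn.metrics import roc_auc_score

def evaluate(npis, labels):
    """Run detector.py over the NPI universe and score it against LEIE labels.

    `labels` maps NPI -> 1 if the provider appears in the OIG exclusion
    list, else 0.
    """
    # detector.py reads NPIs from stdin and writes "npi,score" lines to stdout.
    proc = subprocess.run(
        [sys.executable, "detector.py"],
        input="\n".join(npis),
        capture_output=True, text=True, check=True,
    )
    scores = dict(line.split(",") for line in proc.stdout.splitlines())
    auc = roc_auc_score(
        [labels[npi] for npi in npis],
        [float(scores[npi]) for npi in npis],
    )
    # Append to the experiment log so every iteration is tracked.
    with open("results.tsv", "a") as f:
        f.write(f"{auc:.4f}\n")
    return auc
```

Because detector.py is invoked over stdin/stdout, the agent can rewrite its internals freely without the oracle ever needing to change.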

2. Data Sources & Scale

| Source | Rows | Purpose |
|---|---|---|
| CMS Part B Physician Claims | 1.26M | Billing volume, service counts, payments |
| CMS Part D Prescribing | 1.38M | Drug costs, opioid prescribing patterns |
| Open Payments (Sunshine Act) | 14.7M | Industry financial relationships |
| NPPES NPI Registry | 7.1M | Provider demographics, taxonomy codes |
| PECOS Enrollment | 2.54M | Medicare enrollment status |
| OIG LEIE Exclusion List | ~70K | Ground truth fraud labels |

All data stored in a single 6GB DuckDB database with 30 tables, enabling fast SQL queries during iteration.

⚠️ Label Limitation The LEIE contains ~70K excluded providers, but most are non-physicians (nurses, aides, personal care workers) who don't bill Part B directly. Only 181 matched our CMS physician universe — a tiny 0.015% prevalence. This makes AUC a useful but imperfect metric.
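The metric distortion at this prevalence is easy to demonstrate on synthetic labels (a scaled-down sketch, assuming scikit-learn): AUC-ROC stays interpretable for a no-signal detector, while average precision collapses toward the prevalence floor.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n, n_pos = 100_000, 15             # ~0.015% prevalence, scaled down
y_true = np.zeros(n)
y_true[:n_pos] = 1
random_scores = rng.random(n)      # a detector with no signal

auc = roc_auc_score(y_true, random_scores)           # hovers near 0.5
ap = average_precision_score(y_true, random_scores)  # near the ~0.00015 prevalence
```

This is why the report leans on AUC for iteration while treating absolute precision numbers with caution.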

3. Model Iterations & AUC Progression

Baseline 0.5561 → V2 0.7695 → V7 0.7904 → V11 0.8013 → V12 ✓ 0.8098

Key Iterations Explained

| Version | AUC | Key Change | Result |
|---|---|---|---|
| Baseline | 0.5561 | Raw billing z-scores, global normalization | Near random |
| V2 | 0.7695 | Z-scores within specialty + LA opioid rate | +21 pts |
| V3 | 0.7433 | Added HCPCS concentration, taxonomy mismatch | -3 pts (noise) |
| V7 | 0.7904 | Ensemble: 50% max + 50% weighted mean | +2 pts |
| V12 | 0.8098 | Pure max(subscores) — no averaging | Best |

4. Key Findings: What Works

🎯 Discovery #1: Fraud = Extreme on ANY Dimension

Taking max(subscores) across all features dramatically outperforms weighted averaging. Fraudulent providers don't need to be suspicious on every metric — being in the 99th percentile on one metric is enough signal.
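A toy illustration of this combination rule with synthetic subscores (the feature names in the comment are illustrative): the single-dimension outlier wins under max() but loses under averaging.

```python
import numpy as np

# Rows = providers, columns = subscores in [0, 1]
# (e.g. services/bene, LA opioid rate, drug cost) — all values synthetic.
subscores = np.array([
    [0.99, 0.10, 0.15],   # extreme on ONE dimension (pill-mill pattern)
    [0.60, 0.55, 0.58],   # mildly above average on everything
])

weighted_mean = subscores.mean(axis=1)
max_score = subscores.max(axis=1)

# Averaging ranks the "mildly elevated" provider higher;
# max() surfaces the single-dimension outlier.
print(weighted_mean)  # ~[0.41, 0.58]
print(max_score)      # [0.99, 0.60]
```

The mean dilutes a 99th-percentile spike across unremarkable features; the max preserves it, which matches how V12 overtook the V7 ensemble.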

📈 Discovery #2: Services-per-Beneficiary Within Specialty

The single strongest fraud signal. Excluded providers bill 5-50x more services per patient than peers in the same specialty. Global z-scores miss this because oncologists legitimately bill more than dermatologists.
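The within-specialty normalization can be sketched in a few lines of pandas (column names and numbers are illustrative): a global z-score flags the busiest oncologist, while the specialty-relative z-score flags the dermatologist billing ~10x specialty peers.

```python
import pandas as pd

df = pd.DataFrame({
    "npi": list("12345678"),
    "specialty": ["Oncology"] * 3 + ["Dermatology"] * 5,
    "services_per_bene": [45.0, 48.0, 46.0, 2.0, 3.0, 3.0, 4.0, 30.0],
})

# Global normalization: oncologists dominate simply because their
# specialty bills more per patient.
mean_all = df["services_per_bene"].mean()
std_all = df["services_per_bene"].std()
df["z_global"] = (df["services_per_bene"] - mean_all) / std_all

# Within-specialty normalization: each provider is compared to peers.
grp = df.groupby("specialty")["services_per_bene"]
df["z_specialty"] = (df["services_per_bene"] - grp.transform("mean")) / grp.transform("std")
```

Here `z_global` peaks at the highest-billing oncologist, but `z_specialty` peaks at the dermatologist whose volume is an outlier among dermatologists — the pattern V2 exploited.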

💊 Discovery #3: Long-Acting Opioid Rate

LA opioid prescribing rate (normalized within specialty) catches pill mills that simple opioid rates miss. Anesthesiologists legitimately prescribe opioids — but LA opioids for chronic pain outside of pain management is a red flag.

What Doesn't Work

  - Weighted averaging of subscores: it dilutes the extreme-dimension signal (V12's pure max beat V7's 50/50 ensemble).
  - Adding features indiscriminately: V3's HCPCS concentration and taxonomy-mismatch features added noise and cost 3 AUC points.

5. Real-World Validation: Web Research on Top Suspects

We manually investigated top-scoring providers via web search to validate model outputs:

| Provider | Score | Finding | Status |
|---|---|---|---|
| Robert Morton, MD (Psychiatrist, Ada OK; NPI 1336222504) | 0.88 | $32.7M drug cost flagged. Actually prescribes ultra-expensive specialty psych drugs (Ingrezza, $7,700/claim for tardive dyskinesia). 53 years experience at Rolling Hills Hospital. | Legitimate |
| Anne Greist, MD (Hematologist, Indianapolis; NPI 1063477073) | Top | 48,875 services/bene flagged. IU Simon Cancer Center. Treats hemophilia patients with clotting factor infusions — legitimate academic center. | Legitimate |
| Kashif Ali, MD (Oncologist, Greenbelt MD; NPI 1851423388) | 0.85 | 629,920 services on 1,308 patients (481/patient). UM Capital Region Health. Even for infusion oncology, this is an extreme outlier. | Investigate |
| Harsha Vyas, MD (Oncologist, Dublin GA; NPI 1447447933) | 0.90 | 322,669 services on 764 patients (422/patient). CCMG Cancer Center. Affiliated with Fairview Park Hospital. | Investigate |
| 27777 Inkster Rd Cluster (Farmington Hills, MI) | N/A | 15,110 providers at one address. Actually: Centria Healthcare ABA therapy corporate headquarters. Legitimate large employer. | Legitimate |
💡 Key Insight: High-Volume ≠ Fraud

Many top-scoring providers are legitimate high-volume practices (oncology infusions, academic medical centers, specialty psychiatry). The model correctly identifies statistical outliers — human review is essential to distinguish fraud from legitimate specialty care.

6. Technical Architecture

┌─────────────────────────────────────────────────────────────┐
│  eval.py (FIXED ORACLE)                                     │
│  ├── Download OIG LEIE CSV (~70K exclusions)                │
│  ├── Load CMS NPI universe from DuckDB (1.2M providers)     │
│  ├── Run detector.py via subprocess (stdin/stdout CSV)     │
│  ├── Compute AUC-ROC, Average Precision                    │
│  └── Append to results.tsv                                  │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  detector.py (AGENT EDITS THIS)                             │
│  ├── Read NPI list from stdin                               │
│  ├── Query DuckDB for features:                             │
│  │   ├── Part B billing (services, payments, beneficiaries) │
│  │   ├── Part D prescribing (opioid rates, drug costs)     │
│  │   ├── Open Payments (industry relationships)             │
│  │   └── PECOS enrollment gaps                              │
│  ├── Compute subscores per feature                          │
│  ├── Combine via max(subscores) — not weighted average      │
│  └── Output NPI,score CSV to stdout                         │
└─────────────────────────────────────────────────────────────┘
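The detector side of the diagram, as a minimal runnable skeleton. The real DuckDB feature queries are elided; `subscores_for` here is a hypothetical stand-in that derives deterministic placeholder values from the NPI digits, so only the stdin/stdout contract and the max() combination rule reflect the actual design:

```python
import sys

def subscores_for(npi):
    """Stand-in for the DuckDB feature queries in the diagram.

    Real subscores are within-specialty normalized values in [0, 1];
    here we fabricate a deterministic placeholder from the NPI digits.
    """
    h = sum(int(c) for c in npi if c.isdigit())
    return [(h % 7) / 7, (h % 11) / 11, (h % 13) / 13]

def score(npi):
    # V12 combination rule: max over subscores, no averaging.
    return max(subscores_for(npi))

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Contract from the diagram: NPIs in on stdin, "npi,score" CSV out.
    for line in stdin:
        npi = line.strip()
        if npi:
            stdout.write(f"{npi},{score(npi):.4f}\n")
```

Keeping the interface this narrow is what lets eval.py stay a fixed oracle: any rewrite of the feature logic still speaks the same one-line-per-provider CSV protocol.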
        

7. Next Steps

  1. Expand ground truth — Supplement LEIE with DOJ press releases, state medical board actions, news articles about fraud convictions
  2. Network analysis — Identify referring provider rings (A refers to B, B orders from C, C kicks back to A)
  3. Geographic clustering — Flag providers at suspicious addresses (not hospitals) with high aggregate billing
  4. Train proper ML model — Use hand-crafted features as inputs to gradient boosting with cross-validation
  5. Real-time validation — Auto-search top suspects, check for OIG press releases, medical board actions
  6. Frontend explorer — Interactive app for manual validation of flagged providers

8. Repository & Resources


Built: March 8-9, 2026 in one evening session
Author: Blake Thomson + Chief (AI assistant)
Data: All CMS data is publicly available at data.cms.gov