🔬 Healthcare Fraud Detection: Autoresearch Report

Applying Karpathy's autonomous research loop to Medicare claims data
Date: March 9, 2026  |  Data: 90M+ CMS Medicare records  |  Final AUC: 0.81
📊 Executive Summary

In one evening of AI-assisted iteration, we built a fraud detection model that identifies Medicare providers likely to be excluded by the OIG with 0.81 AUC-ROC — starting from a random baseline of 0.55. The key discovery: fraud shows up as being extreme on any single dimension, not just above average across all dimensions.
0.5561 → 0.8098 AUC improvement  |  18 iterations tested  |  90M+ CMS records  |  181 ground truth matches

1. The Approach: Karpathy's Autoresearch Pattern

Inspired by Andrej Karpathy's autoresearch project, we adapted the same pattern to healthcare fraud detection:

The Loop
  1. eval.py: Fixed oracle. Downloads the OIG LEIE exclusion list, measures AUC-ROC. Never edited.
  2. detector.py: Agent edits this. Scoring logic that outputs a fraud probability per provider.
  3. results.tsv: Experiment log. Tracks AUC progression across iterations.

Unlike Karpathy's ML experiments where the eval was validation loss, our ground truth is the OIG LEIE (List of Excluded Individuals and Entities) — the official federal database of providers excluded from Medicare due to fraud convictions, license revocations, or program violations.
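A minimal sketch of the oracle side of this loop, assuming scikit-learn for the metric; the function name `evaluate` and the label-matching details are simplified stand-ins for what eval.py actually does:

```python
import subprocess
import sys
from sklearn.metrics import roc_auc_score

def evaluate(npis, labels):
    """Run detector.py over the NPI universe and score it against LEIE labels.

    `labels` maps NPI -> 1 if the provider appears in the OIG exclusion
    list, else 0.
    """
    # detector.py reads NPIs from stdin and writes "npi,score" lines to stdout.
    proc = subprocess.run(
        [sys.executable, "detector.py"],
        input="\n".join(npis),
        capture_output=True, text=True, check=True,
    )
    scores = dict(line.split(",") for line in proc.stdout.splitlines())
    auc = roc_auc_score(
        [labels[npi] for npi in npis],
        [float(scores[npi]) for npi in npis],
    )
    # Append to the experiment log so every iteration is tracked.
    with open("results.tsv", "a") as f:
        f.write(f"{auc:.4f}\n")
    return auc
```

Because detector.py is invoked over stdin/stdout, the agent can rewrite its internals freely without the oracle ever needing to change.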

2. Data Sources & Scale

| Source | Rows | Purpose |
|---|---|---|
| CMS Part B Physician Claims | 1.26M | Billing volume, service counts, payments |
| CMS Part D Prescribing | 1.38M | Drug costs, opioid prescribing patterns |
| Open Payments (Sunshine Act) | 14.7M | Industry financial relationships |
| NPPES NPI Registry | 7.1M | Provider demographics, taxonomy codes |
| PECOS Enrollment | 2.54M | Medicare enrollment status |
| OIG LEIE Exclusion List | ~70K | Ground truth fraud labels |

All data stored in a single 6GB DuckDB database with 30 tables, enabling fast SQL queries during iteration.

⚠️ Label Limitation The LEIE contains ~70K excluded providers, but most are non-physicians (nurses, aides, personal care workers) who don't bill Part B directly. Only 181 matched our CMS physician universe — a tiny 0.015% prevalence. This makes AUC a useful but imperfect metric.
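The metric distortion at this prevalence is easy to demonstrate on synthetic labels (a scaled-down sketch, assuming scikit-learn): AUC-ROC stays interpretable for a no-signal detector, while average precision collapses toward the prevalence floor.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n, n_pos = 100_000, 15             # ~0.015% prevalence, scaled down
y_true = np.zeros(n)
y_true[:n_pos] = 1
random_scores = rng.random(n)      # a detector with no signal

auc = roc_auc_score(y_true, random_scores)           # hovers near 0.5
ap = average_precision_score(y_true, random_scores)  # near the ~0.00015 prevalence
```

This is why the report leans on AUC for iteration while treating absolute precision numbers with caution.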

3. Model Iterations & AUC Progression

Baseline 0.5561 → V2 0.7695 → V7 0.7904 → V11 0.8013 → V12 ✓ 0.8098

Key Iterations Explained

| Version | AUC | Key Change | Result |
|---|---|---|---|
| Baseline | 0.5561 | Raw billing z-scores, global normalization | Near random |
| V2 | 0.7695 | Z-scores within specialty + LA opioid rate | +21 pts |
| V3 | 0.7433 | Added HCPCS concentration, taxonomy mismatch | -3 pts (noise) |
| V7 | 0.7904 | Ensemble: 50% max + 50% weighted mean | +2 pts |
| V12 | 0.8098 | Pure max(subscores) — no averaging | Best |

4. Key Findings: What Works

🎯 Discovery #1: Fraud = Extreme on ANY Dimension

Taking max(subscores) across all features dramatically outperforms weighted averaging. Fraudulent providers don't need to be suspicious on every metric — being in the 99th percentile on one metric is enough signal.
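A toy illustration of this combination rule with synthetic subscores (the feature names in the comment are illustrative): the single-dimension outlier wins under max() but loses under averaging.

```python
import numpy as np

# Rows = providers, columns = subscores in [0, 1]
# (e.g. services/bene, LA opioid rate, drug cost) — all values synthetic.
subscores = np.array([
    [0.99, 0.10, 0.15],   # extreme on ONE dimension (pill-mill pattern)
    [0.60, 0.55, 0.58],   # mildly above average on everything
])

weighted_mean = subscores.mean(axis=1)
max_score = subscores.max(axis=1)

# Averaging ranks the "mildly elevated" provider higher;
# max() surfaces the single-dimension outlier.
print(weighted_mean)  # ~[0.41, 0.58]
print(max_score)      # [0.99, 0.60]
```

The mean dilutes a 99th-percentile spike across unremarkable features; the max preserves it, which matches how V12 overtook the V7 ensemble.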

📈 Discovery #2: Services-per-Beneficiary Within Specialty

The single strongest fraud signal. Excluded providers bill 5-50x more services per patient than peers in the same specialty. Global z-scores miss this because oncologists legitimately bill more than dermatologists.
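The within-specialty normalization can be sketched in a few lines of pandas (column names and numbers are illustrative): a global z-score flags the busiest oncologist, while the specialty-relative z-score flags the dermatologist billing ~10x specialty peers.

```python
import pandas as pd

df = pd.DataFrame({
    "npi": list("12345678"),
    "specialty": ["Oncology"] * 3 + ["Dermatology"] * 5,
    "services_per_bene": [45.0, 48.0, 46.0, 2.0, 3.0, 3.0, 4.0, 30.0],
})

# Global normalization: oncologists dominate simply because their
# specialty bills more per patient.
mean_all = df["services_per_bene"].mean()
std_all = df["services_per_bene"].std()
df["z_global"] = (df["services_per_bene"] - mean_all) / std_all

# Within-specialty normalization: each provider is compared to peers.
grp = df.groupby("specialty")["services_per_bene"]
df["z_specialty"] = (df["services_per_bene"] - grp.transform("mean")) / grp.transform("std")
```

Here `z_global` peaks at the highest-billing oncologist, but `z_specialty` peaks at the dermatologist whose volume is an outlier among dermatologists — the pattern V2 exploited.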

💊 Discovery #3: Long-Acting Opioid Rate

LA opioid prescribing rate (normalized within specialty) catches pill mills that simple opioid rates miss. Anesthesiologists legitimately prescribe opioids — but LA opioids for chronic pain outside of pain management is a red flag.

What Doesn't Work

  - Weighted averaging of subscores: it dilutes the extreme-dimension signal (V12's pure max beat V7's 50/50 ensemble).
  - Adding features indiscriminately: V3's HCPCS concentration and taxonomy-mismatch features added noise and cost 3 AUC points.

5. Real-World Validation: Web Research on Top Suspects

We manually investigated top-scoring providers via web search to validate model outputs:

| Provider | Score | Finding | Status |
|---|---|---|---|
| Robert Morton, MD (Psychiatrist, Ada OK; NPI 1336222504) | 0.88 | $32.7M drug cost flagged. Actually prescribes ultra-expensive specialty psych drugs (Ingrezza, $7,700/claim for tardive dyskinesia). 53 years experience at Rolling Hills Hospital. | Legitimate |
| Anne Greist, MD (Hematologist, Indianapolis; NPI 1063477073) | Top | 48,875 services/bene flagged. IU Simon Cancer Center. Treats hemophilia patients with clotting factor infusions — legitimate academic center. | Legitimate |
| Kashif Ali, MD (Oncologist, Greenbelt MD; NPI 1851423388) | 0.85 | 629,920 services on 1,308 patients (481/patient). UM Capital Region Health. Even for infusion oncology, this is an extreme outlier. | Investigate |
| Harsha Vyas, MD (Oncologist, Dublin GA; NPI 1447447933) | 0.90 | 322,669 services on 764 patients (422/patient). CCMG Cancer Center. Affiliated with Fairview Park Hospital. | Investigate |
| 27777 Inkster Rd Cluster (Farmington Hills, MI) | N/A | 15,110 providers at one address. Actually: Centria Healthcare ABA therapy corporate headquarters. Legitimate large employer. | Legitimate |
💡 Key Insight: High-Volume ≠ Fraud

Many top-scoring providers are legitimate high-volume practices (oncology infusions, academic medical centers, specialty psychiatry). The model correctly identifies statistical outliers — human review is essential to distinguish fraud from legitimate specialty care.

6. Technical Architecture

┌─────────────────────────────────────────────────────────────┐
│  eval.py (FIXED ORACLE)                                     │
│  ├── Download OIG LEIE CSV (~70K exclusions)                │
│  ├── Load CMS NPI universe from DuckDB (1.2M providers)     │
│  ├── Run detector.py via subprocess (stdin/stdout CSV)     │
│  ├── Compute AUC-ROC, Average Precision                    │
│  └── Append to results.tsv                                  │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  detector.py (AGENT EDITS THIS)                             │
│  ├── Read NPI list from stdin                               │
│  ├── Query DuckDB for features:                             │
│  │   ├── Part B billing (services, payments, beneficiaries) │
│  │   ├── Part D prescribing (opioid rates, drug costs)     │
│  │   ├── Open Payments (industry relationships)             │
│  │   └── PECOS enrollment gaps                              │
│  ├── Compute subscores per feature                          │
│  ├── Combine via max(subscores) — not weighted average      │
│  └── Output NPI,score CSV to stdout                         │
└─────────────────────────────────────────────────────────────┘
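The detector side of the diagram, as a minimal runnable skeleton. The real DuckDB feature queries are elided; `subscores_for` here is a hypothetical stand-in that derives deterministic placeholder values from the NPI digits, so only the stdin/stdout contract and the max() combination rule reflect the actual design:

```python
import sys

def subscores_for(npi):
    """Stand-in for the DuckDB feature queries in the diagram.

    Real subscores are within-specialty normalized values in [0, 1];
    here we fabricate a deterministic placeholder from the NPI digits.
    """
    h = sum(int(c) for c in npi if c.isdigit())
    return [(h % 7) / 7, (h % 11) / 11, (h % 13) / 13]

def score(npi):
    # V12 combination rule: max over subscores, no averaging.
    return max(subscores_for(npi))

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Contract from the diagram: NPIs in on stdin, "npi,score" CSV out.
    for line in stdin:
        npi = line.strip()
        if npi:
            stdout.write(f"{npi},{score(npi):.4f}\n")
```

Keeping the interface this narrow is what lets eval.py stay a fixed oracle: any rewrite of the feature logic still speaks the same one-line-per-provider CSV protocol.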
        

7. Next Steps

  1. Expand ground truth — Supplement LEIE with DOJ press releases, state medical board actions, news articles about fraud convictions
  2. Network analysis — Identify referring provider rings (A refers to B, B orders from C, C kicks back to A)
  3. Geographic clustering — Flag providers at suspicious addresses (not hospitals) with high aggregate billing
  4. Train proper ML model — Use hand-crafted features as inputs to gradient boosting with cross-validation
  5. Real-time validation — Auto-search top suspects, check for OIG press releases, medical board actions
  6. Frontend explorer — Interactive app for manual validation of flagged providers

8. Repository & Resources


Built: March 8-9, 2026 in one evening session
Author: Blake Thomson + Chief (AI assistant)
Data: All CMS data is publicly available at data.cms.gov