Inspired by Andrej Karpathy's autoresearch project, we adapted the same pattern to healthcare fraud detection:
- eval.py: Fixed oracle. Downloads OIG LEIE exclusion list, measures AUC-ROC. Never edited.
- detector.py: Agent edits this. Scoring logic that outputs fraud probability per provider.
- results.tsv: Experiment log. Tracks AUC progression across iterations.

Unlike Karpathy's ML experiments, where the eval was validation loss, our ground truth is the OIG LEIE (List of Excluded Individuals and Entities), the official federal database of providers excluded from Medicare due to fraud convictions, license revocations, or program violations.

| Source | Rows | Purpose |
|---|---|---|
| CMS Part B Physician Claims | 1.26M | Billing volume, service counts, payments |
| CMS Part D Prescribing | 1.38M | Drug costs, opioid prescribing patterns |
| Open Payments (Sunshine Act) | 14.7M | Industry financial relationships |
| NPPES NPI Registry | 7.1M | Provider demographics, taxonomy codes |
| PECOS Enrollment | 2.54M | Medicare enrollment status |
| OIG LEIE Exclusion List | ~70K | Ground truth fraud labels |

All data is stored in a single 6 GB DuckDB database with 30 tables, enabling fast SQL queries during iteration.

| Version | AUC | Key Change | Result |
|---|---|---|---|
| Baseline | 0.5561 | Raw billing z-scores, global normalization | Near random |
| V2 | 0.7695 | Z-scores within specialty + long-acting (LA) opioid rate | +21 pts |
| V3 | 0.7433 | Added HCPCS concentration, taxonomy mismatch | -3 pts (noise) |
| V7 | 0.7904 | Ensemble: 50% max + 50% weighted mean | +2 pts |
| V12 | 0.8098 | Pure max(subscores) — no averaging | Best |

Taking max(subscores) across all features outperformed weighted averaging: the pure-max V12 reached 0.8098 AUC, versus 0.7904 for the V7 ensemble that blended max with a weighted mean. Fraudulent providers don't need to be suspicious on every metric; being in the 99th percentile on one metric is enough signal.
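
To make the difference between the combiners concrete, here is a minimal sketch. The subscore names, the equal weights, and the example values are hypothetical and chosen only for illustration; they are not taken from detector.py.

```python
def combine_max(subscores):
    """Score a provider by its single most suspicious feature."""
    return max(subscores.values())


def combine_weighted_mean(subscores, weights):
    """Score a provider by a weighted average of all features."""
    total_weight = sum(weights[name] for name in subscores)
    weighted_sum = sum(subscores[name] * weights[name] for name in subscores)
    return weighted_sum / total_weight


def combine_ensemble(subscores, weights):
    """50% max + 50% weighted mean, the blend described for V7."""
    return 0.5 * combine_max(subscores) + 0.5 * combine_weighted_mean(subscores, weights)


# Hypothetical provider: extreme on one metric, unremarkable on the rest.
provider = {
    "services_per_patient": 0.99,
    "la_opioid_rate": 0.10,
    "drug_cost": 0.15,
    "industry_payments": 0.05,
}
weights = {name: 1.0 for name in provider}  # equal weights, an assumption

max_score = combine_max(provider)
mean_score = combine_weighted_mean(provider, weights)
ensemble_score = combine_ensemble(provider, weights)

print(f"max={max_score:.4f} mean={mean_score:.4f} ensemble={ensemble_score:.4f}")
```

In this toy case the averaging combiners pull the one extreme subscore down toward the unremarkable ones, while max keeps it intact.
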

Services per patient, compared within specialty, is the single strongest fraud signal. Excluded providers bill 5-50x more services per patient than peers in the same specialty. Global z-scores miss this because oncologists legitimately bill more than dermatologists.
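
The effect of normalizing within specialty can be sketched with a few made-up rows. The field names and numbers below are hypothetical, and the real detector presumably computes this in SQL over DuckDB rather than in plain Python.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Standard z-scores using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def zscores_within_group(rows, group_key, value_key):
    """Z-score each row against the other rows that share its group."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[group_key]].append(index)
    result = [0.0] * len(rows)
    for indexes in groups.values():
        group_z = zscores([rows[i][value_key] for i in indexes])
        for i, z in zip(indexes, group_z):
            result[i] = z
    return result


# Made-up data: provider H is an outlier for dermatology but not globally.
rows = [
    {"npi": "A", "specialty": "oncology", "services_per_patient": 40.0},
    {"npi": "B", "specialty": "oncology", "services_per_patient": 45.0},
    {"npi": "C", "specialty": "oncology", "services_per_patient": 50.0},
    {"npi": "D", "specialty": "oncology", "services_per_patient": 42.0},
    {"npi": "E", "specialty": "dermatology", "services_per_patient": 3.0},
    {"npi": "F", "specialty": "dermatology", "services_per_patient": 4.0},
    {"npi": "G", "specialty": "dermatology", "services_per_patient": 3.5},
    {"npi": "H", "specialty": "dermatology", "services_per_patient": 20.0},
]

global_z = zscores([r["services_per_patient"] for r in rows])
within_z = zscores_within_group(rows, "specialty", "services_per_patient")

for row, g, w in zip(rows, global_z, within_z):
    print(f"{row['npi']} {row['specialty']:<12} global={g:+.2f} within={w:+.2f}")
```

Globally, provider H sits below the mean because the oncologists dominate the distribution; within dermatology it is the most extreme provider in the sample.
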

Long-acting (LA) opioid prescribing rate (normalized within specialty) catches pill mills that simple opioid rates miss. Anesthesiologists legitimately prescribe opioids, but prescribing LA opioids for chronic pain outside of pain management is a red flag.

We manually investigated top-scoring providers via web search to validate model outputs:
| Provider | Score | Finding | Status |
|---|---|---|---|
| Robert Morton, MD; Psychiatrist, Ada OK; NPI 1336222504 | 0.88 | $32.7M drug cost flagged. Actually prescribes ultra-expensive specialty psych drugs (Ingrezza $7,700/claim for tardive dyskinesia). 53 years experience at Rolling Hills Hospital. | Legitimate |
| Anne Greist, MD; Hematologist, Indianapolis; NPI 1063477073 | Top | 48,875 services/bene flagged. IU Simon Cancer Center. Treats hemophilia patients with clotting factor infusions; legitimate academic center. | Legitimate |
| Kashif Ali, MD; Oncologist, Greenbelt MD; NPI 1851423388 | 0.85 | 629,920 services on 1,308 patients (481/patient). UM Capital Region Health. Even for infusion oncology, this is an extreme outlier. | Investigate |
| Harsha Vyas, MD; Oncologist, Dublin GA; NPI 1447447933 | 0.90 | 322,669 services on 764 patients (422/patient). CCMG Cancer Center. Affiliated with Fairview Park Hospital. | Investigate |
| 27777 Inkster Rd Cluster; Farmington Hills, MI | n/a | 15,110 providers at one address. Actually: Centria Healthcare ABA therapy corporate headquarters. Legitimate large employer. | Legitimate |
┌─────────────────────────────────────────────────────────────┐
│ eval.py (FIXED ORACLE) │
│ ├── Download OIG LEIE CSV (~70K exclusions) │
│ ├── Load CMS NPI universe from DuckDB (1.2M providers) │
│ ├── Run detector.py via subprocess (stdin/stdout CSV) │
│ ├── Compute AUC-ROC, Average Precision │
│ └── Append to results.tsv │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ detector.py (AGENT EDITS THIS) │
│ ├── Read NPI list from stdin │
│ ├── Query DuckDB for features: │
│ │ ├── Part B billing (services, payments, beneficiaries) │
│ │ ├── Part D prescribing (opioid rates, drug costs) │
│ │ ├── Open Payments (industry relationships) │
│ │ └── PECOS enrollment gaps │
│ ├── Compute subscores per feature │
│ ├── Combine via max(subscores) — not weighted average │
│ └── Output NPI,score CSV to stdout │
└─────────────────────────────────────────────────────────────┘
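
The oracle side of the diagram can be sketched in a few functions. This is a simplified illustration, not the actual eval.py: it omits the LEIE download, the DuckDB load, Average Precision, and the results.tsv logging, computes AUC-ROC with a rank-sum formula instead of a library call, and assumes the detector prints one `NPI,score` row per line. The function names and the toy detector are hypothetical.

```python
import csv
import io
import subprocess
import sys


def auc_roc(labels, scores):
    """AUC-ROC via the Mann-Whitney rank-sum, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def run_detector(npis, command):
    """Send NPIs on stdin (one per line) and parse NPI,score rows from stdout."""
    completed = subprocess.run(
        command,
        input="\n".join(npis) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    scores = {}
    for row in csv.reader(io.StringIO(completed.stdout)):
        if len(row) != 2 or row[0].strip().lower() == "npi":
            continue  # skip blank lines and an optional header row
        scores[row[0].strip()] = float(row[1])
    return scores


def evaluate(npis, excluded, command):
    """Label each NPI by membership in the exclusion set and score the detector."""
    scores = run_detector(npis, command)
    labels = [1 if npi in excluded else 0 for npi in npis]
    return auc_roc(labels, [scores.get(npi, 0.0) for npi in npis])


# Toy stand-in for detector.py: the score is the last digit of the NPI / 10.
TOY_DETECTOR_SOURCE = """
import sys
for line in sys.stdin:
    npi = line.strip()
    if npi:
        print(f"{npi},{int(npi[-1]) / 10}")
"""
toy_command = [sys.executable, "-c", TOY_DETECTOR_SOURCE]

toy_npis = ["1001", "1002", "1008", "1009"]  # made-up identifiers
toy_excluded = {"1008", "1009"}              # made-up exclusion set
toy_auc = evaluate(toy_npis, toy_excluded, toy_command)
print(f"AUC-ROC: {toy_auc:.4f}")
```

Keeping the detector behind a stdin/stdout boundary means the agent can rewrite detector.py freely while the scoring code stays untouched.
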
Built: March 8-9, 2026 in one evening session
Author: Blake Thomson + Chief (AI assistant)
Data: All CMS data is publicly available at data.cms.gov