Project: CMS Claims Fraud Detection using Karpathy's Autoresearch Pattern
Date: March 8, 2026
Status: Pre-Build Design Report – Review Before Building
Author: Chief
This project adapts Andrej Karpathy's autoresearch pattern – an autonomous overnight experiment loop – to healthcare fraud detection. Instead of iterating on neural network training code, an AI agent iterates on fraud scoring logic, running automated evaluations against a ground-truth dataset of known fraudulent providers (the OIG LEIE exclusion list). You sleep, it runs 50–100 experiments, and you wake up to a better fraud detector.
| autoresearch (ML training) | Our version (fraud detection) |
|---|---|
| train.py – model architecture, optimizer, hyperparams (agent edits this) | detector.py – scoring logic, SQL features, thresholds, weights (agent edits this) |
| program.md – research instructions (human edits this) | strategy.md – fraud hypotheses to test, what patterns to try (human edits this) |
| prepare.py – fixed eval harness, data loading (never modified) | eval.py – loads LEIE, joins CMS data, measures AUC-ROC (never modified) |
| val_bpb – validation bits per byte (lower = better) | AUC-ROC – area under ROC curve (higher = better, max 1.0) |
| 5-minute fixed GPU training run | 30–60 second DuckDB eval against 90M rows |
| ~100 experiments overnight on one H100 | ~50–100 experiments overnight on Hetzner CPU server |
| Git commit if val_bpb improves | Git commit if AUC-ROC improves |
| LOOP FOREVER until interrupted | LOOP FOREVER until interrupted |
Reference: github.com/karpathy/autoresearch – cloned to ~/.openclaw/workspace/projects/autoresearch/
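The right-hand column of the table above is, concretely, a small control loop: evaluate, keep improvements, revert failures, repeat. A minimal sketch in Python – the callables (`run_eval`, `propose_edit`, `revert_edit`, `commit`) are hypothetical stand-ins for how the real agent would edit detector.py and shell out to git, and the toy demo below caps the "forever" loop for illustration:

```python
import random

def autoresearch_loop(run_eval, propose_edit, revert_edit, commit, n_experiments):
    """Skeleton of the overnight loop: evaluate, keep improvements, revert failures.
    The real loop runs forever; here it is capped at n_experiments for illustration."""
    best_auc = run_eval()                  # baseline from the current detector.py
    log = []                               # stands in for results.tsv
    for _ in range(n_experiments):
        propose_edit()                     # agent rewrites the scoring logic
        auc = run_eval()
        improved = auc > best_auc
        log.append((auc, improved))        # append every experiment, win or lose
        if improved:
            best_auc = auc
            commit(auc)                    # e.g. git commit -am "AUC-ROC 0.71"
        else:
            revert_edit()                  # e.g. git checkout -- detector.py
    return best_auc, log

# Toy demo: a noisy random walk stands in for the agent's edits.
random.seed(0)
state = {"auc": 0.50, "prev": 0.50}

def propose():
    state["prev"] = state["auc"]
    state["auc"] = min(1.0, state["auc"] + random.uniform(-0.02, 0.03))

best, log = autoresearch_loop(
    run_eval=lambda: state["auc"],
    propose_edit=propose,
    revert_edit=lambda: state.update(auc=state["prev"]),
    commit=lambda auc: None,
    n_experiments=50,
)
```

The only state that survives a failed experiment is the log line – exactly the git-commit-on-improvement discipline from the table.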
The List of Excluded Individuals/Entities (LEIE) is the OIG's public database of providers excluded from Medicare and Medicaid participation due to fraud, abuse, or program-related crimes. Updated monthly. ~70,000 current records.
Download URL: https://oig.hhs.gov/exclusions/downloadables/UPDATED.csv
Key LEIE fields: LASTNAME, FIRSTNAME, NPI, SPECIALTY, EXCLTYPE (exclusion category), EXCLDATE, REINDATE, WAIVERSTATE
Join strategy: NPI match (primary) → name + specialty fallback (for pre-NPI records). Expected match rate to your CMS data: 30–60% of LEIE records will have corresponding CMS activity.
This is the fixed file the agent never touches. It defines what "better" means.
Baseline AUC (random scoring): ~0.500. A good fraud detection model: 0.700–0.850+. That's the improvement space the agent explores.
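AUC-ROC has a direct interpretation here: the probability that a randomly chosen excluded provider outscores a randomly chosen clean one. A from-scratch sketch of the metric (a real eval.py would more likely call a library routine such as scikit-learn's `roc_auc_score`, or compute it in SQL over the full 90M rows):

```python
import random

def auc_roc(labels, scores):
    """AUC via the Mann-Whitney statistic: the fraction of (excluded, clean)
    pairs in which the excluded provider gets the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(42)
labels = [1] * 50 + [0] * 950                 # ~5% known-excluded providers

# Random scoring hovers around the 0.500 floor...
rand_auc = auc_roc(labels, [random.random() for _ in labels])

# ...while any score correlated with the labels pulls the AUC up.
good_auc = auc_roc(labels, [random.random() + 0.8 * y for y in labels])
```

The pairwise version is O(pos × neg), fine for a sketch; at production scale you would use the rank-based equivalent.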
The agent modifies the fraud scoring logic. Everything is fair game:
| Feature Category | What the Agent Explores | Data Source |
|---|---|---|
| Billing outliers | Payment per beneficiary vs. specialty peers (z-scores, percentiles, thresholds) | Medicare Part B utilization |
| Opioid prescribing | Opioid rate, high-dose volume, CMS opioid flags, brand vs. generic rate | Part D prescriber data |
| Open Payments | Industry payment amounts, payment types (speaking fees, ownership), count of payers | Open Payments (Sunshine Act) |
| Geographic signals | Provider density by zip, patient travel distance, hot zone overlap | NPPES addresses |
| Temporal patterns | Year-over-year billing growth, service mix shifts, new specialty codes | Multi-year utilization data |
| Score composition | Feature weights, normalization method, composite formula, rank cutoffs | Any/all of the above |
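Most of the search space lives in that last row. As a concrete starting point, here is one shape the composite might take: z-score each feature within its specialty peer group, then take a weighted sum. Feature names, values, and weights below are illustrative only, not findings:

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize one feature within a peer group (e.g. one specialty)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def composite_score(features, weights):
    """Weighted sum of per-feature z-scores; higher = more suspicious.
    `features` maps feature name -> raw values, one per provider, same order."""
    normalized = {name: zscores(vals) for name, vals in features.items()}
    n = len(next(iter(features.values())))
    return [sum(weights[name] * normalized[name][i] for name in weights)
            for i in range(n)]

# Illustrative values for four providers in the same specialty peer group.
features = {
    "pay_per_beneficiary": [120.0, 135.0, 128.0, 610.0],    # billing outlier
    "opioid_rate":         [0.05, 0.07, 0.06, 0.42],        # Part D signal
    "industry_payments":   [800.0, 1500.0, 900.0, 52000.0], # Open Payments
}
weights = {"pay_per_beneficiary": 0.5, "opioid_rate": 0.3, "industry_payments": 0.2}
scores = composite_score(features, weights)   # provider 4 tops the ranking
```

The agent's experiments then amount to editing `features`, `weights`, and the normalization – each a one-line change with a measurable AUC consequence.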
Expected throughput: 30–60 sec per eval on Hetzner → ~60–120 experiments overnight.
All data lives on Hetzner 5.78.148.70, DuckDB at /home/dataops/cms-data/data/provider_searcher.duckdb.
| Dataset | Records | Fraud Signals |
|---|---|---|
| NPPES Provider Registry | 8M providers | Address anomalies, taxonomy mismatches |
| Medicare Part B Utilization | ~10M records | Billing outliers, volume anomalies, peer comparison |
| Medicare Part D Prescriber | ~10M records | Opioid flags, brand preference, pill mill patterns |
| Open Payments (General) | 14.7M transactions | Industry payments, ownership interests, kickback proxies |
| Open Payments (Research) | 1.1M transactions | Research funding conflicts |
| Doctors & Clinicians | 2.7M national | Practice patterns, group affiliations |
| Facility Affiliations | 1.6M records | Referral network construction, ownership ties |
| MIPS Performance | 541K records | Quality score outliers, low performers |
| OIG LEIE (to add) | ~70K records | Ground truth labels – known bad actors |
Total DB: ~5.5GB, 90M+ rows, 30 tables. Source: CMS public use files + Open Payments + NPPES.
This project has a double life: it's a real working tool and a marketing asset.
Most healthcare analytics shops write a whitepaper about fraud detection. We actually ran it – 100 automated experiments overnight, iterating on our algorithm until the model stopped improving. Here's the git log. Here's the AUC curve. Here's what features moved the needle.
That's a story no consultant has told before. It positions you as a domain expert who understands both the clinical context and modern ML workflow patterns.
| Page | Content | Lead Hook |
|---|---|---|
| /projects/fraud-analysis/ | Overview – problem, our approach, key findings | CTA: "We build custom FWA detection for health plans" |
| /projects/fraud-analysis/methodology | Deep dive – how autoresearch loop works, features tried, what worked | Technical credibility, developer leads |
| /projects/fraud-analysis/findings | What the model actually found – top-scoring providers by category | Domain expertise demonstration |
| Whitepaper (gated) | Full report – methodology + code + findings. Email required to download. | Email capture → lead nurture sequence |
| Step | Task | Status |
|---|---|---|
| 1 | Restore SSH access to Hetzner CMS server (5.78.148.70) | Needs Blake |
| 2 | Download LEIE CSV, load into DuckDB, join to provider tables, verify label coverage | Chief – once SSH works |
| 3 | Build eval.py – fixed harness, AUC-ROC measurement, test it runs clean | Chief |
| 4 | Build baseline detector.py – simple billing outlier score, establish baseline AUC | Chief |
| 5 | Write strategy.md – initial hypotheses, what patterns to try first | Chief (with Blake input) |
| 6 | Kick off overnight run – Claude Code in autoresearch loop on Hetzner | Tonight |
| 7 | Morning review – read results.tsv, review what worked, update strategy.md | Blake + Chief |
| 8 | Build website pages from findings – methodology, case studies, interactive explorer | Week 2 |
The autoresearch connection is the hook for an article. Proposed title:
"We Ran Karpathy's Autoresearch on Healthcare Fraud Data โ Here's What the Algorithm Found"
Structure:
Distribution: LinkedIn (Blake's profile + company page), healthcaredataai.com blog, possibly HackerNews or Towards Data Science.
Report generated by Chief · March 8, 2026 · Based on Karpathy autoresearch repo (cloned to ~/.openclaw/workspace/projects/autoresearch/) and existing fraud analysis project (~/.openclaw/workspace/projects/fraud-analysis/)