FHIR R4 Export Anatomy, Patient Complexity Spectrum & Agent Use Cases
Date: March 11, 2026 | Context: EHIgnite Challenge ($490K HHS), Phase 1 due May 13, 2026
Data source: SMART on FHIR Synthea-generated bulk export (10 patients) + individual FHIR bundles (~500 patients)
Two datasets from SMART on FHIR (MIT/Boston Children's Hospital), generated by Synthea (MITRE's synthetic patient simulator):
| Dataset | Format | Patients | Size | Source |
|---|---|---|---|---|
| Bulk NDJSON Export | One .ndjson file per resource type | 10 patients (13 entries incl. deceased) | 2.5MB zip / ~23MB unzipped | github.com/smart-on-fhir/sample-bulk-fhir-datasets |
| Individual Patient Bundles | One JSON file per patient (FHIR Bundle) | ~500 patients | 81MB zip / ~400MB unzipped | synthetichealth.github.io/synthea-sample-data |
Total: ~21,000 records across just 10 patients. Observation alone is 47% of all data. Labs and vitals are the dominant data type, which matters enormously for what the ingestion pipeline needs to handle.
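A breakdown like the one above can be recomputed from the export itself. The following is a minimal sketch, assuming each NDJSON line holds one complete FHIR resource; the function names and the tiny sample are illustrative and not part of the original analysis.

```python
import json
from collections import Counter
from typing import Iterable


def count_resource_types(ndjson_lines: Iterable[str]) -> Counter:
    """Count FHIR resources by resourceType across NDJSON lines."""
    counts: Counter = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:  # skip blank lines between records
            continue
        resource = json.loads(line)
        counts[resource.get("resourceType", "Unknown")] += 1
    return counts


def type_shares(counts: Counter) -> dict:
    """Return each resource type's share of the total as a fraction."""
    total = sum(counts.values())
    return {rtype: n / total for rtype, n in counts.items()} if total else {}


# Made-up sample lines, not taken from the real export.
sample = [
    '{"resourceType": "Observation", "id": "o1"}',
    '{"resourceType": "Observation", "id": "o2"}',
    '{"resourceType": "Encounter", "id": "e1"}',
    '{"resourceType": "Patient", "id": "p1"}',
]
counts = count_resource_types(sample)
shares = type_shares(counts)
print(counts.most_common())
print(shares)
```

In practice the lines would come from iterating over each `.ndjson` file in the unzipped export rather than from an in-memory list.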
The most important design insight from the data: "a patient" means wildly different things. Any tool needs to handle this entire range.
Profile: Mariette443 Hackett68. No chronic conditions. No medications. Four routine encounters (annual physicals). Labs within normal range. Immunizations up to date.
What her EHI export contains: 35 Observations (routine labs + vitals), 12 Immunizations, 4 Encounters, 4 Claims, 4 EOBs, 2 Procedures, 1 DiagnosticReport.
For an agent: Almost usable as-is. The aggregation layer does most of the work. Main value: multi-system stitching (her dentist is at a different system), preventive care gap alerts.
Represents: ~30% of the adult patient population. High lifetime value for prevention-focused use cases.
Profile: Steven797 Fadel536. 5 conditions: hypertension, obesity, seasonal allergies, viral sinusitis, BMI 30+. 3 medications: Hydrochlorothiazide (BP), Fexofenadine (allergies), Epi-pen. 9 encounters. Insured by Blue Cross Blue Shield. 12 claims in history.
Sample billing data:
For an agent: Network verification ("is my BP doctor in-network?"), drug interaction check, care gap identification (an obese patient should have an annual lipid panel), benefit utilization review.
Represents: The most common patient in primary care. ~50% of adults have at least one chronic condition.
Profile: Marine542 Ai120 Upton904. Born 1927, died 1994. 708 encounters. 219 documented conditions across 67 years of life. The disease trajectory tells a story:
Medications: Tacrolimus (transplant immunosuppressant), insulin Humulin 70/30, metformin, metoprolol, lisinopril, simvastatin, Epoetin Alfa (anemia of CKD), nitroglycerin PRN.
Why this patient matters: Raw data is completely unusable. 219 conditions shown as a flat list is meaningless. A new provider who doesn't realize she had a kidney transplant could prescribe NSAIDs (which add to tacrolimus's kidney toxicity) and damage the transplanted kidney. The agent has to understand the clinical narrative, not just the data dump.
Represents: Top 5% of patients who drive ~50% of healthcare costs. The highest-value use case for AI assistance.
Here's what a patient's Condition resource looks like in raw FHIR, one of 219 for our complex patient:
{
  "resourceType": "Condition",
  "id": "a3b2c1d0-...",
  "subject": { "reference": "Patient/79a66c97-..." },
  "code": {
    "coding": [{
      "system": "http://snomed.info/sct",
      "code": "44054006",
      "display": "Diabetes mellitus type 2 (disorder)"
    }]
  },
  "onsetDateTime": "1969-05-31",
  "clinicalStatus": {
    "coding": [{ "code": "active" }]
  },
  "category": [{
    "coding": [{ "code": "encounter-diagnosis" }]
  }]
}
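To turn a flat list of such resources into something an agent can reason over, one plausible first step is to flatten each Condition into a small row and order the rows by onset. This is a sketch under that assumption; `summarize_condition` and `condition_timeline` are hypothetical names, and the second sample condition is made up for illustration.

```python
def summarize_condition(condition: dict) -> dict:
    """Flatten a FHIR Condition into the few fields a summary needs."""
    coding = (condition.get("code", {}).get("coding") or [{}])[0]
    status = (condition.get("clinicalStatus", {}).get("coding") or [{}])[0]
    return {
        "code": coding.get("code"),
        "display": coding.get("display"),
        "onset": condition.get("onsetDateTime"),
        "status": status.get("code"),
    }


def condition_timeline(conditions: list) -> list:
    """Return flattened conditions ordered by onset date, undated ones last."""
    rows = [summarize_condition(c) for c in conditions]
    # ISO 8601 dates in the same format sort chronologically as plain strings.
    return sorted(rows, key=lambda r: (r["onset"] is None, r["onset"] or ""))


diabetes = {
    "resourceType": "Condition",
    "code": {"coding": [{
        "system": "http://snomed.info/sct",
        "code": "44054006",
        "display": "Diabetes mellitus type 2 (disorder)",
    }]},
    "onsetDateTime": "1969-05-31",
    "clinicalStatus": {"coding": [{"code": "active"}]},
}
# Made-up earlier condition, only to show the ordering.
hypertension = {
    "resourceType": "Condition",
    "code": {"coding": [{
        "system": "http://snomed.info/sct",
        "code": "59621000",
        "display": "Essential hypertension (disorder)",
    }]},
    "onsetDateTime": "1955-02-10",
    "clinicalStatus": {"coding": [{"code": "active"}]},
}

timeline = condition_timeline([diabetes, hypertension])
for row in timeline:
    print(row["onset"], row["display"], row["status"])
```

A real summary layer would go further (grouping related conditions, separating resolved from active), but even this ordering makes a disease trajectory visible.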
And a MedicationRequest, one of 1,036 for this same patient:
{
  "resourceType": "MedicationRequest",
  "id": "b4c3d2e1-...",
  "subject": { "reference": "Patient/79a66c97-..." },
  "medicationCodeableConcept": {
    "coding": [{
      "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
      "code": "1860487",
      "display": "24 HR tacrolimus 1 MG Extended Release Oral Tablet"
    }]
  },
  "status": "active",
  "intent": "order",
  "authoredOn": "1988-09-15"
}
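With over a thousand requests for one patient, many are presumably repeat orders of the same drug. A minimal sketch of collapsing them into one row per medication, assuming the RxNorm code in `medicationCodeableConcept` identifies the drug and that `status == "active"` marks current orders; `current_medications` is a hypothetical helper, and the second sample drug is a made-up placeholder.

```python
def current_medications(requests: list) -> list:
    """Collapse active MedicationRequests into one row per medication code."""
    latest: dict = {}
    for req in requests:
        if req.get("status") != "active":
            continue  # ignore stopped, completed, or cancelled orders
        coding = (req.get("medicationCodeableConcept", {}).get("coding") or [{}])[0]
        code = coding.get("code")
        if code is None:
            continue
        entry = latest.setdefault(
            code,
            {"code": code, "display": coding.get("display"), "last_ordered": "", "orders": 0},
        )
        entry["orders"] += 1
        # ISO dates compare correctly as strings when they share a format.
        entry["last_ordered"] = max(entry["last_ordered"], req.get("authoredOn", ""))
    return sorted(latest.values(), key=lambda m: m["last_ordered"], reverse=True)


def make_request(code: str, display: str, status: str, authored: str) -> dict:
    """Build a minimal MedicationRequest for the example below."""
    return {
        "resourceType": "MedicationRequest",
        "medicationCodeableConcept": {"coding": [{"code": code, "display": display}]},
        "status": status,
        "authoredOn": authored,
    }


tacrolimus = "24 HR tacrolimus 1 MG Extended Release Oral Tablet"
requests = [
    make_request("1860487", tacrolimus, "active", "1988-09-15"),
    make_request("1860487", tacrolimus, "active", "1989-03-01"),
    make_request("0000000", "Example drug (made up)", "stopped", "1985-01-01"),
]
meds = current_medications(requests)
print(meds)
```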
And here's what a Claim looks like (billing data, which is critical for agent use cases):
{
  "resourceType": "Claim",
  "id": "263e918b-...",
  "status": "active",
  "type": { "coding": [{ "code": "institutional" }] },
  "created": "1996-11-10",
  "patient": { "reference": "Patient/..." },
  "insurance": [{ "coverage": { "reference": "Coverage/..." } }],
  "item": [{
    "sequence": 1,
    "productOrService": {
      "coding": [{
        "system": "http://snomed.info/sct",
        "display": "Encounter for symptom"
      }]
    }
  }],
  "total": { "value": 27.18, "currency": "USD" }
}
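Billing questions usually start with simple roll-ups of these Claim resources. The sketch below totals billed amounts per calendar year, assuming `created` is an ISO date and `total.value` is the billed amount in a single currency; `billed_by_year` is an illustrative name and the extra sample claims are made up.

```python
from collections import defaultdict
from decimal import Decimal


def billed_by_year(claims: list) -> dict:
    """Sum Claim.total.value per calendar year of Claim.created."""
    totals: defaultdict = defaultdict(Decimal)
    for claim in claims:
        created = claim.get("created", "")
        total = claim.get("total", {})
        if len(created) < 4 or "value" not in total:
            continue  # skip claims without a usable date or amount
        # Go through str() so floats like 27.18 do not carry binary noise.
        totals[created[:4]] += Decimal(str(total["value"]))
    return dict(totals)


claims = [
    {"resourceType": "Claim", "created": "1996-11-10",
     "total": {"value": 27.18, "currency": "USD"}},
    # The two claims below are made up for the example.
    {"resourceType": "Claim", "created": "1996-12-02",
     "total": {"value": 100.0, "currency": "USD"}},
    {"resourceType": "Claim", "created": "1997-01-15",
     "total": {"value": 5.5, "currency": "USD"}},
]
yearly = billed_by_year(claims)
print(yearly)
```

Using `Decimal` rather than floats is a deliberate choice for money: sums stay exact to the cent.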
The key architectural insight: build the aggregator first, then spawn agents from it.
Once a patient's EHI is parsed, normalized, and summarized into a structured workspace, specialized agents can be launched against that workspace without re-parsing the raw FHIR data each time. This is the same pattern we use in enterprise software: a persistent context that agents read from and write to.
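As a rough illustration of the pattern, the workspace can start as nothing more than resources indexed by type, built once from either input format. This is a sketch only; the function names are hypothetical and a real workspace would add normalization, deduplication, and summaries on top.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def resources_from_bundle(bundle: dict) -> Iterator[dict]:
    """Yield each resource inside a FHIR Bundle's entry list."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        if resource:
            yield resource


def resources_from_ndjson(text: str) -> Iterator[dict]:
    """Yield one resource per non-empty NDJSON line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def build_workspace(resources: Iterable[dict]) -> dict:
    """Index resources by resourceType so agents can read them without re-parsing."""
    workspace: defaultdict = defaultdict(list)
    for resource in resources:
        workspace[resource.get("resourceType", "Unknown")].append(resource)
    return dict(workspace)


# Made-up miniature inputs in both supported formats.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Condition", "id": "c1"}},
    ],
}
ndjson_text = '{"resourceType": "Condition", "id": "c2"}\n\n'

workspace = build_workspace(
    list(resources_from_bundle(bundle)) + list(resources_from_ndjson(ndjson_text))
)
print({rtype: len(items) for rtype, items in workspace.items()})
```

Each specialized agent then reads only the slices it needs, for example `workspace["Claim"]` for billing or `workspace["MedicationRequest"]` for medication review.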
The problem: Drugs for rare diseases cost $50K–$500K/year. Manufacturer patient assistance programs, foundation grants, NIH programs, and state-level assistance exist, but patients don't know about them, and their doctors don't have time to research them either.
How it works:
Example (based on our complex patient): Marine542 is on tacrolimus (post-transplant). A search finds: Astellas PAP (free drug if income <400% FPL), National Kidney Foundation emergency grants, transplant pharmacy assistance programs, and Medicare Part D Extra Help eligibility.
Why this wins the competition: Most emotionally resonant use case. Judges who are clinicians will immediately recognize this as something patients desperately need. No other team is likely thinking about it; most will pitch the obvious Q&A chatbot.
The problem: Most patients never look at their EOBs. Denied claims go uncontested. Overbilling goes unchallenged. Unused benefits expire. The EHI export contains the full billing history, but nobody has built a tool to make it useful.
Three sub-use cases:
Data note: Synthea individual bundles include Claim and ExplanationOfBenefit resources. The EOBs show insurer (Blue Cross Blue Shield in our sample), dates, claim amounts, and payment status. Real EOBs would also include denial reason codes, contracted rates, and patient responsibility breakdowns.
The problem: "Is this doctor in my network?" is one of the most common questions patients can't answer. A $40 copay visit becomes a $3,000 bill when the answer turns out to be no. The EHI export has the patient's Coverage (insurance plan) and CareTeam (current providers with NPIs). The missing piece: querying the insurer's provider directory.
How it works:
The moat: We already have 2.7M providers in CMS DuckDB on Hetzner, with specialty, location, and network data. The NPI matching engine is built. This agent could be live in days, not months.
Provider-side view: Flip the perspective: a physician liaison tool that tells a provider "Patient X has UnitedHealthcare Choice Plus. Here's what you can do in-network for them, and here's what needs prior auth."
The problem: Getting a second opinion requires sending your records somewhere. What do you send? The full 500-page printout? The cryptic FHIR ZIP? A vague "my doctor said I have X"? None of these work.
How it works:
Clinician-facing angle: The receiving physician uploads the incoming patient's EHI. The agent surfaces the clinically critical information immediately: active conditions, allergies, current meds, implanted devices, relevant history for this specialty. No chart digging required.
The problem: Care transitions are where medication errors happen. A hospital discharges a patient with new meds. They see their cardiologist next week. The cardiologist doesn't have the updated med list. They prescribe something that interacts. This is entirely preventable with access to the full medication history.
Real example from our dataset: Marine542 is on tacrolimus (transplant immunosuppressant). This drug has severe interactions with azithromycin (common antibiotic), fluconazole (common antifungal), and NSAIDs. A new urgent care provider who doesn't see the transplant history prescribes one of these and risks tacrolimus toxicity or damage to the transplanted kidney. This happens in real life.
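The check described in this example can be sketched as a lookup against an interaction table. Everything here is illustrative: the table contains only the interactions named above (with ibuprofen and naproxen standing in for NSAIDs as an assumption), matching is by drug name inside the display string, and a real agent would rely on a maintained clinical interaction source rather than a hand-written dictionary.

```python
# Illustrative only: not a clinical reference. Keys and values are lowercase
# ingredient names; the NSAID entries are example members of that class.
INTERACTIONS = {
    "tacrolimus": ("azithromycin", "fluconazole", "ibuprofen", "naproxen"),
}


def flag_interactions(active_med_displays: list, proposed_drug: str) -> list:
    """Return warnings when a proposed drug matches a known interaction."""
    proposed = proposed_drug.lower()
    warnings = []
    for display in active_med_displays:
        for drug, interacting in INTERACTIONS.items():
            on_drug = drug in display.lower()
            if on_drug and any(name in proposed for name in interacting):
                warnings.append(f"{proposed_drug} may interact with {display}")
    return warnings


active = ["24 HR tacrolimus 1 MG Extended Release Oral Tablet"]
warnings = flag_interactions(active, "Fluconazole 150 MG Oral Tablet")
print(warnings)
```

The active medication list would come from the patient's MedicationRequest resources, so the warning can fire even when the prescriber never saw the transplant history.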
What the agent does:
The problem: Preventive screenings are chronically under-delivered. A diabetic patient should get annual retinal exams, foot exams, nephrology check-ins, A1C every 90 days. A 55-year-old should have a colonoscopy. The data to know who's overdue exists, but no one looks at it systematically from the patient's perspective.
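One of the rules above, A1C every 90 days, is simple enough to sketch directly. The example assumes A1C results appear as Observations coded with LOINC 4548-4 and carry an `effectiveDateTime`; the helper names and sample observations are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

A1C_LOINC = "4548-4"  # assumed LOINC code for hemoglobin A1c in the data


def latest_observation_date(observations: list, loinc_code: str) -> Optional[date]:
    """Return the most recent date an observation with this code was recorded."""
    dates = []
    for obs in observations:
        codes = {c.get("code") for c in obs.get("code", {}).get("coding", [])}
        effective = obs.get("effectiveDateTime")
        if loinc_code in codes and effective:
            dates.append(date.fromisoformat(effective[:10]))  # keep the date part
    return max(dates) if dates else None


def is_overdue(last_done: Optional[date], today: date, interval_days: int) -> bool:
    """A check is overdue if never done or done longer ago than the interval."""
    return last_done is None or (today - last_done) > timedelta(days=interval_days)


# Made-up observations for illustration.
observations = [
    {"resourceType": "Observation",
     "code": {"coding": [{"system": "http://loinc.org", "code": A1C_LOINC}]},
     "effectiveDateTime": "2025-06-01T09:30:00-04:00"},
    {"resourceType": "Observation",
     "code": {"coding": [{"system": "http://loinc.org", "code": A1C_LOINC}]},
     "effectiveDateTime": "2026-01-01T10:00:00-05:00"},
]
last_a1c = latest_observation_date(observations, A1C_LOINC)
print(last_a1c, is_overdue(last_a1c, date(2026, 5, 1), 90))
```

The same two functions cover other interval-based rules by swapping the code and the interval, for example an annual screening with `interval_days=365`.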
What the agent does:
Provider-side value: A physician liaison using this tool can identify which patients haven't been seen for chronic condition management and reach out proactively. This is a retention and quality metrics play for the practice.
The problem: A patient's health records are scattered across 3–8 different portals: their PCP, cardiologist, hospital system, urgent care, pharmacy, insurer. Before any agent can help, someone has to go get the data. Asking patients to manually download-and-upload each one is a non-starter for real adoption. The platform needs to go get it autonomously.
Why it's challenging: This sits at the intersection of HIPAA, live authentication, and web automation. But the 21st Century Cures Act actually created the legal infrastructure for exactly this. The challenge is engineering, not regulatory.
| Tier | Mechanism | Market Coverage | Maturity |
|---|---|---|---|
| 1: SMART on FHIR API | Patient OAuth authorization → access token → API calls | ~60% of U.S. hospitals (Epic 35%, Cerner 25%+) | Works today |
| 2: Browser Agent | Patient-authorized Playwright automation → portal login → EHI export ZIP download | ~40% (smaller hospitals, specialty clinics, urgent cares) | Engineering challenge |
| 3: Government APIs | Blue Button 2.0 (Medicare), VA FHIR API, state Medicaid | 65M+ Medicare, 9M+ VA patients | Works today |
Patient clicks "Connect [Hospital Name]" → OAuth redirect → authorizes → agent gets a refresh token stored encrypted in their own Key Vault. Agent then calls standard FHIR R4 endpoints to pull all resource types. Syncs on schedule. No PHI leaves the patient's environment after retrieval. This is exactly how Apple Health works. Directly satisfies the EHIgnite multi-EHR interoperability bonus.
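The first hop of that flow, the OAuth redirect, can be sketched as building a SMART standalone-launch authorization URL. This is a simplified sketch: the endpoint, client ID, and redirect URI are placeholders, a real client would discover the authorize endpoint from the server's `.well-known/smart-configuration` document, and many servers additionally require PKCE parameters that are omitted here.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorize_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    fhir_base_url: str,
    state: Optional[str] = None,
) -> str:
    """Build the URL the patient is redirected to in order to grant access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # offline_access asks for a refresh token so scheduled syncs can run.
        "scope": "launch/patient patient/*.read offline_access",
        "state": state or secrets.token_urlsafe(16),  # CSRF protection
        "aud": fhir_base_url,  # the FHIR server the token is intended for
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


# Placeholder values; none of these point at a real EHR.
url = build_authorize_url(
    authorize_endpoint="https://ehr.example.org/oauth2/authorize",
    client_id="my-client-id",
    redirect_uri="https://app.example.org/callback",
    fhir_base_url="https://ehr.example.org/fhir/R4",
    state="abc123",
)
query = parse_qs(urlparse(url).query)
print(url)
```

After the patient approves, the server redirects back with a `code` that the agent exchanges at the token endpoint for access and refresh tokens.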
The ONC mandate requires every certified EHR to have an EHI export button in the patient portal. So the data is always accessible, just not via API. The agent:
Novel research angle for the competition: Instead of brittle CSS selectors that break when portals redesign, use a vision model (multimodal LLM) to identify the "Download My Records" button on any portal UI. The pitch is "zero-shot portal automation via multimodal LLM navigation", which no one else is proposing.
Blue Button 2.0 (CMS) covers every Medicare claim (every hospital visit, every procedure, every Part D prescription) for anyone 65+. For complex chronic patients (our highest-value use case), this is the most complete claims dataset that exists. Full FHIR structured data. Patient authorizes via standard OAuth in under a minute.
| Phase | What We Build | Timeline | Why First |
|---|---|---|---|
| Foundation | FHIR parser + normalization. Accept ZIP, extract all resource types, build patient workspace object. Support both Bundle JSON and bulk NDJSON. | 1–2 weeks | Everything depends on this. Synthea data available now for development. |
| Aggregator | Multi-EHR stitching. Entity resolution (same patient, different system IDs). Deduplication. Summary generation. Patient workspace markdown files. | 2–3 weeks | Required for the "integration" EHIgnite scenario + interoperability bonus. |
| Q&A Agent | LLM agent reads patient workspace. Answers questions in plain language. RAG over structured FHIR data. | 1 week | This is the required EHIgnite demo. Also the natural entry point for all other agents. |
| Network Navigator | Coverage extraction + NPI lookup + provider directory query. We already have the CMS provider data. | 3–5 days | Fastest to build given existing CMS DuckDB infrastructure. High user value. |
| Claims Agent | EOB parsing, denial detection, benefit utilization review. | 1–2 weeks | Unique angle for the competition. Directly addresses payer workflow scenario. |
| Funding Finder | Rare disease identification + web search + PAP database integration. | 1 week | Most differentiated use case. Web search pattern already proven. |
The privacy/security judging criterion (one of 5 categories) is a major differentiator opportunity. Most competitors will build a SaaS where patient data goes to their server. The winning play:
If you're a clinician reviewing this, here's what would most sharpen the product direction:
Data sourced from: SMART on FHIR sample-bulk-fhir-datasets (MIT/Boston Children's Hospital, CC0 license) · Synthea synthetic patient generator (MITRE Corporation, Apache 2.0) · HHS/ONC EHIgnite Challenge (ehignitechallenge.org) · HL7 FHIR R4 specification · ONC 21st Century Cures Act Final Rule §170.315(b)(10)
Analysis and report prepared by Chief β March 11, 2026