EHI Patient Data Deep Dive

FHIR R4 Export Anatomy, Patient Complexity Spectrum & Agent Use Cases

Date: March 11, 2026  |  Context: EHIgnite Challenge ($490K, HHS), Phase 1 due May 13, 2026

Data source: SMART on FHIR Synthea-generated bulk export (10 patients) + individual FHIR bundles (~500 patients)

This report documents actual analysis of downloaded FHIR R4 datasets. Resource counts and patient profiles are drawn from the actual synthetic records. The goal: understand what the data looks like before deciding what to build.

1. What We Actually Downloaded & Analyzed

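Two datasets, both generated by Synthea (MITRE's synthetic patient simulator): a bulk export published by the SMART on FHIR project (Boston Children's Hospital) and individual patient bundles published by the Synthea project: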

Dataset | Format | Patients | Size | Source
Bulk NDJSON Export | One .ndjson file per resource type | 10 patients (13 entries incl. deceased) | 2.5MB zip / ~23MB unzipped | github.com/smart-on-fhir/sample-bulk-fhir-datasets
Individual Patient Bundles | One JSON file per patient (FHIR Bundle) | ~500 patients | 81MB zip / ~400MB unzipped | synthetichealth.github.io/synthea-sample-data

Resource Counts: What's Actually in a 10-Patient Export

Observation: 9,878 records
DiagnosticReport: 2,101 records
Procedure: 2,056 records
DocumentReference: 1,215 records (clinical notes)
Encounter: 1,215 records
MedicationRequest: 1,745 records
Condition: 555 records
Immunization: 161 records
AllergyIntolerance: 11 records

Total: ~21,000 records across just 10 patients (the nine types listed above account for 18,937; the rest are other resource types). Observation alone is 47% of all data. Labs and vitals are the dominant data type, and this matters enormously for what the ingestion pipeline needs to handle.
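A minimal Python sketch of how counts like these can be reproduced, assuming the bulk export has been unzipped into one directory of `*.ndjson` files with one resource per line (the layout of the SMART sample datasets). The function names are illustrative, not part of any library.

```python
import json
from collections import Counter
from pathlib import Path


def count_ndjson_resources(export_dir):
    """Count FHIR resources per resourceType across every *.ndjson file in a directory."""
    counts = Counter()
    for path in sorted(Path(export_dir).glob("*.ndjson")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # tolerate blank lines, e.g. a trailing newline
                    # Trust the record's own resourceType rather than the file name.
                    counts[json.loads(line)["resourceType"]] += 1
    return counts


def share(counts, resource_type):
    """Fraction of all counted records that belong to one resource type."""
    total = sum(counts.values())
    return counts[resource_type] / total if total else 0.0
```

`count_ndjson_resources("export/").most_common()` lists resource types by volume, and `share(counts, "Observation")` on the full export should land near the 47% figure above.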


2. The Patient Complexity Spectrum: Three Archetypes

The most important design insight from the data: "a patient" means wildly different things. Any tool needs to handle this entire range.

🟢 Archetype A: Healthy Young Adult (65 Resources)

Profile: Mariette443 Hackett68. No chronic conditions. No medications. Four routine encounters (annual physicals). Labs within normal range. Immunizations up to date.

What her EHI export contains: 35 Observations (routine labs + vitals), 12 Immunizations, 4 Encounters, 4 Claims, 4 EOBs, 2 Procedures, 1 DiagnosticReport.

For an agent: Almost usable as-is. The aggregation layer does most of the work. Main value: multi-system stitching (her dentist is at a different system), preventive care gap alerts.

Represents: ~30% of the adult patient population. High lifetime value for prevention-focused use cases.

🟡 Archetype B: Managed Chronic Patient (98 Resources)

Profile: Steven797 Fadel536. 5 conditions: hypertension, obesity, seasonal allergies, viral sinusitis, BMI 30+. 3 medications: Hydrochlorothiazide (BP), Fexofenadine (allergies), Epi-pen. 9 encounters. Insured by Blue Cross Blue Shield. 12 claims in history.

Sample billing data:

For an agent: Network verification ("is my BP doctor in-network?"), drug interaction check, care gap identification (obese patient → should have annual lipid panel), benefit utilization review.

Represents: The most common patient in primary care. ~50% of adults have at least one chronic condition.

🔴 Archetype C: Complex Multi-System Patient (5,518+ Resources)

Profile: Marine542 Ai120 Upton904. Born 1927, died 1994. 708 encounters. 219 documented conditions across 67 years of life. The disease trajectory tells a story:

Medications: Tacrolimus (transplant immunosuppressant), insulin Humulin 70/30, metformin, metoprolol, lisinopril, simvastatin, Epoetin Alfa (anemia of CKD), nitroglycerin PRN.

Why this patient matters: The raw data is unusable as-is. 219 conditions shown as a flat list are meaningless. A new provider who doesn't realize she had a kidney transplant could prescribe NSAIDs, which compound tacrolimus's kidney toxicity and can damage the transplanted kidney. The agent has to understand the clinical narrative, not just the data dump.

Represents: Top 5% of patients who drive ~50% of healthcare costs. The highest-value use case for AI assistance.


3. What the Raw Data Actually Looks Like (The Problem)

Here's what a patient's Condition resource looks like in raw FHIR (one of 219 for our complex patient):

{
  "resourceType": "Condition",
  "id": "a3b2c1d0-...",
  "subject": { "reference": "Patient/79a66c97-..." },
  "code": {
    "coding": [{
      "system": "http://snomed.info/sct",
      "code": "44054006",
      "display": "Diabetes mellitus type 2 (disorder)"
    }]
  },
  "onsetDateTime": "1969-05-31",
  "clinicalStatus": {
    "coding": [{ "code": "active" }]
  },
  "category": [{
    "coding": [{ "code": "encounter-diagnosis" }]
  }]
}

And a MedicationRequest (one of 1,036 for this same patient):

{
  "resourceType": "MedicationRequest",
  "id": "b4c3d2e1-...",
  "subject": { "reference": "Patient/79a66c97-..." },
  "medicationCodeableConcept": {
    "coding": [{
      "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
      "code": "1860487",
      "display": "24 HR tacrolimus 1 MG Extended Release Oral Tablet"
    }]
  },
  "status": "active",
  "intent": "order",
  "authoredOn": "1988-09-15"
}

And here's what a Claim looks like (billing data, which is critical for agent use cases):

{
  "resourceType": "Claim",
  "id": "263e918b-...",
  "status": "active",
  "type": { "coding": [{ "code": "institutional" }] },
  "created": "1996-11-10",
  "patient": { "reference": "Patient/..." },
  "insurance": [{ "coverage": { "reference": "Coverage/..." } }],
  "item": [{
    "sequence": 1,
    "productOrService": {
      "coding": [{
        "system": "http://snomed.info/sct",
        "display": "Encounter for symptom"
      }]
    }
  }],
  "total": { "value": 27.18, "currency": "USD" }
}

The core problem in concrete terms: A 65-year-old with 30 years of health records has roughly 15,000–25,000 FHIR resources in their export. Dropping that into any tool that can't parse and synthesize FHIR produces nothing usable. The coding systems alone (SNOMED, LOINC, RxNorm, ICD-10, CPT, CVX) require active interpretation: "44054006" is meaningless without knowing it's the SNOMED CT code for Type 2 Diabetes.
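To make the interpretation problem concrete, here is a small Python sketch that turns the Condition and MedicationRequest examples above into one-line summaries. The system URIs are the standard FHIR identifiers for these code systems; the function names and output format are illustrative. It only reads `medicationCodeableConcept` (what Synthea emits); real EHR exports may use `medicationReference`, which this sketch does not resolve.

```python
# Standard FHIR system URIs for the code systems named above.
SYSTEM_NAMES = {
    "http://snomed.info/sct": "SNOMED CT",
    "http://loinc.org": "LOINC",
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
    "http://hl7.org/fhir/sid/icd-10-cm": "ICD-10-CM",
    "http://www.ama-assn.org/go/cpt": "CPT",
    "http://hl7.org/fhir/sid/cvx": "CVX",
}


def describe_codings(concept):
    """Render each coding in a CodeableConcept as 'System code: display'."""
    lines = []
    for coding in concept.get("coding", []):
        uri = coding.get("system", "")
        system = SYSTEM_NAMES.get(uri, uri or "unknown system")
        display = coding.get("display") or concept.get("text") or "(no display text)"
        lines.append(f"{system} {coding.get('code', '?')}: {display}")
    return lines


def summarize(resource):
    """One-line summary of a Condition or MedicationRequest (medicationCodeableConcept only)."""
    rtype = resource["resourceType"]
    if rtype == "Condition":
        concept, when = resource.get("code", {}), resource.get("onsetDateTime", "unknown onset")
    elif rtype == "MedicationRequest":
        concept = resource.get("medicationCodeableConcept", {})
        when = resource.get("authoredOn", "unknown date")
    else:
        raise ValueError(f"unsupported resourceType: {rtype}")
    return f"{rtype} ({when}): " + "; ".join(describe_codings(concept))
```

Applied to the Condition above, `summarize` returns `Condition (1969-05-31): SNOMED CT 44054006: Diabetes mellitus type 2 (disorder)`. The sketch still leans on the `display` text shipped with each coding; when an export omits it, a terminology lookup would be needed.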

4. The Agent Platform Vision

The key architectural insight: build the aggregator first, then spawn agents from it.

Once a patient's EHI is parsed, normalized, and summarized into a structured workspace, specialized agents can be launched against that workspace without re-parsing the raw FHIR data each time. This is the same pattern we use in enterprise software: a persistent context that agents read from and write to.

Platform Architecture
FHIR ZIPs (Epic / Cerner / multiple EHRs)
  → Aggregator (parse + normalize + deduplicate + summarize)
  → Patient Workspace (structured context + summary)
      ↓
      Q&A Agent: "What meds am I on?"
      Claims Agent: billing review
      Network Agent: in-network check
      Funding Agent: rare disease grants
      Summary Agent: second opinion prep
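A sketch of the "parse once, read many" idea in Python. `PatientWorkspace` and `active_medications` are hypothetical names; the workspace here is simply resources indexed by type and id, and its deduplication only drops records with an identical `id` (the same record delivered twice). Matching the same drug or patient across EHRs that assign different ids is the harder aggregator problem and is not attempted here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PatientWorkspace:
    """Hypothetical workspace: resources parsed once, indexed as {resourceType: {id: resource}}."""
    resources: dict = field(default_factory=lambda: defaultdict(dict))

    def add(self, resource):
        """Store a resource; return False if the same (resourceType, id) is already present."""
        by_id = self.resources[resource["resourceType"]]
        if resource["id"] in by_id:
            return False
        by_id[resource["id"]] = resource
        return True

    def of_type(self, resource_type):
        return list(self.resources.get(resource_type, {}).values())

    def counts(self):
        return {rtype: len(by_id) for rtype, by_id in self.resources.items()}


def active_medications(ws):
    """Example agent-side read: names of active medications, with no FHIR re-parsing."""
    names = []
    for mr in ws.of_type("MedicationRequest"):
        if mr.get("status") != "active":
            continue
        for coding in mr.get("medicationCodeableConcept", {}).get("coding", [])[:1]:
            names.append(coding.get("display", coding.get("code", "?")))
    return sorted(names)
```

Each agent in the diagram would receive the same `PatientWorkspace` instance rather than the raw ZIP.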

5. The Seven Agent Use Cases

💊 Agent 1: Rare Disease Funding Finder

High Differentiator · EHIgnite: Patient Tools · Data: Conditions + Meds + Coverage

The problem: Drugs for rare diseases cost $50K–$500K/year. Manufacturer patient assistance programs, foundation grants, NIH programs, and state-level assistance exist, but patients don't know about them, and their doctors don't have time to research them either.

How it works:

  1. Read patient's Condition resources → identify rare/orphan diseases (NORD database cross-reference)
  2. Read MedicationRequest → identify expensive specialty drugs
  3. Read Coverage → understand current insurance status (eligibility thresholds vary by program)
  4. Web search: manufacturer PAPs, NeedyMeds database, foundation grants (American Kidney Fund, NORD, disease-specific foundations), 340B program eligibility
  5. Return: ranked list of programs with eligibility criteria, dollar values, and application links
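Steps 1 and 2, plus the query-building half of step 4, could look roughly like the Python sketch below. Both watch lists are hypothetical placeholders: the rare-disease set is empty until loaded from a source such as NORD, and the specialty-drug set is seeded only with the tacrolimus RxNorm code from the MedicationRequest example in Section 3. The web search itself and the ranking in step 5 are out of scope.

```python
# Hypothetical watch lists. A real build would load rare-disease codes from a
# NORD/Orphanet cross-reference and specialty drugs from a pricing source.
RARE_CONDITION_CODES = set()
SPECIALTY_DRUG_CODES = {"1860487"}  # seeded with the tacrolimus RxNorm code from Section 3


def funding_candidates(conditions, med_requests,
                       rare_codes=RARE_CONDITION_CODES, drug_codes=SPECIALTY_DRUG_CODES):
    """Flag rare conditions and active specialty drugs, each paired with a search query."""
    hits = []
    for cond in conditions:
        for c in cond.get("code", {}).get("coding", []):
            if c.get("code") in rare_codes:
                name = c.get("display", c.get("code"))
                hits.append({"kind": "condition", "name": name,
                             "query": f"{name} foundation grant patient assistance"})
    for mr in med_requests:
        if mr.get("status") != "active":
            continue  # only current prescriptions matter for funding
        for c in mr.get("medicationCodeableConcept", {}).get("coding", []):
            if c.get("code") in drug_codes:
                name = c.get("display", c.get("code"))
                hits.append({"kind": "medication", "name": name,
                             "query": f"{name} manufacturer patient assistance program"})
    return hits
```

Each returned `query` string would be handed to the web-search step; keeping the flagged item next to its query makes the final ranked list traceable back to the patient's own records.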

Example (based on our complex patient): Marine542 is on tacrolimus (post-transplant). A search could surface options such as the Astellas PAP (free drug if income <400% FPL), National Kidney Foundation emergency grants, transplant pharmacy assistance programs, and Medicare Part D Extra Help eligibility.

Why this wins the competition: It is the most emotionally resonant use case. Judges who are clinicians will immediately recognize it as something patients desperately need. No other team is likely thinking about it; most will pitch the obvious Q&A chatbot.

📋 Agent 2: Claims & Billing Intelligence

EHIgnite: Payer Workflow · Data: Claim + EOB + Coverage + Encounter · Unique: Claims data in EHI is often overlooked

The problem: Most patients never look at their EOBs. Denied claims go uncontested. Overbilling goes unchallenged. Unused benefits expire. The EHI export contains the full billing history β€” but nobody has built a tool to make it useful.

Three sub-use cases:

Data note: Synthea individual bundles include Claim and ExplanationOfBenefit resources. The EOBs show insurer (Blue Cross Blue Shield in our sample), dates, claim amounts, and payment status. Real EOBs would also include denial reason codes, contracted rates, and patient responsibility breakdowns.
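As a starting point for the claims view, here is a Python sketch that totals billed amounts per year from Claim resources shaped like the example in Section 3 (`created`, `total.value`, `total.currency`). It uses `Decimal` to avoid floating-point drift in currency sums. `billed_by_year` is an illustrative name; denial detection would need EOB adjudication details that, per the note above, Synthea data doesn't fully model.

```python
from collections import defaultdict
from decimal import Decimal


def billed_by_year(claims):
    """Sum Claim.total.value per calendar year of Claim.created (USD totals only)."""
    totals = defaultdict(Decimal)  # Decimal() starts each year at 0
    for claim in claims:
        total = claim.get("total") or {}
        if "value" not in total or total.get("currency", "USD") != "USD":
            continue  # skip claims without a usable USD total
        year = claim["created"][:4]
        # str() first so 27.18 becomes Decimal("27.18"), not its binary float expansion.
        totals[year] += Decimal(str(total["value"]))
    return dict(sorted(totals.items()))
```

A year-by-year view like this is a cheap first screen: an unusually large year is where a patient (or the agent) would then drill into individual EOBs.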

πŸ—ΊοΈ Agent 3 β€” Payer-Provider Network Navigator

EHIgnite: Patient Tools + Integration · Data: Coverage + Practitioner (NPI) + Patient zip · Pain Point: Universal and deeply felt

The problem: "Is this doctor in my network?" is one of the most common questions patients can't answer. A $40 copay visit becomes a $3,000 bill when the answer turns out to be no. The EHI export has the patient's Coverage (insurance plan) and CareTeam (current providers with NPIs). The missing piece: querying the insurer's provider directory.

How it works:

  1. Read Coverage resource → extract plan name, insurer, member ID, plan type (HMO/PPO/EPO/HDHP)
  2. Read CareTeam → extract NPIs of current providers
  3. When the patient asks about a new provider: look up the NPI via NPPES, then query the insurer's provider directory (CMS's Interoperability and Patient Access rule requires Medicare Advantage, Medicaid, and CHIP plans to publish a public FHIR Provider Directory API)
  4. Return: in-network / out-of-network / need PCP referral / need prior auth
  5. Bonus: for HMO/EPO patients, surface which in-network specialists are available for their condition
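A Python sketch of the data-extraction half of steps 2 and 3: collecting NPIs from Practitioner resources (the ones a CareTeam's participants reference; resolving those references is assumed to have happened already) and building a lookup URL for the public NPPES NPI Registry API. The check-digit test is the standard NPI rule: the Luhn algorithm applied to the NPI prefixed with `80840`. The insurer-directory query is not shown, because each payer's Provider Directory API differs.

```python
from urllib.parse import urlencode

NPI_SYSTEM = "http://hl7.org/fhir/sid/us-npi"
NPPES_API = "https://npiregistry.cms.hhs.gov/api/"


def is_valid_npi(npi):
    """NPI check digit: Luhn algorithm over '80840' + the 10-digit NPI."""
    if len(npi) != 10 or not npi.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed("80840" + npi)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def practitioner_npis(practitioners):
    """Collect well-formed NPIs from Practitioner.identifier entries."""
    npis = []
    for p in practitioners:
        for ident in p.get("identifier", []):
            value = ident.get("value", "")
            if ident.get("system") == NPI_SYSTEM and is_valid_npi(value):
                npis.append(value)
    return npis


def nppes_lookup_url(npi):
    """URL for a single-NPI lookup against the NPPES NPI Registry API (version 2.1)."""
    return NPPES_API + "?" + urlencode({"version": "2.1", "number": npi})
```

`is_valid_npi("1234567893")` returns `True`, while changing the last digit makes it `False`; rejecting malformed NPIs before any network call keeps bad identifiers from turning into confusing "not found" answers.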

The moat: We already have 2.7M providers in CMS DuckDB on Hetzner, with specialty, location, and network data. The NPI matching engine is built. This agent could be live in days, not months.

Provider-side view: Flip the perspective with a physician liaison tool that tells a provider "Patient X has UnitedHealthcare Choice Plus. Here's what you can do in-network for them, and here's what needs prior auth."

🩺 Agent 4: Second Opinion Prep Package

EHIgnite: Summarization + Domain Filtering · Data: All clinical resources

The problem: Getting a second opinion requires sending your records somewhere. What do you send? The full 500-page printout? The cryptic FHIR ZIP? A vague "my doctor said I have X"? None of these work.

How it works:

  1. Patient uploads their EHI export and answers: "What's the second opinion for? What specialty?"
  2. Agent generates a specialty-filtered clinical brief β€” relevant conditions, labs, meds, procedures β€” filtered by specialty-appropriate SNOMED/LOINC/CPT codes
  3. Produces a structured summary the receiving physician can actually read in 5 minutes
  4. Generates a cover letter explaining the clinical question
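The filtering in step 2 is essentially set intersection over codes. In this Python sketch, `SPECIALTY_CODES` is a hypothetical, two-entry stand-in for the curated SNOMED/LOINC/CPT value sets a real build would need; it pairs SNOMED 44054006 (type 2 diabetes, as in Section 3) with LOINC 4548-4 (hemoglobin A1c) for an endocrinology brief.

```python
# Hypothetical, deliberately tiny code sets; a real build would use curated value sets.
SPECIALTY_CODES = {
    "endocrinology": {
        ("http://snomed.info/sct", "44054006"),  # Diabetes mellitus type 2
        ("http://loinc.org", "4548-4"),          # Hemoglobin A1c/Hemoglobin.total in Blood
    },
}


def codes_of(resource):
    """(system, code) pairs from a resource's main CodeableConcept."""
    # Condition/Observation/Procedure use "code"; Synthea MedicationRequests use
    # "medicationCodeableConcept".
    concept = resource.get("code") or resource.get("medicationCodeableConcept") or {}
    return {(c.get("system"), c.get("code")) for c in concept.get("coding", [])}


def specialty_brief(resources, specialty):
    """Group the resources relevant to one specialty by resourceType."""
    wanted = SPECIALTY_CODES[specialty]
    brief = {}
    for r in resources:
        if codes_of(r) & wanted:
            brief.setdefault(r["resourceType"], []).append(r)
    return brief
```

Matching on (system, code) pairs rather than bare codes avoids collisions between code systems that happen to reuse the same digits.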

Clinician-facing angle: The receiving physician uploads the incoming patient's EHI. The agent surfaces the clinically critical information immediately: active conditions, allergies, current meds, implanted devices, relevant history for this specialty. No chart digging required.

💊 Agent 5: Medication Reconciliation & Safety

High Risk: Patient Safety · EHIgnite: Summarization + Patient Tools · Data: MedicationRequest + Condition + AllergyIntolerance

The problem: Care transitions are where medication errors happen. A hospital discharges a patient with new meds. They see their cardiologist next week. The cardiologist doesn't have the updated med list. They prescribe something that interacts. This is entirely preventable with access to the full medication history.

Real example from our dataset: Marine542 is on tacrolimus (a transplant immunosuppressant). It has serious interactions with azithromycin (a common antibiotic), fluconazole (a common antifungal), and NSAIDs. A new urgent care provider who doesn't see the transplant history prescribes one of these, and the result can be dangerous drug levels or injury to the transplanted kidney. This happens in real life.

What the agent does:

  1. Extract all active MedicationRequest resources across all EHR sources in the export
  2. Deduplicate (same drug from different systems, different dosages)
  3. Run drug-drug interaction check (OpenFDA API)
  4. Flag: duplicates, interactions, medications contraindicated by active conditions
  5. Generate: reconciled medication list + patient wallet card + provider-facing safety brief
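Steps 2 to 4 can be sketched in Python as below. The two-entry `INTERACTIONS` table is a hypothetical stand-in for the OpenFDA lookup named in step 3 and is not clinical guidance; ingredient matching is a naive substring test on display names, where a real build would map RxNorm products to their ingredients.

```python
from itertools import combinations

RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"

# Hypothetical, deliberately incomplete interaction table keyed by ingredient.
# Illustrative only; a real build would query a maintained interaction source.
INTERACTIONS = {
    frozenset({"tacrolimus", "fluconazole"}): "fluconazole can raise tacrolimus levels",
    frozenset({"tacrolimus", "ibuprofen"}): "NSAID adds to tacrolimus kidney toxicity",
}
KNOWN_INGREDIENTS = {name for pair in INTERACTIONS for name in pair}


def reconcile(med_requests):
    """Return (deduplicated active medication list, interaction flags)."""
    # Step 2: collapse exact duplicates (same RxNorm code delivered by several EHRs).
    by_code = {}
    for mr in med_requests:
        if mr.get("status") != "active":
            continue
        for c in mr.get("medicationCodeableConcept", {}).get("coding", []):
            if c.get("system") == RXNORM and "code" in c:
                by_code.setdefault(c["code"], c.get("display", c["code"]))
    # Map each product to a known ingredient by substring match on its display name.
    ingredient_of = {}
    for display in by_code.values():
        for ing in KNOWN_INGREDIENTS:
            if ing in display.lower():
                ingredient_of.setdefault(ing, display)
    # Steps 3-4: check every ingredient pair against the table.
    flags = []
    for a, b in combinations(sorted(ingredient_of), 2):
        note = INTERACTIONS.get(frozenset({a, b}))
        if note:
            flags.append((ingredient_of[a], ingredient_of[b], note))
    return sorted(by_code.values()), flags
```

Two copies of the same tacrolimus order from different EHRs collapse to one entry because exact-code deduplication runs first. Different strengths of the same drug carry different RxNorm codes, which is why the pairwise check works at the ingredient level instead.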

📅 Agent 6: Care Gap & Preventive Advisor

EHIgnite: Summarization + Patient Tools · Data: Condition + Observation + Procedure + Patient demographics

The problem: Preventive screenings are chronically under-delivered. A diabetic patient should get annual retinal exams, annual foot exams, annual kidney screening (urine albumin and eGFR), and an A1C every 3–6 months. A 55-year-old should be up to date on colorectal cancer screening (e.g., colonoscopy). The data to know who's overdue exists, but no one looks at it systematically from the patient's perspective.

What the agent does:

  1. Read conditions, age, gender → determine what screenings are clinically indicated (USPSTF guidelines)
  2. Scan Observation, Procedure, Encounter history → determine when each was last done
  3. Output: "You are overdue for these screenings" with specific dates and scheduling guidance
  4. Multi-system bonus: deduplicate across EHR sources so patients don't repeat tests unnecessarily
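Steps 2 and 3 reduce to "last occurrence plus interval, compared with today." A Python sketch, where `INTERVALS` holds hypothetical placeholder values (real intervals come from the applicable guideline and the patient's situation) and dates come from a coded field such as `Observation.effectiveDateTime`. Step 1, deciding which screenings apply, is not shown.

```python
from datetime import date, timedelta

# Hypothetical intervals in days; real values depend on guidelines and the patient.
INTERVALS = {"HbA1c": 180, "Retinal exam": 365, "Foot exam": 365}


def last_done(resources, system, code, date_field):
    """Most recent date a resource with the given code occurred, or None if never."""
    dates = []
    for r in resources:
        codings = r.get("code", {}).get("coding", [])
        if date_field in r and any(c.get("system") == system and c.get("code") == code
                                   for c in codings):
            # FHIR dateTimes start with YYYY-MM-DD; keep just the date part.
            dates.append(date.fromisoformat(r[date_field][:10]))
    return max(dates, default=None)


def overdue(last_dates, today, intervals=INTERVALS):
    """(screening, due date) for each overdue screening; due date is None if never done."""
    result = []
    for name, days in intervals.items():
        last = last_dates.get(name)
        if last is None:
            result.append((name, None))
        elif last + timedelta(days=days) < today:
            result.append((name, last + timedelta(days=days)))
    return result
```

Returning the due date (not just a yes/no) is what lets the output say "overdue since November" instead of a bare warning.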

Provider-side value: A physician liaison using this tool can identify which patients haven't been seen for chronic condition management and reach out proactively. This is a retention and quality-metrics play for the practice.

🌐 Agent 7: Autonomous Web Data Collector

Hardest + Most Differentiated · Core Enabler: Everything Else Depends on This · HIPAA: Patient-Authorized · EHIgnite: Multi-EHR Interoperability Bonus

The problem: A patient's health records are scattered across 3–8 different portals: their PCP, cardiologist, hospital system, urgent care, pharmacy, insurer. Before any agent can help, someone has to go get the data. Asking patients to manually download and upload each one is a non-starter for real adoption. The platform needs to go get it autonomously.

Why it's challenging: This sits at the intersection of HIPAA, live authentication, and web automation. But the 21st Century Cures Act actually created the legal infrastructure for exactly this. The challenge is engineering, not regulatory.

Three-Tier Architecture

Tier | Mechanism | Market Coverage | Maturity
1: SMART on FHIR API | Patient OAuth authorization → access token → API calls | ~60% of U.S. hospitals (Epic 35%, Cerner 25%+) | Works today
2: Browser Agent | Patient-authorized Playwright automation → portal login → EHI export ZIP download | ~40% (smaller hospitals, specialty clinics, urgent cares) | Engineering challenge
3: Government APIs | Blue Button 2.0 (Medicare), VA FHIR API, state Medicaid | 65M+ Medicare, 9M+ VA patients | Works today

Tier 1: SMART on FHIR (Clean & Scalable)

Patient clicks "Connect [Hospital Name]" → OAuth redirect → authorizes → agent receives a refresh token, stored encrypted in the patient's own Azure Key Vault. Agent then calls standard FHIR R4 endpoints to pull all resource types. Syncs on schedule. No PHI leaves the patient's environment after retrieval. This is exactly how Apple Health works. Directly satisfies the EHIgnite multi-EHR interoperability bonus.
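A sketch of the first leg of that flow in Python: building a SMART App Launch (standalone launch) authorization URL with PKCE. The `authorization_endpoint` would come from the server's `/.well-known/smart-configuration` document; the client id, redirect URI and FHIR base URL used with it are placeholders. The scope string uses SMART v1 syntax (`patient/*.read`); SMART v2 introduced the finer-grained `patient/*.rs` form. `offline_access` is the scope that requests the refresh token described above. Exchanging the returned code at the token endpoint and storing the refresh token in Key Vault are not shown.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def pkce_pair():
    """PKCE (RFC 7636): a random code verifier and its S256 code challenge."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def authorize_url(authorization_endpoint, client_id, redirect_uri, fhir_base, challenge, state):
    """Authorization request URL the patient's browser is redirected to."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # launch/patient asks for patient context; offline_access asks for a refresh token.
        "scope": "launch/patient openid fhirUser offline_access patient/*.read",
        "state": state,  # random value, checked again on the redirect back
        "aud": fhir_base,  # the FHIR server the token will be used against
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return authorization_endpoint + "?" + urlencode(params)
```

The verifier stays with the agent and is sent only in the later token request, so an intercepted authorization code is useless on its own.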

Tier 2: Patient-Authorized Browser Agent (The Hard One)

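ONC's EHI export certification criterion (§170.315(b)(10)) requires certified EHRs to be able to export all of a single patient's EHI, and many patient portals also offer a records download. So the data is generally accessible, just not always via API. The agent: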

  1. Patient provides credentials once → stored encrypted in their own Azure Key Vault (never on our servers)
  2. Agent launches headless browser (Playwright) inside the patient's environment
  3. Logs into each portal, downloads EHI export ZIP, ingests into patient workspace
  4. For MFA, the agent pauses and asks the patient to enter the code (human-in-the-loop, one-time)

Novel research angle for the competition: Instead of brittle CSS selectors that break when portals redesign, use a vision model (multimodal LLM) to identify the "Download My Records" button on any portal UI: "zero-shot portal automation via multimodal LLM navigation." We haven't seen anyone else propose this.

Tier 3: Government APIs (Easy, Often Overlooked)

Blue Button 2.0 (CMS) covers Medicare Part A, B, and D claims: hospital visits, procedures, and Part D prescriptions for Medicare beneficiaries, most of whom are 65+. For complex chronic patients (our highest-value use case), this is the most complete claims dataset that exists. Full FHIR structured data. Patient authorizes via standard OAuth in under a minute.

The HIPAA framing that makes this work: The agent is a tool running in the patient's own Azure tenant, operating on the patient's own data, on the patient's own behalf. That's not a healthcare vendor handling PHI; that's a patient using a tool to manage their own records. No BAA needed. No data transit risk. No vendor security assessment. The 21st Century Cures Act gave patients the right to share their data with any app they authorize. We're just making that authorization seamless.

6. Technical Build Path

Phase | What We Build | Timeline | Why First
Foundation | FHIR parser + normalization. Accept ZIP, extract all resource types, build patient workspace object. Support both Bundle JSON and bulk NDJSON. | 1–2 weeks | Everything depends on this. Synthea data available now for development.
Aggregator | Multi-EHR stitching. Entity resolution (same patient, different system IDs). Deduplication. Summary generation. Patient workspace markdown files. | 2–3 weeks | Required for the "integration" EHIgnite scenario + interoperability bonus.
Q&A Agent | LLM agent reads patient workspace. Answers questions in plain language. RAG over structured FHIR data. | 1 week | This is the required EHIgnite demo. Also the natural entry point for all other agents.
Network Navigator | Coverage extraction + NPI lookup + provider directory query. We already have the CMS provider data. | 3–5 days | Fastest to build given existing CMS DuckDB infrastructure. High user value.
Claims Agent | EOB parsing, denial detection, benefit utilization review. | 1–2 weeks | Unique angle for the competition. Directly addresses payer workflow scenario.
Funding Finder | Rare disease identification + web search + PAP database integration. | 1 week | Most differentiated use case. Web search pattern already proven.
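The Foundation phase's "support both Bundle JSON and bulk NDJSON" requirement can be sketched as a single Python generator over a ZIP file. It assumes `.ndjson` members hold one resource per line and `.json` members hold either a Bundle (whose `entry[].resource` items are yielded) or a single bare resource; other members, including compressed `.ndjson.gz` files, are skipped. `iter_resources` is an illustrative name.

```python
import json
import zipfile


def iter_resources(zip_source):
    """Yield every FHIR resource in a ZIP of bulk NDJSON files and/or Bundle JSON files.

    zip_source may be a path or a binary file-like object (anything zipfile.ZipFile accepts).
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith(".ndjson"):
                # Bulk export: one resource per line.
                with zf.open(name) as fh:
                    for raw in fh:
                        line = raw.decode("utf-8").strip()
                        if line:
                            yield json.loads(line)
            elif name.endswith(".json"):
                with zf.open(name) as fh:
                    doc = json.load(fh)
                if doc.get("resourceType") == "Bundle":
                    # Individual-patient export: unwrap Bundle.entry[].resource.
                    for entry in doc.get("entry", []):
                        if "resource" in entry:
                            yield entry["resource"]
                else:
                    yield doc  # a bare resource
            # Anything else (directories, .ndjson.gz, README files) is skipped.
```

Because it yields plain dicts, downstream steps (counting, workspace building, agents) don't need to know which export format the data arrived in.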

Deployment Architecture

The privacy/security judging criterion (one of 5 categories) is a major differentiator opportunity. Most competitors will build a SaaS where patient data goes to their server. The winning play:

Deploy inside the patient's (or provider's) Azure environment. FHIR parsing, LLM processing, and all agent outputs happen in their tenant. No PHI ever leaves their environment. This is HIPAA-compliant by architecture, not by contract. For enterprise hospital clients, this is the only answer: it eliminates the BAA negotiation entirely.

7. Questions for Clinical Input

If you're a clinician reviewing this, here's what would most sharpen the product direction:

  1. First encounter with a new patient: What's the one thing you wish you had in the first 60 seconds that you often don't?
  2. Records from another system: When a patient brings records from another hospital, what do you actually look at first? What do you ignore?
  3. Medication reconciliation: What's the most dangerous med-list situation you encounter? Which drugs are the highest-risk when records are incomplete?
  4. The billing world: Do your patients ever ask you about denied claims? Do you help them navigate that, or is it outside your scope?
  5. Rare disease / expensive meds: Do you have a system for connecting patients to assistance programs? If yes, what? If no, what happens?
  6. Preventive care gaps: In your patient population, which screenings are most chronically missed, and is it a patient behavior issue or a "nobody told them" issue?
  7. The network question: How often do patients come back and say "I got a surprise bill because that specialist wasn't in-network"? Is there anything you can do about it currently?

Data sourced from: SMART on FHIR sample-bulk-fhir-datasets (Boston Children's Hospital, CC0 license) · Synthea synthetic patient generator (MITRE Corporation, Apache 2.0) · HHS/ONC EHIgnite Challenge (ehignitechallenge.org) · HL7 FHIR R4 specification · ONC 21st Century Cures Act Final Rule §170.315(b)(10)

Analysis and report prepared by Chief, March 11, 2026