Interactive Technical Guide

Building Snappy

Why your app feels slow — and how to fix it at the architecture level. A deep dive on rendering pipelines, data patterns, and the tools that make data-rich web apps feel instant.
March 2026 · 7 Sections · Interactive Demos · Performance Screener
1. The Rendering Pipeline

How Browsers Actually Work

Every user interaction triggers a cascade of work on the browser's main thread. Understanding this pipeline is the foundation of all performance thinking.

The Main Thread Pipeline

When something changes in the UI, the browser runs through up to five stages — all on a single thread. If any stage takes too long, the user sees jank.

JS Execute
Style Calc
Layout
Paint
Composite
  • JS Execute: Your event handlers, data transforms, framework render cycles
  • Style Calculation: Browser figures out which CSS rules apply to which elements
  • Layout (Reflow): Browser calculates geometry — position, size, spacing for every affected element
  • Paint: Fill in pixels — backgrounds, text, borders, shadows
  • Composite: Combine painted layers (the cheap step — GPU-accelerated)
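Which of these stages a change dirties depends on the CSS property you touch. A rough mental model as code (simplified and hand-built for illustration — real engines vary, so treat the mapping as an approximation, not a spec):

```javascript
// Rough map from CSS property to the deepest pipeline stage it dirties.
// transform/opacity are compositor-only — the cheap path for animation.
const DEEPEST_STAGE = {
  transform: 'composite', opacity: 'composite',
  color: 'paint', background: 'paint', 'box-shadow': 'paint',
  width: 'layout', height: 'layout', top: 'layout', 'font-size': 'layout',
};

function pipelineCost(prop) {
  // Unknown properties: assume the expensive path.
  return DEEPEST_STAGE[prop] ?? 'layout';
}
```

This is why animating transform beats animating top: the former skips layout and paint entirely and goes straight to the GPU-accelerated composite step.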

The Frame Budget

At 60fps, you have 16.67ms per frame. At 120fps (iPhone ProMotion, iPad), it's 8.33ms. Subtract ~4ms for browser housekeeping, and you're left with:

  • ~12ms — your budget @ 60fps
  • ~4ms — your budget @ 120fps
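You can watch your own frame budget from inside the page. A minimal sketch (the requestAnimationFrame wiring is browser-only and shown as a comment; the delta math is plain arithmetic, and the function name is invented here):

```javascript
// Count frames whose delta exceeded the budget, given rAF timestamps in ms.
function countLongFrames(timestamps, budgetMs = 16.7) {
  let long = 0;
  for (let i = 1; i < timestamps.length; i++) {
    if (timestamps[i] - timestamps[i - 1] > budgetMs) long++;
  }
  return long;
}

// Browser wiring (sketch): collect timestamps via requestAnimationFrame.
// const stamps = [];
// (function tick(t) { stamps.push(t); requestAnimationFrame(tick); })(0);
```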
Layout Thrashing is the #1 killer of frame budgets. It happens when you interleave DOM reads and writes in a loop — each read forces the browser to recalculate layout before it can answer your query.

Interactive Demo: Layout Thrashing

This demo creates 200 boxes, then reads and writes their dimensions. Watch how interleaved reads/writes (bad) take dramatically longer than batched operations (good).

💡
Key Insight: Every DOM geometry query you make inside a loop (offsetWidth, getBoundingClientRect, scrollTop) is potentially forcing a synchronous layout recalculation. Batch your reads, then batch your writes.
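The batching discipline can be packaged as a tiny read/write scheduler, in the spirit of libraries like fastdom. A hypothetical minimal sketch — measure, mutate, and flush are names invented here:

```javascript
// Queue DOM reads and writes separately; flush runs all reads first,
// then all writes. One forced layout at most for the whole read phase,
// and layout is invalidated only once by the write phase.
const readQueue = [];
const writeQueue = [];

function measure(fn) { readQueue.push(fn); }
function mutate(fn) { writeQueue.push(fn); }

function flush() {
  readQueue.splice(0).forEach(fn => fn());   // all reads first
  writeQueue.splice(0).forEach(fn => fn());  // then all writes
}
```

In a real app you would call flush() inside requestAnimationFrame so reads and writes coalesce once per frame.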
2. Rendering Models

DOM vs Canvas vs WebGL

There are three fundamentally different ways to put pixels on screen. Choosing the right one is the single biggest architecture decision for a data-rich app.

               | DOM                         | Canvas 2D                     | WebGL
Best for       | Interactive UI, forms, text | Custom 2D graphics, charts    | Maps, 3D, millions of points
Layout engine  | Browser handles it          | You handle it                 | You handle it
Perf ceiling   | Medium                      | High                          | Very High
Dev complexity | Low                         | Medium                        | High
Accessibility  | Built-in                    | Manual                        | Manual
Examples       | React apps, forms, tables   | Chart.js, Pretext, timelines  | Mapbox, Deck.gl, Figma

When to use each

  • DOM — default choice. Use it when you need forms, text selection, links, accessibility, SEO. Falls apart above ~1,000 frequently-updating elements.
  • Canvas 2D — pixel-perfect rendering with no reflow. You own hit-testing, text selection, and accessibility. Great for charts, custom timelines, and data-dense visualizations.
  • WebGL — GPU-accelerated. Handles millions of primitives at 60fps. Every major map (Mapbox, Google Maps) and advanced visualization (Figma, Deck.gl) runs on WebGL.
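To get a feel for the Canvas 2D model, here is a minimal sketch of a block-animation draw function. The names are invented for illustration; in a browser you would get ctx from canvas.getContext('2d') and drive drawFrame with requestAnimationFrame:

```javascript
// One draw call repaints everything — no DOM nodes, no style calc, no layout.
function makeRenderer(ctx, width, height, count = 500) {
  const blocks = Array.from({ length: count }, (_, i) => ({
    x: (i * 37) % width,
    y: (i * 61) % height,
    hue: i % 360,
  }));
  return function drawFrame() {
    ctx.clearRect(0, 0, width, height);          // wipe the frame
    for (const b of blocks) {
      b.x = (b.x + 1) % width;                   // animate horizontally
      ctx.fillStyle = `hsl(${b.hue}, 70%, 50%)`;
      ctx.fillRect(b.x, b.y, 8, 8);              // one rect per block
    }
  };
}
```

Note the trade the table describes: you gained a flat perf profile (cost is one clear plus N fills, never a reflow), but you now own hit-testing and accessibility yourself.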
💡
Pretext (@chenglou/pretext) is a hybrid approach worth knowing: it uses Canvas's font measurement engine as ground truth, then delivers pure arithmetic for layout. Result: 300–500x faster text measurement than DOM. Relevant when building data-dense UIs where text layout is a bottleneck.

Interactive Demo: DOM vs Canvas Rendering (500 Blocks)

Both panels render 500 colored blocks and animate them. Watch the FPS difference when both run simultaneously.

3. Architecture Patterns

The Big Four Patterns for Snappy Data-Rich Apps

These four patterns solve the most common performance problems in data-heavy applications. Click each to expand.

Pattern 1: Virtualization — Only Render What's Visible

Problem: You have 50,000 rows in a table. Rendering all 50,000 DOM nodes freezes the browser for seconds and eats hundreds of MB of RAM.

Rule of Thumb: If your list or table can exceed ~100 rows, virtualize it. The user can only see ~20-40 rows at a time anyway.

How it works

Virtual scrolling maintains a "render window" — only the rows currently visible in the viewport (plus a small overscan buffer) exist in the DOM. As the user scrolls, rows are recycled: old ones are removed, new ones are created. The scrollbar is faked with a spacer element at the correct total height.
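The render-window math underneath is plain arithmetic. A minimal sketch for fixed-height rows (function and parameter names are invented for illustration):

```javascript
// Which rows should exist in the DOM for a given scroll position?
function renderWindow({ scrollTop, viewportHeight, rowHeight, total, overscan = 5 }) {
  const first = Math.floor(scrollTop / rowHeight);            // first visible row
  const visible = Math.ceil(viewportHeight / rowHeight);      // rows that fit on screen
  return {
    start: Math.max(0, first - overscan),                     // inclusive
    end: Math.min(total - 1, first + visible + overscan),     // inclusive
    offsetY: Math.max(0, first - overscan) * rowHeight,       // translateY for first rendered row
    totalHeight: total * rowHeight,                           // spacer height that fakes the scrollbar
  };
}
```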

// TanStack Virtual — minimal example (50,000 rows, ~30 in DOM at any time)
import { useRef } from 'react';
import { useVirtualizer } from '@tanstack/react-virtual';

function VirtualList({ items }) {
  const parentRef = useRef(null);

  const virtualizer = useVirtualizer({
    count: items.length,       // 50,000
    getScrollElement: () => parentRef.current,
    estimateSize: () => 40,    // estimated row height in px
    overscan: 5,               // render 5 extra rows above/below
  });

  return (
    <div ref={parentRef} style={{ height: '600px', overflow: 'auto' }}>
      {/* Spacer fakes the full scroll height; rows are absolutely positioned inside it */}
      <div style={{ height: virtualizer.getTotalSize(), position: 'relative' }}>
        {virtualizer.getVirtualItems().map(vRow => (
          <div key={vRow.key}
            style={{
              position: 'absolute',
              top: 0,
              left: 0,
              width: '100%',
              transform: `translateY(${vRow.start}px)`,
              height: vRow.size,
            }}>
            {items[vRow.index].name}
          </div>
        ))}
      </div>
    </div>
  );
}

Libraries

  • TanStack Virtual — framework-agnostic, supports variable heights, grids
  • react-window — simpler API, fixed or variable sizes
  • AG Grid — full enterprise grid with built-in virtualization
The Variable-Height Problem: Virtual scrolling needs to know each row's height before it's rendered (for scroll position math). Variable-height rows require measurement or estimation. TanStack Virtual handles this with measureElement, but it adds complexity.
Pattern 2: Server-Side Data Operations

Problem: You shipped 50,000 rows to the browser and now you're sorting and filtering in JavaScript. Each operation takes 200ms+ and blocks the UI.

Rule of Thumb: Never sort, filter, or search more than ~1,000 rows in client-side JavaScript. SQL is orders of magnitude faster, and it doesn't block the UI thread.

Search-as-you-type Architecture

The pattern: debounce user input → send query to server → server does the heavy lifting (SQL, full-text search) → return minimal JSON → render results.

// Debounced search — the right pattern for large datasets
import { useEffect, useRef, useState } from 'react';

function SearchInput({ onResults }) {
  const [query, setQuery] = useState('');
  const controller = useRef(null);

  useEffect(() => {
    if (!query || query.length < 2) return;

    // Cancel the previous in-flight request
    controller.current?.abort();
    controller.current = new AbortController();

    const timer = setTimeout(async () => {
      try {
        const res = await fetch(
          `/api/search?q=${encodeURIComponent(query)}&limit=20`,
          { signal: controller.current.signal }
        );
        const data = await res.json();
        onResults(data.results);
      } catch (e) {
        if (e.name !== 'AbortError') throw e;
      }
    }, 150); // 150ms debounce — sweet spot

    return () => {
      clearTimeout(timer);
      controller.current?.abort(); // also cancels on unmount
    };
  }, [query, onResults]);

  return <input value={query} onChange={e => setQuery(e.target.value)} />;
}

DuckDB — The Secret Weapon

DuckDB is an embeddable analytical database that runs sub-100ms queries on millions of rows. It can run in-process (no separate server), supports SQL, and even compiles to WASM for the browser. For dashboards and analytical tools, it's transformative.

Pagination vs Infinite Scroll

  • Pagination — simpler, better for SEO, more predictable memory usage
  • Infinite scroll — better UX for feeds and exploration, but harder to implement correctly (memory management, scroll position restoration)
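One implementation detail worth knowing for infinite scroll: offset pagination (OFFSET 9000 LIMIT 30) gets slower the deeper the user scrolls, because the database still walks the skipped rows. Keyset (cursor) pagination stays fast at any depth. A minimal URL-builder sketch — the endpoint and parameter names here are invented for illustration:

```javascript
// Keyset pagination: "give me the next `limit` rows after this cursor"
// instead of "skip N rows". On an indexed sort column this is O(limit)
// no matter how deep the user has scrolled.
function nextPageUrl(base, { cursor, limit = 30 } = {}) {
  const params = new URLSearchParams({ limit: String(limit) });
  if (cursor != null) params.set('after', String(cursor)); // last row's sort key
  return `${base}?${params}`;
}
```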
Pattern 3: Web Workers — Don't Block the Main Thread

Problem: You're parsing a 5MB FHIR bundle or running a complex data transform. The entire UI freezes for 800ms while the main thread grinds through it.

💡
What is a Web Worker? A separate JavaScript thread that runs in parallel to your main thread. Workers can't touch the DOM, but they can do heavy computation without blocking user interaction. Communication happens via postMessage.

What to offload to workers

  • FHIR bundle parsing (1,180 patients? That's worker territory)
  • Heavy data transforms, aggregations, grouping
  • Search indexing (build a Fuse.js index off-thread)
  • CSV/Excel file parsing
  • Image processing or compression
// worker.js — runs on a separate thread
self.onmessage = function (e) {
  const { type, payload } = e.data;

  if (type === 'PARSE_FHIR_BUNDLE') {
    // Heavy work happens here — main thread stays responsive
    const patients = payload.entry
      .filter(entry => entry.resource.resourceType === 'Patient')
      .map(entry => ({
        id: entry.resource.id,
        name: entry.resource.name?.[0]?.text || 'Unknown',
        birthDate: entry.resource.birthDate,
        conditions: extractConditions(entry.resource), // app-specific helper, defined elsewhere
      }));

    self.postMessage({ type: 'PARSED', patients });
  }
};

// main.js — stays snappy
const worker = new Worker('worker.js');

worker.postMessage({
  type: 'PARSE_FHIR_BUNDLE',
  payload: hugeFhirBundle  // 5MB of FHIR data
});

worker.onmessage = (e) => {
  if (e.data.type === 'PARSED') {
    renderPatientList(e.data.patients); // UI updates instantly
  }
};
Transferable objects: Large ArrayBuffers can be transferred (zero-copy) to/from workers instead of cloned. Use postMessage(data, [buffer]) for large binary data to avoid the serialization cost.
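You can see the transfer semantics without spinning up a worker: structuredClone uses the same cloning algorithm as postMessage, and a transferred buffer is detached (zero-length) on the sending side. A small sketch (assumes a runtime with structuredClone, e.g. Node 17+ or any modern browser):

```javascript
// Transferring moves the buffer instead of copying it: the clone owns the
// memory afterwards, and the original is detached (byteLength drops to 0).
function transferDemo() {
  const buf = new ArrayBuffer(1024 * 1024); // pretend this is parsed binary data
  const before = buf.byteLength;
  const moved = structuredClone({ payload: buf }, { transfer: [buf] });
  return { before, after: buf.byteLength, receivedBytes: moved.payload.byteLength };
}
```

The same { transfer: [...] } shape is what the second argument of worker.postMessage(data, [buf]) expresses.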
Pattern 4: Maps — Canvas/WebGL or Die

Problem: You're rendering 2,000 provider locations as <div> markers in the DOM. Pan and zoom are at 8fps. The page uses 400MB of RAM.

Rule of Thumb: DOM map markers stop scaling at ~500 pins. Past that, you need WebGL rendering and clustering.

Clustering with Supercluster

Never render individual pins at wide zoom levels. Supercluster groups nearby points into clusters that expand as the user zooms in — Mapbox GL's built-in cluster: true option uses Supercluster under the hood. This keeps the rendered element count roughly constant regardless of dataset size.

// Mapbox GL + Supercluster — minimal clustering example
import mapboxgl from 'mapbox-gl';

mapboxgl.accessToken = 'YOUR_MAPBOX_TOKEN'; // required for Mapbox-hosted styles (placeholder)

const map = new mapboxgl.Map({
  container: 'map',
  style: 'mapbox://styles/mapbox/light-v11',
  center: [-98.5, 39.8],
  zoom: 4,
});

map.on('load', () => {
  map.addSource('providers', {
    type: 'geojson',
    data: providersGeoJSON,  // 10,000+ points
    cluster: true,
    clusterMaxZoom: 14,
    clusterRadius: 50,
  });

  // Cluster circles
  map.addLayer({
    id: 'clusters',
    type: 'circle',
    source: 'providers',
    filter: ['has', 'point_count'],
    paint: {
      'circle-color': ['step', ['get', 'point_count'],
        '#51bbd6', 100, '#f1f075', 750, '#f28cb1'],
      'circle-radius': ['step', ['get', 'point_count'],
        20, 100, 30, 750, 40],
    },
  });

  // Individual points (only visible at high zoom)
  map.addLayer({
    id: 'points',
    type: 'circle',
    source: 'providers',
    filter: ['!', ['has', 'point_count']],
    paint: { 'circle-radius': 6, 'circle-color': '#3b82f6' },
  });
});

The Stack

  • Mapbox GL JS / MapLibre GL — WebGL-based maps, handles millions of features
  • Deck.gl — data visualization layers on top of maps: heatmaps, hexbins, scatter plots, arc layers — all GPU-accelerated, millions of points at 60fps
  • Supercluster — fast geospatial point clustering, works with any map library
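The core idea behind clustering is simple enough to sketch: snap points to a grid and merge everything that lands in the same cell. Real libraries like Supercluster use a spatial index and precompute clusters per zoom level, but a toy grid version (all names invented here) shows the shape:

```javascript
// Merge points that fall in the same grid cell into one cluster at the centroid.
function clusterPoints(points, cellSize) {
  const cells = new Map();
  for (const p of points) {
    const key = `${Math.floor(p.x / cellSize)}:${Math.floor(p.y / cellSize)}`;
    let c = cells.get(key);
    if (!c) cells.set(key, c = { sumX: 0, sumY: 0, count: 0 });
    c.sumX += p.x; c.sumY += p.y; c.count++;
  }
  return [...cells.values()].map(c => ({
    x: c.sumX / c.count,   // centroid of the cluster
    y: c.sumY / c.count,
    count: c.count,        // how many points this cluster represents
  }));
}
```

Zooming in corresponds to shrinking cellSize: smaller cells, more clusters, until each cluster holds a single point.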
4. Measuring Performance

Performance Metrics — What "Snappy" Actually Means

You can't optimize what you can't measure. Here's the framework for thinking about — and talking about — web performance.

The Three Feelings

Users don't think in milliseconds. They have three distinct performance-related feelings:

  • "It responded" — input latency under 100ms
  • 🚀 "It loaded" — Time to Interactive under 3s
  • 🎨 "It's smooth" — a steady 60fps frame rate

The 100ms Wall

This is the single most important number in web performance:

  • <100ms — feels instant
  • 100–300ms — feels sluggish
  • 300ms–1s — feels slow
  • >1s — feels broken

The RAIL Model (Google's Framework)

RAIL gives you UX-driven performance budgets for four categories of work:

Category  | Budget      | What it covers
Response  | <100ms      | React to user input (click, tap, type). If you can't finish in 100ms, show a loading state.
Animation | <16ms/frame | Visual transitions, scrolling, dragging. Each frame gets 16ms (aim for 10ms to leave room).
Idle      | 50ms chunks | Use idle time for deferred work (analytics, prefetching). Keep each chunk under 50ms so you can respond to input instantly.
Load      | <3s TTI     | Page should be interactive within 3s on mid-range mobile on 3G. For data apps: show skeleton/content progressively.
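The Idle row generalizes into a pattern: process a queue in time-boxed chunks and yield between them. A minimal sketch with an injectable clock and scheduler — in a browser you'd pass performance.now and requestIdleCallback; the function and option names here are invented:

```javascript
// Run queued tasks until the time budget is spent, then yield and resume.
function processInChunks(tasks, {
  budgetMs = 50,
  now = Date.now,
  scheduleNext = fn => setTimeout(fn, 0),
} = {}) {
  return new Promise(resolve => {
    (function run() {
      const start = now();
      while (tasks.length && now() - start < budgetMs) {
        tasks.shift()();                   // one unit of work
      }
      if (tasks.length) scheduleNext(run); // budget spent — yield to the browser
      else resolve();
    })();
  });
}
```

Because the scheduler yields between chunks, input events queued during heavy work get handled within one budget window instead of waiting for the whole job.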

Browser DevTools — Your Performance Lab

Three tabs you should know intimately:

Performance Tab

Record a user interaction. The flame chart shows exactly where time is spent on the main thread. Yellow = JS, purple = layout, green = paint. Look for long tasks (>50ms) in the "Main" section. The frame timeline at the top shows dropped frames as red bars.

Network Tab

The waterfall view reveals request chains — requests that can't start until others finish. Look for: oversized payloads (are you shipping 5MB of JSON?), render-blocking resources, and slow TTFB (server response time). Filter by "XHR" to see just your API calls.

Lighthouse

Automated audit that scores Performance, Accessibility, Best Practices, and SEO. Fix in this order: Largest Contentful Paint (LCP) → Cumulative Layout Shift (CLS) → Interaction to Next Paint (INP). These are the Core Web Vitals that actually matter.

5. Diagnostic Tool

The Performance Screener

Use this tool to evaluate an existing app or plan a new one. Two modes: diagnose problems or plan architecture.

Toggle the issues that apply to your app. Your risk score and recommendations update in real time.

  • I have tables with more than 500 rows
  • I'm rendering map pins as DOM elements
  • I filter or sort data in JavaScript on the client
  • I parse large files synchronously on the main thread
  • My search hits the server on every keystroke
  • I'm using a DOM-based chart library with >1,000 data points
  • I'm not virtualizing long lists


6. Side-by-Side

Architecture Patterns: Before & After

Concrete examples of common mistakes and their fixes, with estimated performance impact.

Table Rendering — 10,000 Rows

❌ The Slow Way

// Render ALL 10,000 rows as real DOM nodes
// Sort by re-sorting the JS array and re-rendering everything
function renderTable(data) {
  const tbody = document.querySelector('tbody');
  tbody.innerHTML = data.map(row =>
    `<tr><td>${row.name}</td>
         <td>${row.date}</td></tr>`
  ).join('');
}

// On sort click:
data.sort((a, b) => a.name.localeCompare(b.name));
renderTable(data); // 10,000 DOM nodes created

Impact: ~800ms to render, ~200ms to sort, 10,000 DOM nodes eating 80MB+ RAM. Scrolling jank on mobile.

✅ The Fast Way

// Server-side sort + virtual scroll
// Only ~30 DOM rows exist at any time
const res = await fetch(
  `/api/data?sort=name&page=1&limit=30`
);
const { rows, total: serverTotalCount } = await res.json();

// TanStack Virtual handles the render window
const virtualizer = useVirtualizer({
  count: serverTotalCount,  // 10,000
  getScrollElement: () => parentRef.current,
  estimateSize: () => 44,
  overscan: 5,
});

Impact: ~5ms to render 30 rows, sort is instant (SQL), 30 DOM nodes, butter-smooth scrolling.

Search — 50,000 Item Dataset

❌ The Slow Way

// Filter 50K items on EVERY keypress
input.addEventListener('input', (e) => {
  const q = e.target.value.toLowerCase();
  const filtered = allItems.filter(item =>
    item.name.toLowerCase().includes(q) ||
    item.description.toLowerCase().includes(q)
  );
  renderResults(filtered); // could be 40K results
});

Impact: ~150ms per keystroke to filter + re-render. UI freezes while typing. Rendering thousands of matching results compounds the problem.

✅ The Fast Way

// Debounce 150ms → server full-text search
// Return only 20 results
let timer;
input.addEventListener('input', (e) => {
  clearTimeout(timer);
  timer = setTimeout(async () => {
    const res = await fetch(
      `/api/search?q=${encodeURIComponent(
        e.target.value)}&limit=20`
    );
    renderResults(await res.json());
  }, 150);
});

Impact: Zero client-side computation. DuckDB/Postgres FTS returns 20 results in <10ms. Only 20 DOM nodes to render. Typing is never blocked.

Map Markers — 2,000 Locations

❌ The Slow Way

// 2,000 DOM marker divs
locations.forEach(loc => {
  const marker = document.createElement('div');
  marker.className = 'map-pin';
  marker.style.left = project(loc.lng) + 'px';
  marker.style.top = project(loc.lat) + 'px';
  mapContainer.appendChild(marker);
});
// On pan/zoom: update all 2,000 positions

Impact: 2,000 DOM nodes, each repositioned on every frame during pan. ~8fps on mobile. 400MB memory.

✅ The Fast Way

// WebGL rendering + clustering
map.addSource('locations', {
  type: 'geojson',
  data: locationsGeoJSON,
  cluster: true,
  clusterMaxZoom: 14,
  clusterRadius: 50,
});
// Mapbox renders everything on GPU
// ~20 cluster circles at zoom-out
// Individual pins only at high zoom

Impact: Zero DOM markers. GPU renders all points. 60fps pan/zoom. Works with 100,000+ points. ~50MB memory.

7. Quick Reference

The Stack Cheat Sheet

Your quick-reference card for building fast data-rich apps.

Need                        | Reach For                             | Why
Large data tables           | TanStack Table + TanStack Virtual     | Best-in-class, framework-agnostic, virtualization built in
Search-as-you-type          | Debounce + DuckDB / Postgres FTS      | Server-side is always faster for large datasets
Maps with many markers      | Mapbox GL / MapLibre GL               | WebGL rendering, scales to millions of points
Data viz layers on maps     | Deck.gl                               | Built for this — heatmaps, scatter, hexbin, arc layers, all GPU-accelerated
Heavy parsing / computation | Web Workers                           | Keep the main thread free for UI responsiveness
Text measurement at scale   | Pretext (@chenglou/pretext)           | 300–500x faster than DOM measurement via Canvas font metrics
Complex data viz            | Canvas-based: uPlot, ECharts (canvas) | DOM charting breaks down above ~1K data points
Real-time data              | WebSockets + optimistic UI            | Don't poll — push. Update UI before server confirms for perceived speed
💡
The meta-rule: Performance is an architecture decision, not an optimization step. If you choose the wrong rendering model or data strategy upfront, no amount of memoization or lazy loading will save you. Make the big calls right — rendering model, data flow, computation placement — and the details take care of themselves.

Built as an interactive reference guide · March 2026

Part of Blake's Reports