CollectorVision Part 5: The Browser Scanner — Full ML Pipeline in a Web Page
Part 5 of the CollectorVision series. Part 1 has the overview.
The browser scanner runs the entire pipeline — corner detection, perspective warp, neural embedding, catalog search — inside a web page, with no server. It's hosted at https://hanclinto.github.io/CollectorVision/. Open that on your phone, point it at a Magic card, and it'll tell you what it is.
This was not straightforward to get working right.
Stack
The core piece is ONNX Runtime Web (onnxruntime-web), which runs .onnx models in the browser via WebAssembly or WebGPU. The same model files that ship in the Python package run in the browser. The preprocessing — resizing, ImageNet normalization, the SimCC softmax — is reimplemented in JavaScript to match the Python output.
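The normalization step can be sketched as a pure function. This is a minimal illustration, not the scanner's actual code: the function name is hypothetical, the constants are the standard ImageNet mean/std, and the output is assumed to be CHW float32 as most ONNX vision models expect.

```javascript
// Standard ImageNet normalization constants (RGB order).
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

// rgba: Uint8ClampedArray from canvas getImageData, length = w * h * 4.
// Returns a CHW Float32Array ready to wrap in an ort.Tensor.
function normalizeToCHW(rgba, w, h) {
  const plane = w * h;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const v = rgba[i * 4 + c] / 255; // scale to [0, 1]
      out[c * plane + i] = (v - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```

Getting these constants and the channel order exactly right is what "match the Python output" means in practice; a single swapped channel silently degrades every embedding.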
The perspective warp uses OpenCV.js, a WebAssembly build of OpenCV. The catalog search is a plain JavaScript dot-product loop on a Float32Array.
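A brute-force dot-product search over a flat Float32Array looks something like this sketch (names like `bestMatch` and the layout assumption of unit-normalized embeddings packed row-by-row are illustrative):

```javascript
// query: Float32Array of length dim.
// catalog: Float32Array of length n * dim, one embedding per row.
// Assumes embeddings are unit-normalized, so dot product = cosine similarity.
function bestMatch(query, catalog, dim) {
  let bestIdx = -1;
  let bestScore = -Infinity;
  const n = catalog.length / dim;
  for (let i = 0; i < n; i++) {
    const off = i * dim;
    let score = 0;
    for (let j = 0; j < dim; j++) score += query[j] * catalog[off + j];
    if (score > bestScore) {
      bestScore = score;
      bestIdx = i;
    }
  }
  return { index: bestIdx, score: bestScore };
}
```

For a catalog in the tens of thousands of entries, a linear scan like this runs in single-digit milliseconds, which is why no index structure is needed.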
There's no framework. No React, no bundler. Plain ES modules, a couple of workers, and static files that can be hosted anywhere.
Workers
Model inference takes 30–100ms per frame depending on the device. That's too slow to run on the main thread if you want a responsive UI. The scanner uses two Web Workers:
- scanner.worker.mjs: handles Cornelius inference, dewarp, and Milo embedding
- enricher.worker.mjs: fetches card names and prices from Scryfall after confirmation
The main thread handles the camera feed, draws the corner overlay, manages the scan list, and handles all user input. The workers communicate via postMessage.
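The main-thread side of that communication can be sketched as a small dispatcher. The message shapes (`type`, `corners`, `cardId`, and so on) are assumptions about the protocol, not the scanner's actual wire format:

```javascript
// Hypothetical shapes for messages posted by scanner.worker.mjs.
function handleWorkerMessage(msg, ui) {
  switch (msg.type) {
    case "corners": // per-frame corner positions for the overlay
      ui.drawOverlay(msg.corners);
      break;
    case "candidate": // best catalog match for this frame
      ui.pushCandidate(msg.cardId, msg.score);
      break;
    case "error":
      ui.showError(msg.detail);
      break;
  }
}

// Wiring, in the browser:
// const scanner = new Worker("scanner.worker.mjs", { type: "module" });
// scanner.onmessage = (e) => handleWorkerMessage(e.data, scannerUi);
```

Keeping the handler a pure function of (message, UI) makes the dispatch logic testable without a real worker.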
Caching
The models and catalog together are about 32 MB. Re-downloading on each visit would make the scanner unusable on mobile. Everything is cached in IndexedDB after the first load.
A manifest.json file contains version strings for each asset. When a model or the catalog is updated, the version bumps and the old cached data is replaced on next open. After first load, the scanner opens in a second or two even on a slow connection — and works fully offline.
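The decision of what to refetch reduces to diffing the manifest against the versions recorded alongside the cached blobs. A minimal sketch, assuming the manifest maps asset names to version strings (the function name and object shapes are illustrative):

```javascript
// manifest: { assetName: versionString } parsed from manifest.json.
// cachedVersions: the versions recorded in IndexedDB at last download.
// Returns the asset names that must be re-downloaded.
function staleAssets(manifest, cachedVersions) {
  return Object.keys(manifest).filter(
    (name) => cachedVersions[name] !== manifest[name]
  );
}
```

An asset missing from the cache entirely also fails the equality check, so a first visit simply reports everything as stale.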
The WebGPU problem on Android ARM
WebGPU is theoretically faster than WebAssembly for matrix operations. The first version of the scanner used it. It produced wrong answers on Android ARM devices.
The specific bug was a precision issue in the WebGPU execution provider for ONNX Runtime Web on Android ARM. The embeddings came out subtly wrong — not zeros, not NaN, just numerically incorrect. Cosine similarities dropped enough that identification failed silently. It took a while to find because the behavior was device-specific.
The fix is to always use WASM for inference, even on devices that support WebGPU. WASM is deterministic across platforms and matches the Python/CPU results.
```javascript
const session = await ort.InferenceSession.create(modelPath, {
  executionProviders: ["wasm"],
});
```
WebGPU is now an opt-in setting for people who want to test it, but WASM is the default. The cross-pipeline consistency test in tests/test_pipeline_consistency.py was written specifically to catch regressions here — it captures embeddings from both the Python pipeline and the JS-WASM pipeline on the same frame and checks that they agree within a cosine similarity bound.
Scan confirmation: the ScanBucket
The scanner doesn't confirm a card on a single frame. It uses a sliding-window counter that requires the same card to win across several consecutive frames:
```javascript
push(cardId) {
  if (cardId === this.candidate) {
    this.count++;
    if (this.count >= this.threshold) {
      const result = this.candidate;
      this.reset();
      return result;
    }
  } else {
    this.candidate = cardId;
    this.count = 1;
    if (this.threshold === 1) return cardId;
  }
  return null;
}
```
The threshold is configurable from settings (1–8, default 4). Lower threshold scans faster; higher threshold is more conservative. Setting it to 1 is useful when you're working through a sorted pile and picking up each card briefly.
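To make the confirmation behavior concrete, here is the push logic restated as a complete class (the constructor and reset are assumed, since only the method appears above) and fed a simulated stream of per-frame winners:

```javascript
class ScanBucket {
  constructor(threshold) {
    this.threshold = threshold;
    this.reset();
  }
  reset() {
    this.candidate = null;
    this.count = 0;
  }
  // Returns the confirmed card id, or null while still accumulating.
  push(cardId) {
    if (cardId === this.candidate) {
      this.count++;
      if (this.count >= this.threshold) {
        const result = this.candidate;
        this.reset();
        return result;
      }
    } else {
      this.candidate = cardId;
      this.count = 1;
      if (this.threshold === 1) return cardId;
    }
    return null;
  }
}

// Simulated frame winners: two noisy frames, then a stable run.
const bucket = new ScanBucket(3);
const confirmed = ["bolt", "bolt", "opt", "opt", "opt"].map((id) => bucket.push(id));
// Only the fifth frame confirms: "opt" after three consecutive wins.
```

Note that a candidate change resets the count to 1 rather than 0, so a single misidentified frame costs at most one extra frame of delay once the correct card starts winning again.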
After confirmation
When a card is confirmed, a request goes to the enricher worker, which fetches name, set, and market price from the Scryfall API. The main thread adds the card to the scan list immediately with whatever static data it already has, then updates the row when Scryfall responds. This keeps the recognition loop responsive — Scryfall latency doesn't affect scan speed.
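The enricher's job can be sketched in a few lines. The field names below (`name`, `set_name`, `prices.usd`) follow Scryfall's documented card object; treat the exact endpoint shape and the `pickFields` helper as assumptions about the scanner's internals:

```javascript
// Fetch one card by Scryfall ID and keep only what the scan list needs.
async function enrich(cardId) {
  const res = await fetch(`https://api.scryfall.com/cards/${cardId}`);
  if (!res.ok) throw new Error(`Scryfall ${res.status}`);
  return pickFields(await res.json());
}

// Pure extraction step, split out so it can be tested without the network.
function pickFields(card) {
  return {
    name: card.name,
    set: card.set_name,
    priceUsd: card.prices?.usd ? Number(card.prices.usd) : null,
  };
}
```

Because this runs in its own worker, a slow or failed Scryfall response only delays the row update, never the next scan.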
The scan list persists to localStorage, so it survives page reloads. The list can be exported as text or CSV.
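The CSV export is straightforward once the rows are plain objects. A minimal sketch, assuming rows of `{name, set, priceUsd}` (the column names are illustrative):

```javascript
// Serialize the scan list to CSV, quoting every field and escaping
// embedded double quotes per the usual CSV convention.
function toCsv(rows) {
  const esc = (v) => `"${String(v ?? "").replace(/"/g, '""')}"`;
  const header = "name,set,price_usd";
  const lines = rows.map((r) => [r.name, r.set, r.priceUsd].map(esc).join(","));
  return [header, ...lines].join("\n");
}

// Persistence is one line in the browser:
// localStorage.setItem("scanList", JSON.stringify(rows));
```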
After Scryfall returns the price, a sound plays: a higher tone for cards above $5, a lower tone for cards above $0.25, nothing for bulk. It's a small thing, but it makes scanning through a box of cards more engaging.
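The tone selection is a two-threshold mapping; only the $5 and $0.25 cutoffs come from the text above, so the frequencies here are illustrative, and playback would use a Web Audio OscillatorNode in the browser:

```javascript
// Map a price to a tone frequency in Hz, or null for silence (bulk).
function toneForPrice(priceUsd) {
  if (priceUsd > 5) return 880; // higher tone for cards above $5
  if (priceUsd > 0.25) return 440; // lower tone for cards above $0.25
  return null; // bulk: no sound
}

// Browser-only playback sketch; ctx is an AudioContext.
function playTone(ctx, freq, durationSec = 0.15) {
  const osc = ctx.createOscillator();
  osc.frequency.value = freq;
  osc.connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + durationSec);
}
```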
Cross-pipeline consistency tests
The hardest part of the browser port was making sure the JavaScript pipeline produces the same embeddings as the Python one. A small difference in preprocessing (wrong normalization constants, off-by-one in a resize, BGR/RGB channel swap) would produce systematically wrong results that are hard to diagnose from the UI alone.
The solution was to build a test fixture: capture a frame, run it through both pipelines, save both embeddings as JSON, and assert that their cosine similarity is above 0.80. Those fixture files live in tests/fixtures/captures/ and the test runs in CI on every push.
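The comparison at the heart of that test is an ordinary cosine similarity over the two embedding vectors:

```javascript
// Cosine similarity between two equal-length numeric arrays.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}
```

The 0.80 bound is deliberately loose enough to absorb benign float differences between WASM and native CPU execution, while still catching gross preprocessing errors, which typically drive similarity near zero.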