CollectorVision Part 6: The Hybrid Split — Local Inference, Server Lookup

Part 6 of the CollectorVision series. Part 1 has the overview.

There are three ways to deploy CollectorVision, depending on where you want the computation to happen and how much bandwidth you have.


Fully local

The Python library and the browser scanner both run entirely on the client. The catalog lives on the device (or in the browser's IndexedDB cache). No network traffic beyond the initial catalog download.

This is the right choice for desktop tools, offline use, or when you don't want to run a server. The catalog is ~29 MB as float32, or ~14 MB as float16 in the browser. That's a one-time download.
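
Those numbers follow from the embedding format: 128 float32 values per card. A quick numpy sanity check, using a catalog of roughly 57,000 cards inferred from the ~29 MB figure (illustrative, not an official count):

import numpy as np

# Illustrative catalog shape: ~57,000 cards x 128 dims (inferred, not exact).
catalog_f32 = np.zeros((57_000, 128), dtype=np.float32)
catalog_f16 = catalog_f32.astype(np.float16)     # what the browser would cache

print(catalog_f32.nbytes / 1e6)   # ~29.2 MB
print(catalog_f16.nbytes / 1e6)   # ~14.6 MB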


Fully server-side

Send the raw image to the server; get back the result. The server runs Cornelius, the dewarp, Milo, and the search.

curl -X POST http://server/identify/upload -F "file=@card.jpg"

This is simpler for the client but expensive in bandwidth. A JPEG-compressed phone photo is 100–500 KB. At one frame per second, that's up to 30 MB per minute.
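
The same upload from Python is a few lines with requests; a minimal sketch, with card.jpg as a placeholder path:

import requests

# Multipart upload of the raw image; the server runs the full pipeline.
with open("card.jpg", "rb") as f:
    result = requests.post(
        "http://server/identify/upload",
        files={"file": f},
    ).json()

print(result)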


Hybrid: local inference, server lookup

Run Cornelius and Milo on the device. Send only the embedding to the server for catalog lookup.

The embedding is 128 float32 values, 512 bytes in total. That's 200–1000x smaller than the image, depending on how large the JPEG is.

import requests

# Run Cornelius and Milo locally; only the embedding leaves the device.
embedding = run_local_pipeline(frame)   # 128 float32 values, 512 bytes

response = requests.post("http://server/identify", json={
    "embedding": embedding.tolist(),
}).json()

The server's /identify endpoint accepts either an image or a pre-computed embedding. If you send an embedding, it skips inference and goes straight to catalog search.
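
To make the branch concrete, here's a toy version; the catalog is random data and the return shape simply mirrors the fields used later in this post. This is not the real server code:

import numpy as np

# Toy stand-in catalog: 1000 random unit-normalized 128-dim embeddings.
catalog = np.random.randn(1000, 128).astype(np.float32)
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)

def identify(payload):
    if "embedding" in payload:
        emb = np.asarray(payload["embedding"], dtype=np.float32)   # skip inference
    else:
        raise NotImplementedError("an image payload would run Cornelius + dewarp + Milo here")
    emb = emb / np.linalg.norm(emb)
    scores = catalog @ emb                 # cosine similarity against the catalog
    best = int(np.argmax(scores))
    return {"card_id": best, "confidence": float(scores[best])}

Called with an embedding, it never touches the model at all; the only work left is the nearest-neighbour lookup.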

This approach makes sense for a mobile app where you don't want to ship a 30 MB catalog in the app bundle, but you also don't want to upload full images on every frame. The client needs the model weights (~2 MB total); the server owns the catalog and handles updates. When new sets release, you update the catalog on the server and all clients benefit immediately with no app update.


Multi-frame aggregation in the hybrid case

For live-camera use, you want to accumulate results across frames before committing to an answer. In the fully-local case, you do this client-side. In the hybrid case, you have two options: accumulate client-side and only send to the server when you have a candidate, or send each frame to the server with a history of recent embeddings and let the server average them.
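
The first option can be sketched like this. The agreement check that decides you "have a candidate" is my own heuristic, not something the server requires, and it assumes unit-normalized embeddings (compute_embedding, frame and scanning are placeholders, as in the example further down):

import numpy as np
import requests
from collections import deque

buffer = deque(maxlen=5)

while scanning:
    buffer.append(compute_embedding(frame))

    # Only query the server once the buffer is full and the recent views agree.
    if len(buffer) == buffer.maxlen:
        stack = np.vstack(buffer)
        if (stack @ stack.T).min() > 0.9:        # min pairwise cosine similarity
            mean = stack.mean(axis=0)
            mean /= np.linalg.norm(mean)         # re-normalize after averaging
            resp = requests.post("http://server/identify",
                                 json={"embedding": mean.tolist()}).json()
            print(resp["card_id"], resp["confidence"])
            buffer.clear()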

The REST server supports the second approach. The client keeps a short deque of recent embeddings and sends them with each request:

from collections import deque

import requests

# Keep the last few embeddings; the server averages them with the current one.
buffer = deque(maxlen=5)

while scanning:
    emb  = compute_embedding(frame)          # local Cornelius + Milo pass
    resp = requests.post("http://server/identify", json={
        "embedding": emb.tolist(),
        "prior_embeddings": [e.tolist() for e in buffer],
    }).json()

    buffer.append(emb)

    if resp["confidence"] > 0.85:
        print(resp["card_id"])
        buffer.clear()                       # start fresh for the next card

The server averages the current embedding with the prior buffer before searching. It doesn't store anything between requests — the client owns the state. This keeps the server stateless and easy to scale.
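
The averaging step itself is tiny. A sketch of what the server might do with the prior_embeddings field, assuming unit-normalized embeddings (not the actual implementation):

import numpy as np

def aggregate(embedding, prior_embeddings):
    # Average the current embedding with the client's recent history,
    # then re-normalize before the catalog search.
    stack = np.vstack([embedding] + list(prior_embeddings)).astype(np.float32)
    mean = stack.mean(axis=0)
    return mean / np.linalg.norm(mean)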


Bandwidth comparison

Approach             Per-frame payload
Image upload         100–500 KB
Embedding upload     512 bytes

For a 1 fps scanning session, image upload costs roughly 6–30 MB per minute depending on JPEG size; embedding upload costs about 30 KB per minute.
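
The arithmetic behind those figures:

# 1 frame per second = 60 frames per minute.
frames = 60
print(frames * 100_000 / 1e6, "to", frames * 500_000 / 1e6, "MB/min")  # 6.0 to 30.0 for 100-500 KB JPEGs
print(frames * 512 / 1e3, "KB/min")                                    # ~30.7 for 512-byte embeddings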


Which to use

For someone just wanting to identify a card from a photo, the full server-side API is fine. For a scanning session at a table going through a box of cards, the bandwidth cost of image upload adds up. For a production mobile app, the hybrid approach gives you the lowest bandwidth, the most privacy (images never leave the device), and transparent catalog updates.

The browser scanner uses fully local mode since it's hosted on GitHub Pages with no backend.


This is the last post in the series. The code is at https://github.com/HanClinto/CollectorVision. The browser demo is at https://hanclinto.github.io/CollectorVision/.
