CollectorVision Part 4: Running It Locally — Python Library and REST Server

Part 4 of the CollectorVision series. Part 1 has the overview.

This post covers the practical side: installing the library, running it against a webcam, and standing up the REST server. Part 6 goes deeper into the deployment split: when to run everything locally versus offloading catalog lookup to a server.


Installing

pip install git+https://github.com/HanClinto/CollectorVision.git

Or with uv:

uv pip install git+https://github.com/HanClinto/CollectorVision.git

Requires Python 3.10+. The only ML dependency is onnxruntime. Model weights are bundled with the package. No PyPI release yet — that's coming.


Quickstart

import cv2
import collector_vision as cvg

catalog = cvg.Catalog.load("hf://HanClinto/milo/scryfall-mtg")
image   = cv2.imread("my_card.jpg")

detection = cvg.NeuralCornerDetector().detect(image)
crop      = detection.dewarp(image)
emb       = catalog.embedder.embed(crop)
score, card_id = catalog.search(emb)[0]
print(card_id, score)

The catalog loads from HuggingFace on first run (~29 MB) and is cached after that.


Live video

For a webcam feed, call detect() on each frame and use the sharpness gate to skip blurry frames. Accumulate match scores across frames before committing to an answer.

# `catalog` is the one loaded in the quickstart above.
cap = cv2.VideoCapture(0)
detector = cvg.NeuralCornerDetector()
score_map = {}  # card_id -> accumulated match score

while True:
    ret, frame = cap.read()
    if not ret:
        break

    detection = detector.detect(frame)

    # Only embed frames that contain a reasonably sharp card.
    if detection.card_present and detection.sharpness > 0.10:
        crop = detection.dewarp(frame)
        emb  = catalog.embedder.embed(crop)
        for score, card_id in catalog.search(emb, top_k=5):
            score_map[card_id] = score_map.get(card_id, 0.0) + score

    if score_map:
        best_id = max(score_map, key=score_map.get)
        if score_map[best_id] > 3.5:
            print("Confirmed:", best_id)
            score_map.clear()

cap.release()

The threshold of 3.5 is roughly equivalent to the same card winning across four consecutive frames: with a strong match scoring a bit over 0.9 per frame, four frames accumulate to about 3.6. Tune it for your setup.
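The accumulate-and-confirm step can be factored out and tested without a camera. A minimal sketch (the function name and threshold default are mine, not part of the library):

```python
def update_and_confirm(score_map, hits, threshold=3.5):
    """Fold one frame's (score, card_id) hits into score_map.

    Returns the confirmed card_id once its accumulated score clears
    the threshold, resetting the map; otherwise returns None.
    """
    for score, card_id in hits:
        score_map[card_id] = score_map.get(card_id, 0.0) + score
    if score_map:
        best_id = max(score_map, key=score_map.get)
        if score_map[best_id] > threshold:
            score_map.clear()
            return best_id
    return None

# Four frames where the same card wins at ~0.9 per frame:
scores = {}
for _ in range(4):
    result = update_and_confirm(scores, [(0.9, "alpha"), (0.3, "beta")])
print(result)  # "alpha" -- confirmed on the fourth frame
```

Resetting the map after a confirmation means the next card starts from a clean slate rather than fighting the previous card's accumulated score.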


REST server

For cases where you want to call the pipeline from a phone app, a web client, or a script over HTTP, there's a FastAPI server in examples/server/.

Install server dependencies:

pip install "CollectorVision[server] @ git+https://github.com/HanClinto/CollectorVision.git"

Start it:

python examples/server/server.py --hfd HanClinto/milo scryfall-mtg

Identify a card by uploading an image:

curl -X POST http://localhost:8000/identify/upload \
     -F "file=@my_card.jpg"

The response looks like:

{
  "card_present": true,
  "card_id": "7286819f-6c57-4503-898c-528786ad86e9",
  "confidence": 0.934,
  "embedding": [0.023, -0.041, ...]
}

The embedding field is included in every response. It feeds the rolling-buffer feature: clients can send back their recent frame embeddings, and the server averages them before searching, which helps with noise. The server is stateless; the client owns the history.
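On the client side, the rolling buffer is just a bounded queue of the embedding arrays the server returns, averaged before being sent back. A sketch of that bookkeeping (the class name and buffer length are mine; the exact request payload for sending the mean back isn't shown in this post):

```python
from collections import deque

class EmbeddingBuffer:
    """Keeps the last `maxlen` frame embeddings and exposes their mean."""

    def __init__(self, maxlen=8):
        self._buf = deque(maxlen=maxlen)  # old frames fall off automatically

    def push(self, embedding):
        self._buf.append(list(embedding))

    def mean(self):
        """Element-wise average of the buffered embeddings."""
        n = len(self._buf)
        return [sum(vals) / n for vals in zip(*self._buf)]

buf = EmbeddingBuffer(maxlen=4)
buf.push([0.0, 1.0])
buf.push([1.0, 0.0])
print(buf.mean())  # [0.5, 0.5]
```

The deque's maxlen does the windowing for you: once the buffer is full, each new frame evicts the oldest, so stale embeddings never dominate the average.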


Pre-cropped images

If you already have a clean card crop — from a flatbed scanner, for instance — you can skip detection entirely:

from PIL import Image

crop = Image.open("clean_crop.jpg")
emb  = catalog.embedder.embed(crop)
hits = catalog.search(emb)

Swapping components

The library is built around protocols, not subclassing. Anything that implements detect(image) -> DetectionResult works as a corner detector. The repo has an example using OpenCV Canny edges in examples/advanced/custom_pipeline.py.
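For illustration, here's what a drop-in replacement could look like. The result class below is a stand-in that only mimics the fields this post actually uses (card_present, sharpness, dewarp); the library's real DetectionResult may differ:

```python
from dataclasses import dataclass
from typing import Any, Protocol

@dataclass
class FakeDetectionResult:
    """Stand-in with just the fields used earlier in this post."""
    card_present: bool
    sharpness: float

    def dewarp(self, image):
        # A real detector would perspective-correct the card here;
        # this placeholder returns the frame unchanged.
        return image

class CornerDetector(Protocol):
    """Structural interface: anything with detect(image) qualifies."""
    def detect(self, image: Any) -> FakeDetectionResult: ...

class FullFrameDetector:
    """Trivial detector: assume the whole frame is a sharp card."""
    def detect(self, image) -> FakeDetectionResult:
        return FakeDetectionResult(card_present=True, sharpness=1.0)

det = FullFrameDetector().detect("frame")
print(det.card_present, det.sharpness)  # True 1.0
```

Because the contract is structural, FullFrameDetector never imports or subclasses anything from the library; it just has the right detect() shape, which is the point of a Protocol-based design.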


Next: Part 5 — The browser scanner
