🔧 Tools 📰 Blog 🥽 XR Hub 🧪 Experiments
← Back to Blog

Build an AR Object Detector in One HTML File — Cyberpunk UI with TensorFlow.js

Real-time AI object detection — running entirely in your browser, with zero backend, zero install, and a full cyberpunk HUD — in a single HTML file. This tutorial walks you through every part of building AR Vision: a live camera feed with TensorFlow.js COCO-SSD, animated neon bounding boxes, per-class colour coding, detection history, a live FPS counter, and a glassmorphism UI. Works on desktop webcam and mobile rear camera.
AR VISION
📷 Live Camera 🤖 TensorFlow.js ⚡ 80 Object Classes 🎨 Cyberpunk UI
LAUNCH LIVE DEMO

Opens in a new tab · Allow camera access when prompted · Works best on mobile pointed at real objects

🔧 Tech Stack — What Powers AR Vision

Every library and API in this app is free, open-source, and runs entirely in the browser — no backend, no API key, no install. Here's the full breakdown:

Core Detection
TensorFlow.js
Google's ML library running neural networks in-browser via WebGL. Uses your GPU — no server, no Python.
COCO-SSD — lite_mobilenet_v2
Pre-trained on 80 everyday object classes. "Lite MobileNet" = the fastest variant, tuned for real-time detection on phones. ~5MB model weights fetched at first run, then browser-cached.
WebGL backend
TensorFlow.js routes inference through your device's GPU via WebGL — making live 30fps detection possible on mid-range phones.
Browser APIs
getUserMedia
Standard web API for camera access. Handles front/rear selection, resolution constraints, and permission denial gracefully.
Canvas 2D
Draws the neon bounding boxes, corner bracket accents, shadowBlur glow effects, and dashed "Scanning..." outlines over the live feed.
requestAnimationFrame
Drives the smooth render loop — synced to the screen's refresh rate (60Hz display, ~20–30fps inference).
UI / Styling
Plain HTML + CSS + JS
No frameworks. Zero dependencies beyond the TF.js CDN. The entire UI is hand-written in a single file — making it portable and easy to modify.
CSS backdrop-filter
Powers the frosted glass "glassmorphism" HUD panels and floating label cards.
Google Fonts
Orbitron — techy display title. Chakra Petch — UI text. Share Tech Mono — monospace readouts.
Delivery
Standalone HTML (~750KB)
TF.js libraries and Google Fonts are inlined — the UI runs fully offline after the first load. Only the COCO-SSD model weights (~5MB) need internet the first time, then they're browser-cached.
Built with Claude Design
The entire app was generated from a single detailed prompt — no manual coding session. See the full prompt in Step 11 below.

🗺️ System Architecture Diagram

This interactive diagram shows how all the components of AR Vision connect — from the camera input through TensorFlow.js inference to the rendered output. Open it to explore the full data flow:

📷 CAMERA getUserMedia · facingMode 🤖 TENSORFLOW.JS + COCO-SSD detect(video) → predictions[] · WebGL GPU 🎨 CANVAS 2D RENDERER Neon boxes · Labels · HUD · History ✅ AR DISPLAY OUTPUT rAF loop 🗺️ OPEN INTERACTIVE ARCHITECTURE DIAGRAM

Opens in a new tab · Interactive · Shows full data flow from camera → TF.js → Canvas

📱 Will This Work on Mobile?

Yes — it's built mobile-first. Here's what's specifically designed for phones, and the one important catch you need to know before sharing the file:

✅ What's designed for mobile
📷 Rear camera by default — uses facingMode: { ideal: "environment" } on mobile so it immediately points at the world
🔄 Front/rear switch button — visible only on phones, toggles between cameras mid-session
📐 Touch-sized controls — HUD panels and sidebar scale down on small screens, viewport-fit=cover handles notches
playsinline — prevents iOS from forcing the video fullscreen when it starts
lite_mobilenet_v2 — the lightweight model variant runs smoothly on mid-range Android and iPhone (FPS counter turns amber if it drops below 15)
⚠️ The catch — camera needs HTTPS

Mobile browsers only grant camera access on https:// or localhost. If you AirDrop or email the HTML file and tap it directly on your phone, the browser opens it as file:// — and you'll hit the styled "Camera Access Denied" screen.

To use on mobile — host it over HTTPS. Easiest free options:

🚀 Netlify Drop — drag the HTML file onto the page, get an instant HTTPS link in seconds
📄 GitHub Pages — push the file to a repo, enable Pages, free HTTPS subdomain
Vercel — drag and drop deploy, free HTTPS URL instantly
🌐 Cloudflare Pages — free static hosting with HTTPS and global CDN
💡 iOS note: On iPhone, open in Safari — not Chrome or Firefox. Apple restricts camera API access to Safari on iOS regardless of which browser you use. The app works identically on Safari iPhone once hosted over HTTPS.

What You'll Build

AR Vision is a complete browser-based augmented reality object detector. Point your phone camera at any object and it identifies it in real time — drawing colour-coded neon bounding boxes, showing confidence scores, and logging detection history. It uses COCO-SSD, a pre-trained model that recognises 80 everyday object classes (people, chairs, phones, cups, dogs, cars, and more).

📷
Live Camera Feed
Full-screen rear camera on mobile, front webcam on desktop — switchable mid-session
🎯
Real-Time Detection
requestAnimationFrame loop running COCO-SSD inference 20–30 times per second
🌈
Colour-Coded Boxes
People = amber · Food = green · Electronics = purple · Vehicles = red · Furniture = cyan
🃏
Floating Label Cards
Glassmorphism cards with object name, emoji icon, confidence bar, and a fun fact per class
📊
HUD Panels
Top bar: app title + live FPS. Bottom bar: object count + top detection + model status
🕐
Detection History
Sidebar showing last 5 unique objects detected with timestamps

How It Works — TensorFlow.js + COCO-SSD Explained

TensorFlow.js is Google's machine learning library ported to run entirely in the browser via WebGL and WebAssembly. No Python, no server, no API key — the model runs on your device's GPU (or CPU as fallback) directly inside a <script> tag.

COCO-SSD (Common Objects in Context — Single Shot MultiBox Detector) is a pre-trained object detection model. "Single Shot" means it detects all objects in one forward pass through the network — making it fast enough for real-time use. It outputs an array of detections, each with a class, score (0–1 confidence), and bbox (bounding box coordinates).

💡 The loop

Video element → Canvas capture → COCO-SSD.detect() → Array of detections → Draw boxes + labels on canvas overlay → requestAnimationFrame → repeat. The whole cycle runs every ~33ms at 30fps.

Step 1 — HTML Skeleton & CDN Imports

The entire app is a single HTML file. Start with the base structure and load TensorFlow.js plus the COCO-SSD model from CDN:

HTML
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>AR Vision — Object Detector</title>

  <!-- TensorFlow.js core -->
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.22.0/dist/tf.min.js"></script>

  <!-- COCO-SSD object detection model -->
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd@2.2.3/dist/coco-ssd.min.js"></script>

  <!-- Google Font: Orbitron for the cyberpunk feel -->
  <link href="https://fonts.googleapis.com/css2?family=Orbitron:wght@700;900&display=swap" rel="stylesheet">
</head>
<body>
  <!-- All content goes here -->
</body>
</html>
💡 Why jsDelivr?

jsDelivr is a free, fast CDN that serves npm packages — no account needed. The version pins (@4.22.0) prevent breaking changes from auto-updating under you.

Step 2 — Camera Access with getUserMedia

Add a <video> element (hidden) and a <canvas> overlay (visible). The video streams camera feed; the canvas draws detection results on top.

HTML — body structure
<!-- Hidden video element for camera stream -->
<video id="video" autoplay muted playsinline
  style="position:fixed;inset:0;width:100%;height:100%;object-fit:cover;z-index:0"></video>

<!-- Canvas overlay for bounding boxes -->
<canvas id="canvas"
  style="position:fixed;inset:0;width:100%;height:100%;z-index:1;pointer-events:none"></canvas>
JavaScript — camera setup
const video = document.getElementById('video');
const canvas = document.getElementById('canvas');
const ctx = canvas.getContext('2d');

// Detect mobile for rear camera default
const isMobile = /Mobi|Android/i.test(navigator.userAgent);
let facingMode = isMobile ? { ideal: 'environment' } : 'user';

async function startCamera() {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { facingMode, width: { ideal: 1280 }, height: { ideal: 720 } },
      audio: false
    });
    video.srcObject = stream;
    await video.play();

    // Match canvas size to video
    canvas.width  = video.videoWidth;
    canvas.height = video.videoHeight;
  } catch (err) {
    showError(err); // Styled error state
  }
}
⚠️ HTTPS required

Camera access via getUserMedia only works on HTTPS (or localhost). If you open the HTML file directly from your file system (file://), camera access will be blocked. Serve it with a local server or deploy to any host with SSL.

Step 3 — Load COCO-SSD and Run the Detection Loop

Load the model once, then run it every animation frame. The model's detect() method accepts the video element directly and returns an array of prediction objects.

JavaScript — model load + detection loop
let model, isRunning = false;
let lastTime = performance.now(), fps = 0;

async function loadModel() {
  setStatus('Loading model...');
  model = await cocoSsd.load({
    base: 'lite_mobilenet_v2'   // Fastest variant for real-time
  });
  setStatus('Model ready ✓');
}

async function detect() {
  if (!isRunning || !model) return;

  // Calculate FPS
  const now = performance.now();
  fps = Math.round(1000 / (now - lastTime));
  lastTime = now;
  updateFPS(fps);

  // Sync canvas to video dimensions
  canvas.width  = video.videoWidth;
  canvas.height = video.videoHeight;

  // Run inference
  const predictions = await model.detect(video, {
    maxNumBoxes: 10,          // Max objects per frame
    minScore: 0.4            // Filter low-confidence noise
  });

  ctx.clearRect(0, 0, canvas.width, canvas.height);
  predictions.forEach(pred => drawBox(pred));
  updateHUD(predictions);

  requestAnimationFrame(detect);
}

Step 4 — Draw Neon Bounding Boxes on Canvas

Each prediction has a bbox array: [x, y, width, height] in pixels. Draw a glowing rectangle using shadowBlur for the neon effect:

JavaScript — draw neon box
function drawBox(pred) {
  const [x, y, w, h] = pred.bbox;
  const color = getColor(pred.class);
  const isLowConf = pred.score < 0.7;

  // Neon glow effect
  ctx.shadowBlur   = 20;
  ctx.shadowColor  = color;
  ctx.strokeStyle  = color;
  ctx.lineWidth    = isLowConf ? 1.5 : 2.5;

  // Dashed border for low confidence ("Scanning...")
  if (isLowConf) ctx.setLineDash([6, 4]);
  else ctx.setLineDash([]);

  ctx.strokeRect(x, y, w, h);

  // Corner bracket accents (cyberpunk style)
  const s = 18;
  ctx.lineWidth = 3;
  [[x,y],[x+w,y],[x,y+h],[x+w,y+h]].forEach(([cx,cy], i) => {
    ctx.beginPath();
    ctx.moveTo(cx + (i%2===0 ? s : -s), cy);
    ctx.lineTo(cx, cy);
    ctx.lineTo(cx, cy + (i<2 ? s : -s));
    ctx.stroke();
  });

  ctx.shadowBlur = 0;
}

Step 5 — Colour-Code by Object Class Category

Map each COCO class name to a category, then assign a neon colour per category:

People — amber
Food — green
Furniture — cyan
Electronics — purple
Vehicles — red
Animals / Other — teal
JavaScript — colour mapping
const CLASS_COLORS = {
  // People
  person: '#fbbf24',
  // Food
  banana: '#4ade80', apple: '#4ade80', orange: '#4ade80',
  sandwich: '#4ade80', pizza: '#4ade80', cake: '#4ade80',
  // Furniture
  chair: '#22d3ee', couch: '#22d3ee', bed: '#22d3ee',
  // Electronics
  laptop: '#a78bfa', cell phone: '#a78bfa', tv: '#a78bfa',
  keyboard: '#a78bfa', mouse: '#a78bfa',
  // Vehicles
  car: '#f87171', bus: '#f87171', truck: '#f87171',
  bicycle: '#f87171', motorcycle: '#f87171',
};

function getColor(cls) {
  return CLASS_COLORS[cls] || '#00f5c4'; // Default teal
}

Step 6 — Floating Label Cards with Fun Facts

Instead of plain text labels on canvas, use absolutely-positioned HTML div elements with glassmorphism styling. They update position every frame to stay anchored to the bounding box.

JavaScript — label card system
const FUN_FACTS = {
  person:     { icon: '🧍', fact: 'Humans walk ~8,000 steps/day on average' },
  laptop:     { icon: '💻', fact: 'Try Claude Code for AI-assisted coding' },
  cell phone: { icon: '📱', fact: '84% of people check their phone within 15 min of waking' },
  chair:      { icon: '🪑', fact: 'Sitting for 8h/day is equivalent to smoking 1 pack' },
  cup:        { icon: '☕', fact: 'The average developer drinks 3.1 cups of coffee per day' },
  book:       { icon: '📚', fact: 'Reading 20 pages/day = 12 books per year' },
  car:        { icon: '🚗', fact: 'EVs have 20× fewer moving parts than petrol engines' },
  // Add more as needed...
};

function updateLabels(predictions) {
  // Clear old labels
  document.querySelectorAll('.label-card').forEach(el => el.remove());

  predictions.forEach(pred => {
    const [x, y, w] = pred.bbox;
    const info = FUN_FACTS[pred.class] || { icon: '🔍', fact: 'Detected by COCO-SSD' };
    const color = getColor(pred.class);
    const pct = Math.round(pred.score * 100);

    const card = document.createElement('div');
    card.className = 'label-card';
    card.style.cssText = `
      position: fixed;
      left: ${x}px; top: ${y - 72}px;
      background: rgba(5,8,16,0.75);
      backdrop-filter: blur(12px);
      border: 1px solid ${color}55;
      border-radius: 10px;
      padding: 8px 12px;
      font-family: -apple-system, sans-serif;
      font-size: 12px;
      color: #e8ecf5;
      z-index: 10;
      min-width: 140px;
      max-width: ${w}px;
      pointer-events: none;
    `;
    card.innerHTML = `
      <div style="display:flex;align-items:center;gap:6px;margin-bottom:4px">
        <span>${info.icon}</span>
        <strong style="color:${color}">${pred.class}</strong>
        <span style="margin-left:auto;color:#6b7a99;font-size:11px">${pct}%</span>
      </div>
      <div style="background:#1e2d4a;border-radius:4px;height:4px;margin-bottom:6px">
        <div style="background:${color};width:${pct}%;height:100%;border-radius:4px"></div>
      </div>
      <div style="font-size:10px;color:#6b7a99;line-height:1.5">${info.fact}</div>
    `;
    document.body.appendChild(card);
  });
}

Step 7 — Top HUD Navbar with FPS Counter

HTML — top HUD
<nav id="hud-top" style="
  position: fixed; top: 0; left: 0; right: 0; z-index: 50;
  display: flex; align-items: center; justify-content: space-between;
  padding: 12px 20px;
  background: rgba(5,8,16,0.6);
  backdrop-filter: blur(16px);
  border-bottom: 1px solid rgba(0,245,196,0.15);
  font-family: 'Orbitron', sans-serif;
">
  <div style="font-size:1.1rem;font-weight:900;letter-spacing:4px;color:#2ee6ff;
    text-shadow:0 0 12px rgba(46,230,255,0.6)">AR VISION</div>

  <div style="display:flex;align-items:center;gap:16px">
    <div id="fps-display" style="font-size:.75rem;color:#00f5c4">-- FPS</div>
    <button id="toggleBtn" onclick="toggleDetection()" style="
      padding:6px 16px; border-radius:8px;
      border:1px solid #00f5c4; background:transparent;
      color:#00f5c4; font-family:'Orbitron',sans-serif;
      font-size:.65rem; cursor:pointer; letter-spacing:1px;
    ">STOP</button>
    <button id="switchBtn" onclick="switchCamera()" style="
      padding:6px 12px; border-radius:8px;
      border:1px solid #7c6cf8; background:transparent;
      color:#7c6cf8; font-size:.75rem; cursor:pointer;
    ">🔄</button>
  </div>
</nav>
JavaScript — FPS update + toggle
function updateFPS(fps) {
  document.getElementById('fps-display').textContent = fps + ' FPS';
}

function toggleDetection() {
  isRunning = !isRunning;
  const btn = document.getElementById('toggleBtn');
  btn.textContent = isRunning ? 'STOP' : 'START';
  btn.style.borderColor = isRunning ? '#00f5c4' : '#ff6b6b';
  btn.style.color       = isRunning ? '#00f5c4' : '#ff6b6b';
  if (isRunning) detect();
}

Step 8 — Bottom HUD Panel & Detection History Sidebar

HTML + JavaScript — bottom HUD + history
<!-- Bottom HUD -->
<div id="hud-bottom" style="
  position:fixed;bottom:0;left:0;right:0;z-index:50;
  display:flex;align-items:center;gap:20px;padding:12px 20px;
  background:rgba(5,8,16,0.6);backdrop-filter:blur(16px);
  border-top:1px solid rgba(0,245,196,0.15);
  font-size:.72rem;color:#6b7a99;font-family:'Orbitron',sans-serif;
">
  <span>OBJECTS: <span id="objCount" style="color:#00f5c4">0</span></span>
  <span>TOP: <span id="topDetect" style="color:#fbbf24"></span></span>
  <span style="margin-left:auto">MODEL: <span id="modelStatus" style="color:#4ade80">READY</span></span>
</div>

<!-- Detection history sidebar -->
<div id="history" style="
  position:fixed;right:16px;top:80px;z-index:50;
  width:170px;display:flex;flex-direction:column;gap:8px;
"></div>

/* JS — update HUD + log history */
JavaScript — HUD + history logic
const historyLog = []; // { class, time }
const seenClasses = new Set();

function updateHUD(predictions) {
  document.getElementById('objCount').textContent = predictions.length;

  if (predictions.length > 0) {
    const top = predictions.reduce((a,b) => a.score > b.score ? a : b);
    document.getElementById('topDetect').textContent =
      top.class + ' ' + Math.round(top.score*100) + '%';

    // Log new unique objects to history
    predictions.forEach(p => {
      if (!seenClasses.has(p.class)) {
        seenClasses.add(p.class);
        historyLog.unshift({ cls: p.class, time: new Date().toLocaleTimeString() });
        if (historyLog.length > 5) historyLog.pop();
        renderHistory();
      }
    });
  }
}

function renderHistory() {
  const el = document.getElementById('history');
  el.innerHTML = historyLog.map(h => `
    <div style="background:rgba(5,8,16,0.7);backdrop-filter:blur(10px);
      border:1px solid rgba(0,245,196,0.2);border-radius:8px;padding:8px 10px;
      font-size:11px;color:#e8ecf5;font-family:'Orbitron',sans-serif">
      <div style="color:#00f5c4;font-size:10px">${h.cls}</div>
      <div style="color:#6b7a99;font-size:9px;margin-top:2px">${h.time}</div>
    </div>
  `).join('');
}

Step 9 — Camera Switch Button & Scan-Line Overlay

JavaScript — camera switch
async function switchCamera() {
  // Stop existing tracks
  const tracks = video.srcObject?..getTracks() || [];
  tracks.forEach(t => t.stop());

  // Toggle facing mode
  facingMode = facingMode === 'user'
    ? { ideal: 'environment' }
    : 'user';

  await startCamera();
}
CSS — scan-line overlay animation
/* Add this inside <style> */
#scanlines {
  position: fixed; inset: 0; z-index: 2;
  pointer-events: none;
  background: repeating-linear-gradient(
    0deg,
    transparent,
    transparent 2px,
    rgba(0, 245, 196, 0.015) 2px,
    rgba(0, 245, 196, 0.015) 4px
  );
  animation: scanMove 8s linear infinite;
}
@keyframes scanMove {
  from { background-position: 0 0; }
  to   { background-position: 0 100vh; }
}

Step 10 — Loading Screen & Error States

HTML + CSS — loading screen with progress ring
<div id="loader" style="
  position:fixed;inset:0;z-index:200;
  background:#04060a;display:flex;flex-direction:column;
  align-items:center;justify-content:center;gap:20px;
">
  <!-- Spinning ring -->
  <svg width="80" height="80" viewBox="0 0 80 80">
    <circle cx="40" cy="40" r="32" fill="none"
      stroke="#1e2d4a" stroke-width="4"/>
    <circle cx="40" cy="40" r="32" fill="none"
      stroke="#00f5c4" stroke-width="4"
      stroke-dasharray="50 150" stroke-linecap="round"
      style="animation:spin 1.2s linear infinite;transform-origin:center"/>
  </svg>
  <div id="loadStatus" style="color:#6b7a99;font-size:.8rem;
    font-family:'Orbitron',sans-serif;letter-spacing:2px">INITIALISING...</div>
</div>

<style>
@keyframes spin { to { transform: rotate(360deg); } }
</style>

/* JS: hide loader when model is ready */
function setStatus(msg) {
  document.getElementById('loadStatus').textContent = msg;
}
function hideLoader() {
  document.getElementById('loader').style.display = 'none';
}

The Exact Claude Prompt Used to Build This

This entire app was generated in a single prompt using Claude's design tool. Here is the complete prompt — you can use it yourself to reproduce or extend the app:

Build a single-file HTML AR Object Detector with a professional dark cyberpunk UI. Core functionality: - Access the device camera (rear-facing on mobile) using getUserMedia - Load TensorFlow.js and COCO-SSD model via CDN and run real-time object detection on the live camera feed - Draw animated bounding boxes over detected objects on a canvas overlay - Each bounding box should have a floating label card showing: object name, confidence %, and a fun fact or use-case tip (hardcoded per class) UI Design — dark, professional, glassmorphism: - Full-screen camera feed as background - Frosted glass top navbar with app title "AR Vision" and a live FPS counter - Animated neon-colored bounding boxes (different color per object class category — food = green, furniture = cyan, electronics = purple, people = amber, vehicles = red) - Floating label cards with backdrop blur, rounded corners, object icon (emoji), confidence bar - Bottom HUD panel showing: total objects detected, most confident detection, model status - Smooth scan-line animation overlay for a futuristic feel - "Start / Stop" toggle button with pulsing glow when active - Loading screen while model initializes with animated progress ring Technical: - Use @tensorflow-models/coco-ssd and @tensorflow/tfjs from CDN - requestAnimationFrame loop for smooth detection - Handle camera permission denial gracefully with a styled error state - Mobile-first: works on phone browser pointed at real objects - On desktop, default to front-facing webcam; on mobile, default to rear camera using facingMode: { ideal: "environment" } - Add a camera switch button (front/rear toggle) visible on mobile - Show a subtle "Scanning..." pulse animation on the bounding box while confidence is below 70% - Add a detection history sidebar (last 5 unique objects detected with timestamp)
💡 Try it yourself

Go to claude.ai, click the design/artifact mode, paste this prompt exactly, and Claude will generate the complete working file in one shot. You can then ask follow-up prompts to extend it — add more object classes, change the colour scheme, or add a screenshot button.

Frequently Asked Questions

Does this work on iPhone?
Yes — but you must open it in Safari, not Chrome or Firefox on iOS. Apple restricts camera access to Safari on iOS. Save the HTML file to your device, use a local server app (like Inspect Browser), or host it on any HTTPS URL. Once open in Safari on iPhone, tap Allow when prompted for camera access and the app works identically to desktop.
Why does the camera not work when I open the HTML file directly?
Browser security policies block getUserMedia on file:// URLs. You need either HTTPS or localhost. The quickest local server: open a terminal in the folder containing your HTML file and run python -m http.server 8080 (Python 3) then open http://localhost:8080 in your browser.
How accurate is COCO-SSD? What does it struggle with?
COCO-SSD (lite_mobilenet_v2 variant) achieves ~18–22% mAP on the COCO benchmark — solid for real-time use but not production-grade accuracy. It struggles with: objects smaller than ~80×80px, heavily occluded objects, unusual angles, and low-light conditions. It works best with common everyday objects held or placed clearly in the centre of frame. For higher accuracy at the cost of speed, swap to mobilenet_v2 or mobilenet_v1 in the model load call.
How do I add more object classes or custom objects?
COCO-SSD is pre-trained on 80 fixed classes and cannot be extended without retraining. For custom objects, you have two options: (1) Use Teachable Machine by Google — train a custom image classifier in your browser and export a TensorFlow.js model, (2) Use ml5.js which wraps a YOLO variant and supports custom training datasets. The bounding box and HUD code in this tutorial works with both alternatives with minimal changes.
Can I deploy this as a web app?
Yes — it's a single HTML file, so you can host it on GitHub Pages, Netlify, Vercel, Cloudflare Pages, or any static host. All of these provide free HTTPS out of the box. Just upload or drag the HTML file and your app is live within seconds. No build step, no Node.js, no dependencies to install.
How do I make the bounding boxes stay on the right objects as they move?
The requestAnimationFrame loop reruns detection and redraws everything every frame (~30fps), so boxes naturally track moving objects. If you notice boxes lagging, it's usually because the GPU is busy. Try reducing maxNumBoxes from 10 to 5, lowering the video resolution in the getUserMedia constraints, or switching to the lite_mobilenet_v2 model base if you're on mobilenet_v2. On mid-range phones, 15-20fps is typical and still feels smooth.
Can I record or screenshot the detected output?
Yes — since everything is drawn on a canvas, you can capture it with canvas.toDataURL('image/png') and trigger a download: const a = document.createElement('a'); a.href = canvas.toDataURL(); a.download = 'ar-vision-capture.png'; a.click(). For video recording, use the MediaRecorder API on a canvas.captureStream(30) output stream. Add a screenshot button to the HUD to make this one-tap.
Will it drain my phone battery fast?
Running the camera and GPU inference 30 times per second is power-hungry. On a modern phone, expect roughly 10–15% battery drain per 30 minutes of active use. To reduce this: lower the inference rate by replacing requestAnimationFrame with a setInterval at 500ms intervals (2fps inference is usually enough for stable objects), or add the Stop button to pause detection when you're not actively using it.
🎮 Live Demo
Try AR Vision right now

Point your phone camera at everyday objects — cups, laptops, chairs, people — and watch the AI detect them live with neon bounding boxes.

LAUNCH AR VISION

Opens in a new tab · Allow camera when prompted · Best on mobile

Build your own with Claude

The full AR Vision app was built with a single prompt in Claude's design tool. Use the prompt above, or ask Claude to extend it — add object counting, voice announcements, or screenshot capture.

Try Claude →