🔧 Tech Stack — What Powers AR Vision
Every library and API in this app is free, open-source, and runs entirely in the browser — no backend, no API key, no install. Here's the full breakdown:
🗺️ System Architecture Diagram
This interactive diagram shows how all the components of AR Vision connect — from the camera input through TensorFlow.js inference to the rendered output. Open it to explore the full data flow:
📱 Will This Work on Mobile?
Yes — it's built mobile-first. Here's what's specifically designed for phones, and the one important catch you need to know before sharing the file:
facingMode: { ideal: "environment" } on mobile so it immediately points at the worldviewport-fit=cover handles notchesMobile browsers only grant camera access on https:// or localhost. If you AirDrop or email the HTML file and tap it directly on your phone, the browser opens it as file:// — and you'll hit the styled "Camera Access Denied" screen.
To use on mobile — host it over HTTPS. Easiest free options:
What You'll Build
AR Vision is a complete browser-based augmented reality object detector. Point your phone camera at any object and it identifies it in real time — drawing colour-coded neon bounding boxes, showing confidence scores, and logging detection history. It uses COCO-SSD, a pre-trained model that recognises 80 everyday object classes (people, chairs, phones, cups, dogs, cars, and more).
How It Works — TensorFlow.js + COCO-SSD Explained
TensorFlow.js is Google's machine learning library ported to run entirely in the browser via WebGL and WebAssembly. No Python, no server, no API key — the model runs on your device's GPU (or CPU as fallback) directly inside a <script> tag.
COCO-SSD (Common Objects in Context — Single Shot MultiBox Detector) is a pre-trained object detection model. "Single Shot" means it detects all objects in one forward pass through the network — making it fast enough for real-time use. It outputs an array of detections, each with a class, score (0–1 confidence), and bbox (bounding box coordinates).
Video element → Canvas capture → COCO-SSD.detect() → Array of detections → Draw boxes + labels on canvas overlay → requestAnimationFrame → repeat. The whole cycle runs every ~33ms at 30fps.
Step 1 — HTML Skeleton & CDN Imports
The entire app is a single HTML file. Start with the base structure and load TensorFlow.js plus the COCO-SSD model from CDN:
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>AR Vision — Object Detector</title> <!-- TensorFlow.js core --> <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.22.0/dist/tf.min.js"></script> <!-- COCO-SSD object detection model --> <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd@2.2.3/dist/coco-ssd.min.js"></script> <!-- Google Font: Orbitron for the cyberpunk feel --> <link href="https://fonts.googleapis.com/css2?family=Orbitron:wght@700;900&display=swap" rel="stylesheet"> </head> <body> <!-- All content goes here --> </body> </html>
jsDelivr is a free, fast CDN that serves npm packages — no account needed. The version pins (@4.22.0) prevent breaking changes from auto-updating under you.
Step 2 — Camera Access with getUserMedia
Add a <video> element (hidden) and a <canvas> overlay (visible). The video streams camera feed; the canvas draws detection results on top.
<!-- Hidden video element for camera stream --> <video id="video" autoplay muted playsinline style="position:fixed;inset:0;width:100%;height:100%;object-fit:cover;z-index:0"></video> <!-- Canvas overlay for bounding boxes --> <canvas id="canvas" style="position:fixed;inset:0;width:100%;height:100%;z-index:1;pointer-events:none"></canvas>
const video = document.getElementById('video'); const canvas = document.getElementById('canvas'); const ctx = canvas.getContext('2d'); // Detect mobile for rear camera default const isMobile = /Mobi|Android/i.test(navigator.userAgent); let facingMode = isMobile ? { ideal: 'environment' } : 'user'; async function startCamera() { try { const stream = await navigator.mediaDevices.getUserMedia({ video: { facingMode, width: { ideal: 1280 }, height: { ideal: 720 } }, audio: false }); video.srcObject = stream; await video.play(); // Match canvas size to video canvas.width = video.videoWidth; canvas.height = video.videoHeight; } catch (err) { showError(err); // Styled error state } }
Camera access via getUserMedia only works on HTTPS (or localhost). If you open the HTML file directly from your file system (file://), camera access will be blocked. Serve it with a local server or deploy to any host with SSL.
Step 3 — Load COCO-SSD and Run the Detection Loop
Load the model once, then run it every animation frame. The model's detect() method accepts the video element directly and returns an array of prediction objects.
let model, isRunning = false; let lastTime = performance.now(), fps = 0; async function loadModel() { setStatus('Loading model...'); model = await cocoSsd.load({ base: 'lite_mobilenet_v2' // Fastest variant for real-time }); setStatus('Model ready ✓'); } async function detect() { if (!isRunning || !model) return; // Calculate FPS const now = performance.now(); fps = Math.round(1000 / (now - lastTime)); lastTime = now; updateFPS(fps); // Sync canvas to video dimensions canvas.width = video.videoWidth; canvas.height = video.videoHeight; // Run inference const predictions = await model.detect(video, { maxNumBoxes: 10, // Max objects per frame minScore: 0.4 // Filter low-confidence noise }); ctx.clearRect(0, 0, canvas.width, canvas.height); predictions.forEach(pred => drawBox(pred)); updateHUD(predictions); requestAnimationFrame(detect); }
Step 4 — Draw Neon Bounding Boxes on Canvas
Each prediction has a bbox array: [x, y, width, height] in pixels. Draw a glowing rectangle using shadowBlur for the neon effect:
function drawBox(pred) { const [x, y, w, h] = pred.bbox; const color = getColor(pred.class); const isLowConf = pred.score < 0.7; // Neon glow effect ctx.shadowBlur = 20; ctx.shadowColor = color; ctx.strokeStyle = color; ctx.lineWidth = isLowConf ? 1.5 : 2.5; // Dashed border for low confidence ("Scanning...") if (isLowConf) ctx.setLineDash([6, 4]); else ctx.setLineDash([]); ctx.strokeRect(x, y, w, h); // Corner bracket accents (cyberpunk style) const s = 18; ctx.lineWidth = 3; [[x,y],[x+w,y],[x,y+h],[x+w,y+h]].forEach(([cx,cy], i) => { ctx.beginPath(); ctx.moveTo(cx + (i%2===0 ? s : -s), cy); ctx.lineTo(cx, cy); ctx.lineTo(cx, cy + (i<2 ? s : -s)); ctx.stroke(); }); ctx.shadowBlur = 0; }
Step 5 — Colour-Code by Object Class Category
Map each COCO class name to a category, then assign a neon colour per category:
const CLASS_COLORS = { // People person: '#fbbf24', // Food banana: '#4ade80', apple: '#4ade80', orange: '#4ade80', sandwich: '#4ade80', pizza: '#4ade80', cake: '#4ade80', // Furniture chair: '#22d3ee', couch: '#22d3ee', bed: '#22d3ee', // Electronics laptop: '#a78bfa', cell phone: '#a78bfa', tv: '#a78bfa', keyboard: '#a78bfa', mouse: '#a78bfa', // Vehicles car: '#f87171', bus: '#f87171', truck: '#f87171', bicycle: '#f87171', motorcycle: '#f87171', }; function getColor(cls) { return CLASS_COLORS[cls] || '#00f5c4'; // Default teal }
Step 6 — Floating Label Cards with Fun Facts
Instead of plain text labels on canvas, use absolutely-positioned HTML div elements with glassmorphism styling. They update position every frame to stay anchored to the bounding box.
const FUN_FACTS = { person: { icon: '🧍', fact: 'Humans walk ~8,000 steps/day on average' }, laptop: { icon: '💻', fact: 'Try Claude Code for AI-assisted coding' }, cell phone: { icon: '📱', fact: '84% of people check their phone within 15 min of waking' }, chair: { icon: '🪑', fact: 'Sitting for 8h/day is equivalent to smoking 1 pack' }, cup: { icon: '☕', fact: 'The average developer drinks 3.1 cups of coffee per day' }, book: { icon: '📚', fact: 'Reading 20 pages/day = 12 books per year' }, car: { icon: '🚗', fact: 'EVs have 20× fewer moving parts than petrol engines' }, // Add more as needed... }; function updateLabels(predictions) { // Clear old labels document.querySelectorAll('.label-card').forEach(el => el.remove()); predictions.forEach(pred => { const [x, y, w] = pred.bbox; const info = FUN_FACTS[pred.class] || { icon: '🔍', fact: 'Detected by COCO-SSD' }; const color = getColor(pred.class); const pct = Math.round(pred.score * 100); const card = document.createElement('div'); card.className = 'label-card'; card.style.cssText = ` position: fixed; left: ${x}px; top: ${y - 72}px; background: rgba(5,8,16,0.75); backdrop-filter: blur(12px); border: 1px solid ${color}55; border-radius: 10px; padding: 8px 12px; font-family: -apple-system, sans-serif; font-size: 12px; color: #e8ecf5; z-index: 10; min-width: 140px; max-width: ${w}px; pointer-events: none; `; card.innerHTML = ` <div style="display:flex;align-items:center;gap:6px;margin-bottom:4px"> <span>${info.icon}</span> <strong style="color:${color}">${pred.class}</strong> <span style="margin-left:auto;color:#6b7a99;font-size:11px">${pct}%</span> </div> <div style="background:#1e2d4a;border-radius:4px;height:4px;margin-bottom:6px"> <div style="background:${color};width:${pct}%;height:100%;border-radius:4px"></div> </div> <div style="font-size:10px;color:#6b7a99;line-height:1.5">${info.fact}</div> `; document.body.appendChild(card); }); }
Step 7 — Top HUD Navbar with FPS Counter
<nav id="hud-top" style=" position: fixed; top: 0; left: 0; right: 0; z-index: 50; display: flex; align-items: center; justify-content: space-between; padding: 12px 20px; background: rgba(5,8,16,0.6); backdrop-filter: blur(16px); border-bottom: 1px solid rgba(0,245,196,0.15); font-family: 'Orbitron', sans-serif; "> <div style="font-size:1.1rem;font-weight:900;letter-spacing:4px;color:#2ee6ff; text-shadow:0 0 12px rgba(46,230,255,0.6)">AR VISION</div> <div style="display:flex;align-items:center;gap:16px"> <div id="fps-display" style="font-size:.75rem;color:#00f5c4">-- FPS</div> <button id="toggleBtn" onclick="toggleDetection()" style=" padding:6px 16px; border-radius:8px; border:1px solid #00f5c4; background:transparent; color:#00f5c4; font-family:'Orbitron',sans-serif; font-size:.65rem; cursor:pointer; letter-spacing:1px; ">STOP</button> <button id="switchBtn" onclick="switchCamera()" style=" padding:6px 12px; border-radius:8px; border:1px solid #7c6cf8; background:transparent; color:#7c6cf8; font-size:.75rem; cursor:pointer; ">🔄</button> </div> </nav>
function updateFPS(fps) { document.getElementById('fps-display').textContent = fps + ' FPS'; } function toggleDetection() { isRunning = !isRunning; const btn = document.getElementById('toggleBtn'); btn.textContent = isRunning ? 'STOP' : 'START'; btn.style.borderColor = isRunning ? '#00f5c4' : '#ff6b6b'; btn.style.color = isRunning ? '#00f5c4' : '#ff6b6b'; if (isRunning) detect(); }
Step 8 — Bottom HUD Panel & Detection History Sidebar
<!-- Bottom HUD --> <div id="hud-bottom" style=" position:fixed;bottom:0;left:0;right:0;z-index:50; display:flex;align-items:center;gap:20px;padding:12px 20px; background:rgba(5,8,16,0.6);backdrop-filter:blur(16px); border-top:1px solid rgba(0,245,196,0.15); font-size:.72rem;color:#6b7a99;font-family:'Orbitron',sans-serif; "> <span>OBJECTS: <span id="objCount" style="color:#00f5c4">0</span></span> <span>TOP: <span id="topDetect" style="color:#fbbf24">—</span></span> <span style="margin-left:auto">MODEL: <span id="modelStatus" style="color:#4ade80">READY</span></span> </div> <!-- Detection history sidebar --> <div id="history" style=" position:fixed;right:16px;top:80px;z-index:50; width:170px;display:flex;flex-direction:column;gap:8px; "></div> /* JS — update HUD + log history */
const historyLog = []; // { class, time } const seenClasses = new Set(); function updateHUD(predictions) { document.getElementById('objCount').textContent = predictions.length; if (predictions.length > 0) { const top = predictions.reduce((a,b) => a.score > b.score ? a : b); document.getElementById('topDetect').textContent = top.class + ' ' + Math.round(top.score*100) + '%'; // Log new unique objects to history predictions.forEach(p => { if (!seenClasses.has(p.class)) { seenClasses.add(p.class); historyLog.unshift({ cls: p.class, time: new Date().toLocaleTimeString() }); if (historyLog.length > 5) historyLog.pop(); renderHistory(); } }); } } function renderHistory() { const el = document.getElementById('history'); el.innerHTML = historyLog.map(h => ` <div style="background:rgba(5,8,16,0.7);backdrop-filter:blur(10px); border:1px solid rgba(0,245,196,0.2);border-radius:8px;padding:8px 10px; font-size:11px;color:#e8ecf5;font-family:'Orbitron',sans-serif"> <div style="color:#00f5c4;font-size:10px">${h.cls}</div> <div style="color:#6b7a99;font-size:9px;margin-top:2px">${h.time}</div> </div> `).join(''); }
Step 9 — Camera Switch Button & Scan-Line Overlay
async function switchCamera() { // Stop existing tracks const tracks = video.srcObject?..getTracks() || []; tracks.forEach(t => t.stop()); // Toggle facing mode facingMode = facingMode === 'user' ? { ideal: 'environment' } : 'user'; await startCamera(); }
/* Add this inside <style> */
#scanlines {
position: fixed; inset: 0; z-index: 2;
pointer-events: none;
background: repeating-linear-gradient(
0deg,
transparent,
transparent 2px,
rgba(0, 245, 196, 0.015) 2px,
rgba(0, 245, 196, 0.015) 4px
);
animation: scanMove 8s linear infinite;
}
@keyframes scanMove {
from { background-position: 0 0; }
to { background-position: 0 100vh; }
}
Step 10 — Loading Screen & Error States
<div id="loader" style=" position:fixed;inset:0;z-index:200; background:#04060a;display:flex;flex-direction:column; align-items:center;justify-content:center;gap:20px; "> <!-- Spinning ring --> <svg width="80" height="80" viewBox="0 0 80 80"> <circle cx="40" cy="40" r="32" fill="none" stroke="#1e2d4a" stroke-width="4"/> <circle cx="40" cy="40" r="32" fill="none" stroke="#00f5c4" stroke-width="4" stroke-dasharray="50 150" stroke-linecap="round" style="animation:spin 1.2s linear infinite;transform-origin:center"/> </svg> <div id="loadStatus" style="color:#6b7a99;font-size:.8rem; font-family:'Orbitron',sans-serif;letter-spacing:2px">INITIALISING...</div> </div> <style> @keyframes spin { to { transform: rotate(360deg); } } </style> /* JS: hide loader when model is ready */ function setStatus(msg) { document.getElementById('loadStatus').textContent = msg; } function hideLoader() { document.getElementById('loader').style.display = 'none'; }
The Exact Claude Prompt Used to Build This
This entire app was generated in a single prompt using Claude's design tool. Here is the complete prompt — you can use it yourself to reproduce or extend the app:
Go to claude.ai, click the design/artifact mode, paste this prompt exactly, and Claude will generate the complete working file in one shot. You can then ask follow-up prompts to extend it — add more object classes, change the colour scheme, or add a screenshot button.
Frequently Asked Questions
Does this work on iPhone?
Why does the camera not work when I open the HTML file directly?
file:// URLs. You need either HTTPS or localhost. The quickest local server: open a terminal in the folder containing your HTML file and run python -m http.server 8080 (Python 3) then open http://localhost:8080 in your browser.How accurate is COCO-SSD? What does it struggle with?
lite_mobilenet_v2 variant) achieves ~18–22% mAP on the COCO benchmark — solid for real-time use but not production-grade accuracy. It struggles with: objects smaller than ~80×80px, heavily occluded objects, unusual angles, and low-light conditions. It works best with common everyday objects held or placed clearly in the centre of frame. For higher accuracy at the cost of speed, swap to mobilenet_v2 or mobilenet_v1 in the model load call.How do I add more object classes or custom objects?
Can I deploy this as a web app?
How do I make the bounding boxes stay on the right objects as they move?
requestAnimationFrame loop reruns detection and redraws everything every frame (~30fps), so boxes naturally track moving objects. If you notice boxes lagging, it's usually because the GPU is busy. Try reducing maxNumBoxes from 10 to 5, lowering the video resolution in the getUserMedia constraints, or switching to the lite_mobilenet_v2 model base if you're on mobilenet_v2. On mid-range phones, 15-20fps is typical and still feels smooth.Can I record or screenshot the detected output?
canvas.toDataURL('image/png') and trigger a download: const a = document.createElement('a'); a.href = canvas.toDataURL(); a.download = 'ar-vision-capture.png'; a.click(). For video recording, use the MediaRecorder API on a canvas.captureStream(30) output stream. Add a screenshot button to the HUD to make this one-tap.Will it drain my phone battery fast?
requestAnimationFrame with a setInterval at 500ms intervals (2fps inference is usually enough for stable objects), or add the Stop button to pause detection when you're not actively using it.Build your own with Claude
The full AR Vision app was built with a single prompt in Claude's design tool. Use the prompt above, or ask Claude to extend it — add object counting, voice announcements, or screenshot capture.
Try Claude →