
Zigfu + Kinect — Full Body Gesture Control Before Spatial Computing

Prabhu Kumar Dasari — Senior Unity XR / VR / AR Developer · 13+ Years
Built Kinect + Zigfu gesture projects for enterprise clients · UAE · GITEX Dubai 2024
Before spatial computing was a term anyone used, before hand tracking was built into headsets, before your body position could be detected by a phone camera — Microsoft Kinect and Zigfu made full body skeleton tracking available to developers. Your entire body became the controller.

I worked on Kinect-based projects in this era, building gesture-controlled experiences for enterprise clients — including a gamified engagement experience for a leading bank in the UAE where players physically jumped and crouched to interact. During a showcase of this technology, a visiting client from Australia asked a question I have never forgotten. It was years ahead of everything we were building.

What Microsoft Kinect Was

Microsoft Kinect launched in November 2010 as an accessory for the Xbox 360 — a sensor bar containing an RGB camera, an infrared depth sensor, and a microphone array. Its purpose was simple: let players control Xbox games using their body and voice, without holding any controller.

The Kinect's depth sensor projected a pattern of infrared dots and measured how that pattern deformed across the surfaces it hit — a structured-light technique that built a real-time depth map of everything in front of it. Combined with computer vision algorithms, this depth data allowed the Kinect to identify human figures and track the positions of up to 20 skeletal joints in real time. Stand in front of it and it knew where your head was, where your hands were, how your spine was oriented, whether your knees were bent, whether your arms were raised.

For game developers, this was revolutionary. For AR and interactive experience developers, it opened an entirely different category of interaction design — one where the user's body was not a proxy but the direct input mechanism.

What Zigfu Was — The Bridge to Web and Unity

Microsoft's Kinect SDK was powerful but limited to Windows applications. Zigfu was a middleware layer that brought Kinect skeleton tracking to web browsers and — critically for many developers — to Unity. Zigfu abstracted the complexity of the Kinect SDK, exposed skeleton joint positions through a clean API, and made it accessible to developers who were building in Unity rather than native Windows applications.

With Zigfu and Unity connected to a Kinect sensor, you could access the position of any of the 20 tracked joints as a 3D coordinate updated every frame. You could write game logic that responded to the player's actual body position — if the right hand joint rose above the head joint, that was a "raise hand" gesture. If the hip joints dropped significantly, that was a crouch. If the player moved their entire body to the left, the character moved left.
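The production logic for checks like these was written in Unity C# against the Zigfu API. As a language-neutral illustration, here is a minimal Python sketch of the same per-frame threshold idea — the joint names, coordinate convention, and thresholds are all illustrative, not the Zigfu API:

```python
# Minimal sketch of per-frame gesture checks over tracked skeleton joints.
# Joint names, coordinates, and thresholds are illustrative only; the real
# projects used Zigfu's Unity C# API.

def detect_gestures(joints, standing_hip_y):
    """joints: dict of joint name -> (x, y, z) in metres, y pointing up.
    standing_hip_y: hip-centre height calibrated while the player stands."""
    gestures = []
    # "Raise hand": right hand joint higher than the head joint.
    if joints["right_hand"][1] > joints["head"][1]:
        gestures.append("raise_hand")
    # "Crouch": hip centre drops well below the calibrated standing height.
    if joints["hip_centre"][1] < standing_hip_y - 0.25:
        gestures.append("crouch")
    return gestures

# One example frame: hand raised above the head, hips at standing height.
frame = {
    "head": (0.00, 1.60, 2.0),
    "right_hand": (0.25, 1.75, 2.0),
    "hip_centre": (0.00, 1.00, 2.0),
}
print(detect_gestures(frame, standing_hip_y=1.00))  # ['raise_hand']
```

The same pattern extends to any joint pair — whole-body movement, for instance, compares the hip centre's x coordinate against a calibrated baseline.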

🦴 The 20 Kinect Skeleton Joints
Head
Shoulder Centre · Left / Right Shoulder
Left / Right Elbow
Left / Right Wrist
Left / Right Hand
Spine · Hip Centre
Left / Right Hip
Left / Right Knee
Left / Right Ankle
Left / Right Foot

The Projects — Real Enterprise Experiences Built on Kinect

🎮 Project 1 — Enterprise Gamification
Service Icon Collection Game — Leading Bank, UAE
Client: Leading bank, UAE
Hardware: Microsoft Kinect + PC
Context: Customer engagement / showcase
Input: Full body — arms and hands

The concept was a gamified engagement experience built around the bank's own services. The player stood in front of the Kinect sensor and saw themselves — or a representation of themselves — on screen. Icons representing the bank's different services and products fell from the top of the screen. The player had to physically reach out, move their arms, and collect the falling icons before they disappeared.

Each collected icon added to the player's score and could trigger information about that particular service. The game mechanics were simple by design — accessible to anyone who walked up to the installation — but the physical engagement created a memorable experience. Reaching out with your actual arm to catch a virtual banking icon is a very different interaction from tapping a touchscreen. The Kinect skeleton tracking made it possible.
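The core catch mechanic reduces to a per-frame proximity test between the tracked hand joint and each falling icon. A hypothetical Python sketch — the catch radius, positions, and helper name are illustrative, not the shipped Unity C# code:

```python
import math

def icon_caught(hand_pos, icon_pos, catch_radius=0.15):
    """True when the tracked hand joint is within catch_radius metres
    of a falling icon's position. The radius is an illustrative value."""
    return math.dist(hand_pos, icon_pos) <= catch_radius

# Two falling service icons; the player's hand is near the first one.
falling_icons = [(0.30, 1.20, 2.0), (-0.80, 0.50, 2.0)]
hand = (0.32, 1.15, 2.0)

# Keep the icons the player has not caught; score the caught ones.
remaining = [p for p in falling_icons if not icon_caught(hand, p)]
score = len(falling_icons) - len(remaining)
print(score, len(remaining))  # 1 1
```

In the real installation this check ran every frame against the Zigfu hand joint, with the caught icon triggering its service information overlay.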

🎮 Project 2 — Body Obstacle Game
Jump and Crouch Obstacle Experience
Hardware: Microsoft Kinect + PC
Context: Interactive showcase
Input: Full body — jump, crouch, dodge
Mechanic: Avoid incoming objects

The second experience was a more physically demanding obstacle game. Objects came towards the player on screen — some high, some low, some from the sides. The player had to physically jump to avoid low obstacles, crouch to go under high ones, and lean left or right to dodge objects coming from the sides. The Kinect tracked the player's full skeleton in real time and translated their actual body movements directly into game actions.

↑ Jump
Both hip joints rise above threshold — jump detected
↓ Crouch
Hip joints drop significantly — crouch detected
← Lean Left
Spine centre shifts left relative to hip baseline
→ Lean Right
Spine centre shifts right relative to hip baseline

The Question That Stayed With Me

During one of the showcases where these Kinect-based experiences were being demonstrated to potential clients and agency representatives, a visitor from Australia watched the technology carefully. After seeing the body tracking in action — how the sensor understood where the person was, how they were moving, what their body was doing in space — he asked a question that had nothing to do with gaming or marketing.

"Can you build something for blind people — so when they are walking in traffic, the sensors detect what is in front of them and tell them to move left, move right, or stop?"
The concept: use depth sensing and spatial awareness technology — the same technology powering these games — to give a blind person real-time navigational feedback through their environment. Sensors detect obstacles, pedestrians, vehicles. The system communicates through audio or haptic feedback: move left, obstacle ahead, stop.
— A client from Australia, during a Kinect showcase in the UAE

I have never forgotten that question. At the time, we were focused entirely on entertainment and enterprise engagement — games, marketing activations, customer experience demos. The technology we were working with was genuinely capable in its domain. But this client immediately saw past the game mechanics to a completely different application: using the same spatial sensing capability to give independence and safety to a person who could not see.

In 2013 or 2014, that question felt like science fiction. The Kinect was a gaming peripheral. Zigfu was a developer tool for interactive installations. The idea of miniaturising this into something a person could wear and use while walking through traffic was beyond what any of us were thinking about. But the underlying concept — depth sensing, obstacle detection, real-time spatial feedback — was exactly right. That client was simply thinking at a different scale and for a different purpose than everyone else in the room.

Where That Vision Eventually Went

The technology that client imagined in that showcase room has since become real — built by different teams, using different hardware, but based on the same fundamental principles of spatial sensing and real-time feedback.

📱
Microsoft Seeing AI
2017 → Present
Microsoft's app uses the phone camera and AI to narrate the world for visually impaired users — describing people, text, products, scenes, and obstacles in real time through audio.
🔍
Google Lookout
2019 → Present
Google's accessibility app uses computer vision to identify objects, people, text, and navigation cues and communicates them through audio — designed specifically for blind and low-vision users.
🦺
Haptic Navigation Vests
2018 → Present
Wearable vests with haptic actuators that vibrate to indicate direction — left shoulder vibrates for turn left, right for turn right. Combined with sensors and GPS, they guide blind users through physical navigation.
🕶️
Smart Glasses for Blind
2020 → Present
Wearable glasses with embedded cameras and AI that describe the environment through an earpiece — identifying obstacles, reading signs, recognising faces, and providing turn-by-turn navigation.
🤖
AI + Depth Sensing
2022 → Present
Modern smartphones with LiDAR sensors — the direct successor to Kinect's depth sensing approach — now power accessibility features that detect obstacles and provide spatial audio cues for navigation.
🌐
Spatial Audio Navigation
2023 → Present
3D spatial audio systems that place directional sound cues in a blind person's environment — a sound to the left means turn left, a sound ahead means proceed, volume changes indicate distance to obstacles.

Every one of these technologies is doing what that client described in a showcase room years ago. The vision was correct. The hardware was just not small enough, cheap enough, or power-efficient enough yet. It took a decade of miniaturisation, AI progress, and smartphone hardware advancement to bring the idea to reality.

Why Kinect Was Discontinued — and What Replaced It

Microsoft discontinued the Kinect for Windows in 2017, citing declining sales and the difficulty of maintaining a peripheral that required a separate power source and USB connection in an era moving toward wireless and mobile. The Xbox One S removed the Kinect port entirely.

The technology did not disappear. Microsoft's Azure Kinect — a successor product aimed at developers and enterprise — launched in 2019 with significantly improved sensors. The body tracking algorithms Kinect pioneered are now standard capabilities in smartphone cameras, wearable devices, and game consoles. Sony's PlayStation cameras, Apple's Face ID infrared dot projector, the depth sensors in modern Android phones — all of these are spiritual successors to what Kinect was doing with its depth camera in 2010.

Zigfu followed a similar path — the middleware layer became unnecessary as the underlying capabilities moved into operating systems and major game engines as standard features. Unity's AR Foundation body-tracking support, Apple's ARKit body detection, Google's MediaPipe pose estimation — these are what Zigfu was bridging toward.

Microsoft Kinect · Zigfu SDK · Unity 3D · Skeleton Joint Tracking · Depth Sensing · Gesture Recognition · C# Scripting · Interactive Installation
💬 Developer Reflection — Prabhu Kumar Dasari, 13+ Years in XR

That question from the Australian client during the showcase has stayed with me for over a decade. At the time I had no answer — we were not building accessibility technology, the hardware was not suitable for it, and the idea of a wearable real-time navigation system for blind people felt impossibly far from what we were doing. But that client was not thinking about what the technology was being used for. He was thinking about what it could be used for. That distinction — between the application in front of you and the potential beyond it — is something I try to carry into every project I work on now. The most important question about any new technology is rarely the obvious one. It is usually the one that someone in the room asks that makes everyone else go quiet for a moment.

Frequently Asked Questions

What was Zigfu and why did developers use it?

Zigfu was a middleware SDK that exposed Microsoft Kinect's skeleton tracking data to web browsers and Unity applications, making it accessible to a much wider range of developers than Microsoft's native Windows-only Kinect SDK. It provided clean APIs for joint position data, gesture detection, and player tracking, allowing Unity developers to build Kinect-powered experiences without deep knowledge of the underlying Kinect SDK. Zigfu was discontinued after Kinect itself was discontinued and its functionality was absorbed into native game engine features.

How many skeleton joints did Microsoft Kinect track?

The original Kinect for Xbox 360 tracked 20 skeleton joints, covering the full body from head to feet. The Kinect 2 for Xbox One improved this to 25 joints and added better hand and finger tracking. The Azure Kinect, released in 2019, tracked 32 body joints with significantly higher accuracy. Modern smartphone-based body pose estimation using frameworks like MediaPipe can track similar numbers of landmarks without any dedicated depth hardware.

Why did Microsoft discontinue the Kinect?

Microsoft officially cited declining sales when discontinuing the Kinect for Windows in 2017. The broader context was a shift in the industry: the capabilities that had made Kinect revolutionary in 2010 — depth sensing, body tracking, gesture recognition — were being replicated by smartphone cameras, game controller innovations, and software-based body tracking that required no dedicated hardware. The Kinect's requirement for a separate USB connection and power source made it increasingly difficult to justify as the capabilities it offered became available through simpler means.

Is body tracking still used in enterprise and AR today?

Yes — extensively. Body tracking is now a standard capability in modern XR development, available through ARKit body detection on iOS, MediaPipe Pose on Android and web, and dedicated XR headsets like Meta Quest which track hand and body position natively. Enterprise applications include physical training and rehabilitation, industrial ergonomics monitoring, fitness applications, retail interactive installations, and increasingly, spatial computing experiences where the user's body position and gesture are part of the interface. The difference from the Kinect era is that none of this requires dedicated external hardware.

🧍 Next

The "Celebrity Beside You" AR Trend

🥽 Series Pillar

How AR Evolved — Full Developer Perspective