What Microsoft Kinect Was
Microsoft Kinect launched in November 2010 as an accessory for the Xbox 360 — a sensor bar containing an RGB camera, an infrared depth sensor, and a microphone array. Its purpose was simple: let players control Xbox games using their body and voice, without holding any controller.
The Kinect's depth sensor projected a pattern of infrared dots and measured how that pattern deformed across the surfaces in front of it — building a real-time depth map of the entire scene. Combined with computer vision algorithms, this depth data allowed the Kinect to identify human figures and track the positions of up to 20 skeletal joints in real time. Stand in front of it and it knew where your head was, where your hands were, how your spine was oriented, whether your knees were bent, whether your arms were raised.
For game developers, this was revolutionary. For AR and interactive experience developers, it opened an entirely different category of interaction design — one where the user's body was not a proxy but the direct input mechanism.
What Zigfu Was — The Bridge to Web and Unity
Microsoft's Kinect SDK was powerful but limited to Windows applications. Zigfu was a middleware layer that brought Kinect skeleton tracking to web browsers and — critically for many developers — to Unity. Zigfu abstracted the complexity of the Kinect SDK, exposed skeleton joint positions through a clean API, and made it accessible to developers who were building in Unity rather than native Windows applications.
With Zigfu and Unity connected to a Kinect sensor, you could access the position of any of the 20 tracked joints as a 3D coordinate updated every frame. You could write game logic that responded to the player's actual body position — if the right hand joint rose above the head joint, that was a "raise hand" gesture. If the hip joints dropped significantly, that was a crouch. If the player moved their entire body to the left, the character moved left.
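In sketch form, that gesture logic was little more than per-frame comparisons between joint coordinates. The joint names, the Joint type, and the thresholds below are illustrative stand-ins, not Zigfu's actual API:

```python
# Illustrative sketch only: the joint names and the joints dictionary are
# hypothetical stand-ins for whatever tracking API supplies per-frame data.
# Each joint is a 3D point in metres; y is up, x is the player's left-right axis.

from dataclasses import dataclass

@dataclass
class Joint:
    x: float
    y: float
    z: float

def detect_gestures(joints: dict[str, Joint]) -> set[str]:
    """Classify simple full-body gestures from one frame of joint positions."""
    gestures = set()

    # "Raise hand": right hand joint higher than the head joint.
    if joints["right_hand"].y > joints["head"].y:
        gestures.add("raise_hand")

    # "Crouch": hip centre has dropped well below a typical standing height.
    standing_hip_height = 0.9  # metres; would be calibrated per player
    if joints["hip_center"].y < standing_hip_height * 0.6:
        gestures.add("crouch")

    # "Move left": whole body (hip centre) shifted left of the play-space centre.
    if joints["hip_center"].x < -0.3:  # metres from centre
        gestures.add("move_left")

    return gestures
```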
The Projects — Real Enterprise Experiences Built on Kinect
The first project, created for a banking client, was a gamified engagement experience built around the bank's own services. The player stood in front of the Kinect sensor and saw themselves — or a representation of themselves — on screen. Icons representing the bank's different services and products fell from the top of the screen. The player had to physically reach out, move their arms, and collect the falling icons before they disappeared.
Each collected icon added to the player's score and could trigger information about that particular service. The game mechanics were simple by design — accessible to anyone who walked up to the installation — but the physical engagement created a memorable experience. Reaching out with your actual arm to catch a virtual banking icon is a very different interaction from tapping a touchscreen. The Kinect skeleton tracking made it possible.
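A minimal sketch of that catch mechanic is below; the FallingIcon type, the hand positions, and the catch radius are invented for illustration rather than taken from the original project:

```python
# Illustrative sketch only: the icon type, hand positions, and catch radius
# are assumptions made for this example.

import math
from dataclasses import dataclass

@dataclass
class FallingIcon:
    service: str   # e.g. "savings" or "home_loan" (placeholder names)
    x: float       # position in the play area, metres
    y: float
    caught: bool = False

CATCH_RADIUS = 0.25  # how close a tracked hand must get to collect an icon

def update_catches(icons: list[FallingIcon],
                   hand_positions: list[tuple[float, float]],
                   score: int) -> int:
    """Mark icons as caught when either tracked hand overlaps them; return the new score."""
    for icon in icons:
        if icon.caught:
            continue
        for hx, hy in hand_positions:
            if math.hypot(icon.x - hx, icon.y - hy) < CATCH_RADIUS:
                icon.caught = True
                score += 1
                break
    return score
```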
The second experience was a more physically demanding obstacle game. Objects came towards the player on screen — some high, some low, some from the sides. The player had to physically jump to avoid low obstacles, crouch to go under high ones, and lean left or right to dodge objects coming from the sides. The Kinect tracked the player's full skeleton in real time and translated their actual body movements directly into game actions.
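In the same illustrative spirit (hypothetical joint names and thresholds, not the project's real code), translating those movements into game actions amounts to comparing the tracked hip position against a standing baseline captured at the start of a round:

```python
# Illustrative sketch only: joint names, the baseline capture, and the
# thresholds are assumptions, not the original implementation.
# joints: dict of joint name -> object with .x / .y attributes in metres,
# as in the earlier sketch.

from dataclasses import dataclass

@dataclass
class Baseline:
    hip_height: float  # standing hip height
    centre_x: float    # player's resting left-right position

def calibrate(joints) -> Baseline:
    """Capture the player's standing pose at the start of a round."""
    return Baseline(hip_height=joints["hip_center"].y,
                    centre_x=joints["hip_center"].x)

def classify_action(joints, baseline: Baseline) -> str:
    """Map the current frame's skeleton to a single game action."""
    hip = joints["hip_center"]

    if hip.y > baseline.hip_height + 0.15:   # hips lifted: a jump
        return "jump"
    if hip.y < baseline.hip_height - 0.25:   # hips dropped: a crouch
        return "crouch"
    if hip.x < baseline.centre_x - 0.30:     # body shifted left: dodge left
        return "lean_left"
    if hip.x > baseline.centre_x + 0.30:     # body shifted right: dodge right
        return "lean_right"
    return "stand"
```

In practice you would also smooth the joint positions over a few frames so tracking jitter does not trigger false actions.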
The Question That Stayed With Me
During one of the showcases where these Kinect-based experiences were being demonstrated to potential clients and agency representatives, a visitor from Australia watched the technology carefully. After seeing the body tracking in action — how the sensor understood where the person was, how they were moving, what their body was doing in space — he asked a question that had nothing to do with gaming or marketing: could the same sensing help a blind person navigate, detecting the obstacles around them and giving them real-time feedback as they moved?
I have never forgotten that question. At the time, we were focused entirely on entertainment and enterprise engagement — games, marketing activations, customer experience demos. The technology we were working with was genuinely capable in its domain. But this client immediately saw past the game mechanics to a completely different application: using the same spatial sensing capability to give independence and safety to a person who could not see.
In 2013 or 2014, that question felt like science fiction. The Kinect was a gaming peripheral. Zigfu was a developer tool for interactive installations. The idea of miniaturising this into something a person could wear and use while walking through traffic was beyond what any of us were thinking about. But the underlying concept — depth sensing, obstacle detection, real-time spatial feedback — was exactly right. That client was simply thinking at a different scale and for a different purpose than everyone else in the room.
Where That Vision Eventually Went
The technology that client imagined in that showcase room has since become real — built by different teams, using different hardware, but based on the same fundamental principles of spatial sensing and real-time feedback. Smartphone depth sensors now drive obstacle and people detection features built specifically for blind and low-vision users, wearable cameras describe the surrounding scene aloud, and navigation apps combine camera, GPS, and AI to guide people through spaces they cannot see.
Every one of these technologies is doing what that client described in a showcase room years ago. The vision was correct. The hardware was just not small enough, cheap enough, or power-efficient enough yet. It took a decade of miniaturisation, AI progress, and smartphone hardware advancement to bring the idea to reality.
Why Kinect Was Discontinued — and What Replaced It
Microsoft discontinued the Kinect in 2017, citing declining sales and the difficulty of maintaining a peripheral that required a separate power source and USB connection in an era moving toward wireless and mobile. The Xbox One S had already removed the dedicated Kinect port entirely, relegating the sensor to an optional USB adapter.
The technology did not disappear. Microsoft's Azure Kinect — a successor product aimed at developers and enterprise — launched in 2019 with significantly improved sensors. The body tracking algorithms Kinect pioneered are now standard capabilities in smartphone cameras, wearable devices, and game consoles. Sony's PlayStation cameras, Apple's Face ID infrared dot projector, the depth sensors in modern Android phones — all of these are spiritual successors to what Kinect was doing with its depth camera in 2010.
Zigfu followed a similar path — the middleware layer became unnecessary as the underlying capabilities moved into operating systems and major game engines as standard features. Unity's AR Foundation body-tracking support, Apple's ARKit body detection, Google's MediaPipe pose estimation — these are what Zigfu was bridging toward.
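As a concrete point of comparison, here is a minimal sketch of pose estimation with MediaPipe's Python package. The image path is a placeholder; the point is simply that a full set of body landmarks now comes from an ordinary RGB photo, with no depth sensor involved:

```python
# Minimal MediaPipe Pose sketch: runs on a plain RGB image, no depth hardware.
# "person.jpg" is a placeholder path.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

image = cv2.imread("person.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(rgb)

if results.pose_landmarks:
    # 33 landmarks, each with normalised x/y, relative depth z, and visibility.
    nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
    print(f"nose: x={nose.x:.2f} y={nose.y:.2f} visibility={nose.visibility:.2f}")
    print(f"total landmarks: {len(results.pose_landmarks.landmark)}")
```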
That question from the Australian client during the showcase has stayed with me for over a decade. At the time I had no answer — we were not building accessibility technology, the hardware was not suitable for it, and the idea of a wearable real-time navigation system for blind people felt impossibly far from what we were doing. But that client was not thinking about what the technology was being used for. He was thinking about what it could be used for. That distinction — between the application in front of you and the potential beyond it — is something I try to carry into every project I work on now. The most important question about any new technology is rarely the obvious one. It is usually the one that someone in the room asks that makes everyone else go quiet for a moment.
Frequently Asked Questions
What was Zigfu and why did developers use it?
Zigfu was a middleware SDK that exposed Microsoft Kinect's skeleton tracking data to web browsers and Unity applications, making it accessible to a much wider range of developers than Microsoft's native Windows-only Kinect SDK. It provided clean APIs for joint position data, gesture detection, and player tracking, allowing Unity developers to build Kinect-powered experiences without deep knowledge of the underlying Kinect SDK. Zigfu was discontinued after Kinect itself was discontinued and its functionality was absorbed into native game engine features.
How many skeleton joints did Microsoft Kinect track?
The original Kinect for Xbox 360 tracked 20 skeleton joints, covering the full body from head to feet. The Kinect 2 for Xbox One improved this to 25 joints and added basic hand-state tracking. The Azure Kinect, released in 2019, tracked 32 body joints with significantly higher accuracy. Modern smartphone-based body pose estimation using frameworks like MediaPipe can track a comparable number of landmarks (33 in MediaPipe Pose) without any dedicated depth hardware.
Why did Microsoft discontinue the Kinect?
Microsoft officially cited declining sales when it stopped manufacturing the Kinect in 2017. The broader context was a shift in the industry: the capabilities that had made Kinect revolutionary in 2010 — depth sensing, body tracking, gesture recognition — were being replicated by smartphone cameras, game controller innovations, and software-based body tracking that required no dedicated hardware. The Kinect's requirement for a separate USB connection and power source made it increasingly difficult to justify as the capabilities it offered became available through simpler means.
Is body tracking still used in enterprise and AR today?
Yes — extensively. Body tracking is now a standard capability in modern XR development, available through ARKit body detection on iOS, MediaPipe Pose on Android and web, and dedicated XR headsets like Meta Quest which track hand and body position natively. Enterprise applications include physical training and rehabilitation, industrial ergonomics monitoring, fitness applications, retail interactive installations, and increasingly, spatial computing experiences where the user's body position and gesture are part of the interface. The difference from the Kinect era is that none of this requires dedicated external hardware.