🎓 Enterprise VR · 2020–2021

VR Learning Studio — AI-Powered Enterprise Communication Training Platform

A multimodal AI platform that evaluates speech, behaviour, and storytelling in VR — and delivers automated performance reports. Built end-to-end for a global technology consultancy.

Client: Global Technology Consultancy
Role: Lead Developer
Period: 2020–2021
Platforms: VR (Standalone + PC) + Mobile (Android + iOS)
Type: Internal Enterprise Platform
Deployment: Multi-department, enterprise scale
6 External Services Integrated
2 Platforms (VR + Mobile)
5 AI Pipeline Stages
Automated PDF Report Delivery

What Was Built

The VR Learning Studio is an end-to-end AI-powered enterprise training platform that evaluates how professionals communicate — their speech, their behaviour in front of a virtual audience, and the quality of their data storytelling. It runs on VR headsets and mobile devices, uses five AI services in a sequential processing pipeline, and required building every layer from scratch: OAuth authentication, VR experience, AI pipeline, evaluation engine, PDF report generation, and automated email delivery.

Most enterprise training software evaluates what you know. This platform evaluates how you communicate it, which is a much harder problem to automate. It does so by combining VR simulation, real-time speech analysis, NLP emotion detection, and Google Cloud AutoML scoring in a single training flow.

VR Learning Studio End-to-End Architecture — Mobile + VR
The VR conference room environment — users present to virtual audience avatars while AI analyses their speech, gaze behaviour, movement, and storytelling structure in real time. (Illustration — not actual build screenshot)

The Problem It Solved

Large enterprises employ tens of thousands of professionals whose effectiveness depends heavily on communication skills — presenting data to clients, leading team meetings, handling difficult stakeholder conversations, telling compelling data stories. Traditional training for these skills is expensive (external coaches), inconsistent (different trainers give different feedback), and unscalable at enterprise level.

The VR Learning Studio addressed all three simultaneously. AI coaching is consistent — the same evaluation criteria apply to every user regardless of location or seniority. VR delivery is scalable — any employee with a headset or mobile device can train at any time without scheduling a session. And automated PDF reporting makes the feedback detailed and immediate, without a human coach needing to review every session.

How the Experience Worked

Users entered the platform through a secure OAuth 2.0 login, implemented manually without an SDK: the full authorization code flow, token exchange, and session management were handled from scratch. Once authenticated, they had two training modes:

Data Storytelling Mode: The user presents in a photorealistic virtual conference room to an audience of avatars. They must speak clearly, maintain eye contact with the virtual audience, move naturally, and structure their presentation with a clear narrative — Setting, Twist, Action. The VR environment tracks everything: voice input via microphone, head rotation and gaze direction, physical movement, and interaction choices.

Story-Based Assessment Mode: Built using Dialogflow as a branching narrative engine, this mode puts users in scenario-based conversations — handling a difficult client question, managing a team conflict, responding to unexpected data challenges. Each user choice triggers a Dialogflow intent, loads the next story branch, and records the response for AI analysis.
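
The branching logic in the assessment mode can be pictured as a lookup from (current scene, detected intent) to the next scene. The sketch below is illustrative only: the `StoryGraph` class, the scene names, and the intent names are hypothetical, and in the real platform the intent would come from Dialogflow rather than being passed in directly.

```python
from dataclasses import dataclass, field


@dataclass
class StoryGraph:
    """Maps (scene, intent) pairs to the next scene and logs each choice."""
    branches: dict[tuple[str, str], str]
    history: list[tuple[str, str]] = field(default_factory=list)

    def advance(self, scene: str, intent: str) -> str:
        # Record the choice so it can be analysed after the session.
        self.history.append((scene, intent))
        # An intent with no matching branch keeps the user on the current scene.
        return self.branches.get((scene, intent), scene)


# Hypothetical scenario: a client asks a difficult question.
graph = StoryGraph(branches={
    ("client_question", "acknowledge_concern"): "client_reassured",
    ("client_question", "deflect"): "client_frustrated",
    ("client_frustrated", "apologise"): "client_reassured",
})

scene = graph.advance("client_question", "deflect")
scene = graph.advance(scene, "apologise")
```

Keeping the branch table as plain data means a new scenario only needs new entries, not new control flow.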

The AI Processing Pipeline — Five Stages

Every session — whether VR or mobile — passes through the same five-stage AI pipeline:

1. Azure Speech Service (Speech to Text)
Converts voice input to text in real time. Captures speech timing, clarity metrics, pauses and fluency. This text output feeds every subsequent stage of the pipeline.
2. ParallelDots NLP (Language Intelligence)
Processes the transcribed text for sentiment score, emotion detected, tone analysis, and offensive content detection. Produces the NLP metrics that feed the evaluation engine.
3. Dialogflow (Conversational Logic and Story Flow)
Handles intent detection and controls the branching narrative: which story branch loads next based on user choices. Also manages scenario flow for the assessment mode interactions.
4. Firebase Cloud Functions, Node.js (Backend Orchestration)
Receives all data from the app, validates and processes it, calls the AutoML API, and returns the prediction score. This is the serverless backend layer that connects the Unity app to Google Cloud ML.
5. Google Cloud AutoML (ML Scoring)
Combines all inputs (speech, NLP, and behavioural metrics) to generate a confidence score and overall performance rating out of 100. This trained ML model produces the final evaluation.
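
The sequential hand-off between stages can be sketched as a chain of functions that each add their results to a shared session record. This is a simplified illustration, not the production code: every stage below is a stub returning fixed placeholder values, the function names are hypothetical, and stages 4 and 5 are folded into a single `score` step.

```python
from typing import Callable

Session = dict  # accumulated results for one training session
Stage = Callable[[Session], Session]


def speech_to_text(s: Session) -> Session:
    # Stand-in for the speech service: a real stage would transcribe s["audio"].
    return {**s, "transcript": "um our revenue grew ten percent"}


def nlp_analysis(s: Session) -> Session:
    # Stand-in for the NLP service; depends on the transcript from stage 1.
    return {**s, "sentiment": 0.6 if s["transcript"] else 0.0}


def story_flow(s: Session) -> Session:
    # Stand-in for intent detection.
    return {**s, "intent": "present_results"}


def score(s: Session) -> Session:
    # Stand-in for the backend function: validate the payload, then score it.
    missing = [k for k in ("transcript", "sentiment", "intent") if k not in s]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return {**s, "score": 82, "confidence": 0.9}  # placeholder values


def run_pipeline(session: Session, stages: list[Stage]) -> Session:
    # Each stage receives everything produced by the stages before it.
    for stage in stages:
        session = stage(session)
    return session


result = run_pipeline({"audio": b""}, [speech_to_text, nlp_analysis, story_flow, score])
```

Because each stage only reads and extends the session record, a stage can be swapped or tested on its own without touching the others.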

The Evaluation Engine — Three Dimensions

The evaluation engine combines outputs from the full AI pipeline into a structured assessment across three dimensions:

🎤 Speech Metrics
  • Words per minute (WPM)
  • Filler words count
  • Interjections
  • Pauses and fluency
🧠 NLP Insights
  • Sentiment score
  • Emotion detected
  • Tone analysis
  • Offensive score
👁️ Behavioural Metrics
  • Response time
  • Activity level
  • Movement score
  • Engagement score

Combined, these produce an overall score out of 100 with a confidence score, identified strengths, areas to improve, and actionable feedback. The score appears on the results screen inside the app at the end of each session.
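
As an illustration of the speech dimension, words per minute and a filler-word count can be derived from a transcript and its duration. This is a minimal sketch under stated assumptions: the `FILLERS` set and the `speech_metrics` function are hypothetical, and the platform's actual filler list and formulas are not documented here.

```python
import re

# Assumed filler list for illustration only.
FILLERS = {"um", "uh", "er", "erm"}


def speech_metrics(transcript: str, duration_seconds: float) -> dict:
    # Lower-case the text and keep only word-like tokens.
    words = re.findall(r"[a-z']+", transcript.lower())
    wpm = len(words) * 60 / duration_seconds if duration_seconds > 0 else 0.0
    return {
        "word_count": len(words),
        "wpm": round(wpm, 1),
        "filler_count": sum(1 for w in words if w in FILLERS),
    }


metrics = speech_metrics("Um, our revenue grew, uh, ten percent this quarter", 4.0)
```

Nine words spoken in four seconds works out to 135 words per minute, with two fillers counted.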

Report Generation and Delivery

Every session automatically generates a detailed PDF performance report — no manual intervention required. The process:

  1. Three results screens are captured as screenshots inside Unity (Summary · Insights · Recommendations)
  2. Screenshots are combined into a multi-page PDF using iTextSharp
  3. Temporary images are deleted
  4. PDF is sent via Zoho Mail to a relay system
  5. Internal email rules engine re-sends from the official email domain
  6. User receives a professional performance report in their inbox automatically

This end-to-end automated delivery — from VR session to inbox without any human step — was one of the most technically complex parts of the system. Getting Unity to generate PDFs, pass them through a multi-step email relay, and deliver from an official corporate domain required careful orchestration across multiple services.
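
The hand-off in step 4 amounts to wrapping the generated PDF in an email message. The sketch below shows that shape using Python's standard library purely for illustration; the production system did this from Unity with iTextSharp and Zoho Mail. The `build_report_email` function, the addresses, the subject line, and the placeholder PDF bytes are all assumptions.

```python
from email.message import EmailMessage


def build_report_email(pdf_bytes: bytes, recipient: str, sender: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Your VR Learning Studio performance report"
    msg.set_content("Your session report is attached as a PDF.")
    # Attach the report; the relay can re-send the message unchanged.
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename="performance_report.pdf",
    )
    return msg


# Placeholder bytes stand in for the PDF built from the three screenshots.
report_email = build_report_email(
    b"%PDF-1.4 placeholder", "user@example.com", "reports@example.com"
)
attachment = next(report_email.iter_attachments())
```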

Full Architecture Diagram

The complete end-to-end system architecture, covering all six layers from authentication to delivery, is outlined below:


VR LEARNING STUDIO

End-to-End Architecture (Mobile + VR)
1. Authentication Layer
OAuth 2.0, Authorization Code Flow
  • Secure Login
  • Access Token
  • Refresh Token
  • User Session
Flow: User Opens App → Login / OAuth (Google / Provider) → Authorization Code → Token Exchange (REST API call, code → token) → Access Token Received → User Session Created

2. Application Layer (Supports Both Platforms)
Platforms
  • 📱 Mobile App (Android + iOS)
  • 🥽 VR App (Standalone / PC VR)
Common Capabilities
  • Story-Based Assessments
  • Data Storytelling (Presentation Mode)
  • Interactive UI/UX
  • Works in Online and Offline Modes

A. VR Experience (VR Conference Room)
  • 🎙️ Speak
  • 🤝 Interact
  • 🚶 Move Naturally
  • 👁️ Gaze Tracking

B. Mobile Experience (Mobile App Screen)
  • 🎙️ Speak
  • 👆 Interact
  • 📊 Telemetry
  • 👁️ No Gaze Tracking

Captured Inputs (Both Platforms)
  • 🎤 Voice (Mic)
  • 👆 User Interactions (Choices / Taps)
  • 📱 Device & App Telemetry

3. AI Processing Pipeline
Processes user inputs using AI services
  • Speech, language and intent understanding
  • Emotional & tone analysis
  • Provides insights and scores

Stage 1: Speech Recognition (Azure Speech Service)
  • Speech → Text
  • Speech Timing
  • Clarity
Stage 2: NLP Analysis (ParallelDots NLP API)
  • Sentiment
  • Emotion
  • Tone
  • Offensive Detection
Stage 3: Conversational Logic (Dialogflow)
  • Intent Detection
  • Story Flow Control
Stage 4: ML Scoring Backend (Firebase Cloud Function, Node.js)
  • Receives Data
  • Calls AutoML API
  • Returns Prediction
Stage 5: AutoML Model (Google Cloud AutoML)
  • Performance Score & Confidence Evaluation

Inside the Firebase Cloud Function (Node.js): Receive Request → Validate & Process Data → Call AutoML API → Get Prediction / Score → Send Response to App

4. Evaluation Engine
Combines all insights from the AI pipeline and generates meaningful metrics

A. Speech Metrics
  • Speech Rate (WPM)
  • Filler Words
  • Interjections
  • Pauses & Fluency

B. NLP Insights
  • Sentiment Score
  • Emotion Detected
  • Tone Analysis
  • Offensive Score

C. Behavioural Metrics (Other than Gaze)
  • Response Time
  • Activity Level
  • Movement (Device)
  • Engagement Score

Overall Score: 82 / 100
  • Confidence Score
  • Strengths
  • Areas to Improve
  • Actionable Feedback

5. Report Generation
Generate detailed report and deliver to user
  • Multi-page Report
  • Visual Insights
  • Actionable Feedback
Flow: Results Screen 1 (Summary), Results Screen 2 (Insights), Results Screen 3 (Recommendations) → Capture Screens (3 Pages) → Generate PDF (Multi-page) → Delete Temp Images

6. Report Delivery
Deliver report securely to user via email
  • Secure Transmission
  • Automated Delivery
  • Works for All Users
Flow: App Sends PDF via Email API (Zoho Mail) → Email Sent through Zoho Mail → Email Relay / Rule Engine → Re-send from Official Email Domain → User Receives Report (PDF)

Technologies Used

  • Unity
  • Azure Speech Service
  • ParallelDots API
  • Dialogflow
  • Firebase Cloud Functions
  • Google Cloud AutoML

Data Flow Legend

Authentication Flow
Application Flow
AI Processing Flow
Backend / ML Flow
Evaluation Flow
Report Flow
Delivery Flow

Key Highlights

  • Cross-platform: Mobile + VR
  • AI multi-modal analysis
  • Serverless Firebase backend
  • Automated PDF reporting
  • Secure OAuth 2.0 auth
  • Works online and offline
Complete system architecture — all six layers from authentication to delivery. Built in Unity with Azure, ParallelDots, Dialogflow, Firebase, and Google Cloud AutoML.

What Made This Project Technically Significant

Most XR projects integrate one or two external services. This system integrated six: Azure Speech, ParallelDots NLP, Dialogflow, Firebase Cloud Functions, Google Cloud AutoML, and Zoho Mail — all working in sequence within a single evaluation flow. Each service has its own authentication, its own API contract, its own failure modes, and its own latency characteristics.

Building a system where five AI services process data sequentially and the combined output drives a real-time evaluation score, with graceful handling when any service is slow or unavailable, is a hard distributed systems problem. Doing it inside a Unity VR application, on standalone headsets with intermittent connectivity, added further constraints around offline capability and data synchronisation.
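
One common way to handle a slow or unavailable service gracefully is to retry a bounded number of times and then fall back to a neutral value so the rest of the evaluation can continue. The sketch below illustrates that general pattern and is not taken from the project's code: `call_with_fallback`, the retry count, and the two fake service functions are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(call: Callable[[], T], fallback: T,
                       retries: int = 2, delay_seconds: float = 0.0) -> T:
    """Try a service call a few times, then return a fallback value."""
    for attempt in range(retries + 1):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            # Wait before the next attempt, but not after the last one.
            if attempt < retries:
                time.sleep(delay_seconds)
    return fallback


attempts = {"count": 0}


def flaky_sentiment() -> dict:
    # Hypothetical NLP call that times out once and then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise TimeoutError("NLP service timed out")
    return {"sentiment": 0.6}


def always_down() -> dict:
    raise ConnectionError("service unavailable")


ok = call_with_fallback(flaky_sentiment, fallback={"sentiment": None})
degraded = call_with_fallback(always_down, fallback={"sentiment": None})
```

A `None` metric in the fallback lets downstream code tell "service unavailable" apart from a genuinely neutral score.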

The manual OAuth 2.0 implementation is also worth noting. An OAuth SDK handles the complexity for you. Implementing the full authorization code flow manually, including token exchange, refresh token management, and session handling, requires understanding the protocol deeply and building error handling for every edge case. That decision gave the system more control over the authentication experience and fewer third-party dependencies, at the cost of significantly more development work.
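
To give a sense of what "manual" involves, the sketch below walks through the client-side pieces of a standard authorization code flow: building the authorization URL, validating the callback, and assembling the token request. It is an illustration of the protocol, not the project's code. The endpoint, client ID, redirect URI, and scope are hypothetical placeholders, and the HTTP POST to the token endpoint is omitted.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values; substitute the identity provider's real ones.
AUTH_ENDPOINT = "https://auth.example.com/authorize"
CLIENT_ID = "demo-client"
REDIRECT_URI = "https://app.example.com/callback"


def build_authorization_url(state: str) -> str:
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "openid profile",  # assumed scope
        "state": state,
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


def extract_code(callback_url: str, expected_state: str) -> str:
    query = parse_qs(urlparse(callback_url).query)
    # Reject callbacks whose state does not match the one we issued.
    if query.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch")
    if "code" not in query:
        raise ValueError("authorization code missing")
    return query["code"][0]


def token_request_body(code: str) -> dict:
    # Form fields to POST to the provider's token endpoint.
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    }


state = secrets.token_urlsafe(16)
url = build_authorization_url(state)
code = extract_code(f"{REDIRECT_URI}?code=abc123&state={state}", state)
body = token_request_body(code)
```

The `state` check is one of the edge cases an SDK normally hides: without it, a forged callback could inject someone else's authorization code.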

TECH STACK
🎮 Engine: Unity (VR + Mobile)
🎤 Speech AI: Azure Speech Service
🧠 NLP: ParallelDots API
💬 Conversation: Dialogflow
Backend: Firebase Cloud Functions (Node.js)
🤖 ML Scoring: Google Cloud AutoML
🔐 Auth: OAuth 2.0 (manual)
📧 Delivery: Zoho Mail + Email Relay
📄 Reports: iTextSharp PDF
📱 Mobile: Android + iOS
