The Brief
🎫 The 10 Tickets — Summary
| # | Ticket | Type | 🤖 AI CSAT | 🧑 Human CSAT | Winner |
|---|---|---|---|---|---|
| T01 | How do I export my data? | Simple how-to | 4.8 | 4.6 | AI |
| T02 | Why was I charged twice? | Billing duplicate | 3.2 | 4.9 | Human |
| T03 | App crashes on iOS 18.3 | Tech troubleshooting | 3.5 | 4.7 | Human |
| T04 | Can I get a refund? | Policy question, in-window | 4.9 | 4.7 | AI |
| T05 | How do I add team members? | Feature how-to | 4.9 | 4.8 | AI |
| T06 | I lost 3 months of data. I am furious. | Emotional complaint | 1.6 | 5.0 | Human |
| T07 | Upgrade didn't apply, still on Free plan | Account issue | 3.8 | 4.8 | Human |
| T08 | I want to cancel my subscription | Churn risk | 3.4 | 4.5 | Human |
| T09 | What integrations do you support? | Simple info | 5.0 | 4.6 | AI |
| T10 | API rate limits, unclear docs | Technical + frustrated | 3.1 | 4.6 | Human |
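The headline figures can be recomputed directly from the table. The short Python sketch below does that arithmetic; the variable names are illustrative, and the scores are copied from the table rather than from any raw survey data.

```python
from statistics import mean

# (AI CSAT, human CSAT) per ticket, copied from the summary table
scores = {
    "T01": (4.8, 4.6),
    "T02": (3.2, 4.9),
    "T03": (3.5, 4.7),
    "T04": (4.9, 4.7),
    "T05": (4.9, 4.8),
    "T06": (1.6, 5.0),
    "T07": (3.8, 4.8),
    "T08": (3.4, 4.5),
    "T09": (5.0, 4.6),
    "T10": (3.1, 4.6),
}

# Mean CSAT for each agent, rounded to two decimals
ai_mean = round(mean(ai for ai, _ in scores.values()), 2)
human_mean = round(mean(human for _, human in scores.values()), 2)

# Tickets won by each agent (higher CSAT wins)
ai_wins = [t for t, (ai, human) in scores.items() if ai > human]
human_wins = [t for t, (ai, human) in scores.items() if human > ai]

# Ticket where the human outscored the AI by the widest margin
largest_gap = max(scores, key=lambda t: scores[t][1] - scores[t][0])

print(f"AI mean: {ai_mean}, human mean: {human_mean}")
print(f"AI wins: {ai_wins}")
print(f"Human wins: {human_wins}")
print(f"Largest human lead: {largest_gap}")
```

The AI takes four of the ten tickets, all of them informational, and the human takes the other six.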
🤖 The AI Performance
AI dominated the informational tickets. T01, T04, T05, and T09 were handled with near-perfect CSAT scores — responses were instant (under 3 seconds), accurate, clearly formatted, and complete. For policy-based answers (refund window, integrations list, team member limits), AI recalled the exact policy text from the knowledge base every time without hesitation. No misquoting, no "let me check on that."
Where AI fell apart: anything requiring emotional intelligence or multi-turn diagnosis. T06 — the data loss complaint — was catastrophic. The AI opened with "I'm sorry to hear about this!" and immediately listed three troubleshooting steps. The evaluator panel gave it 1.6/5. One reviewer wrote: "This person lost 3 months of work. You gave them a checkbox list. This response would make me switch products." The human response started with two full paragraphs of acknowledgment before mentioning any resolution path.
AI agent:
- Response time: 2–8 seconds per ticket
- FCR (first-contact resolution) rate: 7/10 (70%); 3 tickets needed follow-up or escalation
- Mean CSAT: 3.82/5
- Escalation rate: 40% of complex tickets flagged for human review
- Empathy score: 4.1/10; the panel consistently noted robotic phrasing

Human agent (Asha):
- Response time: 4–18 minutes per ticket
- FCR (first-contact resolution) rate: 9/10 (90%); only 1 ticket required follow-up
- Mean CSAT: 4.72/5
- Escalation rate: 10%; she handled most complex cases herself
- Empathy score: 8.9/10; the panel praised natural, personalised responses
The AI's response to the data loss ticket: "I'm sorry to hear you're experiencing this! Here are the steps to check your data recovery options: 1. Go to Settings → Backup... 2. Check the restore point..."

Asha's response opened with: "I just read your message twice. Losing three months of work is not a minor inconvenience; it's devastating, and I want to be completely honest with you about what I know happened and what we're doing about it."

The CSAT gap on this single ticket was 3.4 points. Asha's response prompted a follow-up email from the customer saying they would stay. The AI response would more likely have triggered a chargeback.
📊 The Scorecard
🏆 Verdict
Human wins overall 38/50 vs 36/50, but the numbers hide the actual story. On simple informational tickets, AI is better — faster and just as accurate. On complex, emotionally loaded, or ambiguous tickets, the gap is enormous. The data loss ticket alone represents what matters most in support: the moment a customer decides whether to stay or leave. AI catastrophically failed that moment.
The business decision should not be "AI or human support." It should be "which tickets should AI handle, and which should route directly to a human?" The Tier 1 / Tier 2 model has existed in enterprise support for 20 years — AI is the best Tier 1 agent ever built. But it should not be answering T06.
🔀 The Hybrid Workflow
Tier-based routing, not replacement
Estimated outcome: 60–70% of volume handled by AI at Tier 1, freeing the human agent for complex tickets. Tier 2 human response time would drop by roughly 35%, and CSAT across the board would likely land at 4.5+/5.
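As a rough sanity check on the CSAT estimate, the sketch below computes what the ten test tickets would have scored under perfect routing, with each ticket sent to whichever agent scored higher on it. That oracle routing is an assumption made only for illustration: real routing will misclassify some tickets, so this is an upper bound on this sample, not a forecast.

```python
# (AI CSAT, human CSAT) per ticket T01..T10, copied from the summary table
scores = [
    (4.8, 4.6), (3.2, 4.9), (3.5, 4.7), (4.9, 4.7), (4.9, 4.8),
    (1.6, 5.0), (3.8, 4.8), (3.4, 4.5), (5.0, 4.6), (3.1, 4.6),
]

# Oracle routing (assumption): each ticket goes to the agent that scored higher on it
best_per_ticket = [max(ai, human) for ai, human in scores]
blended_csat = round(sum(best_per_ticket) / len(best_per_ticket), 2)

# Share of the sample that the AI would handle under that routing
ai_share = sum(ai > human for ai, human in scores) / len(scores)

print(f"Blended CSAT with perfect routing: {blended_csat}")
print(f"Share of tickets handled by AI: {ai_share:.0%}")
```

On this sample the blended score comes out above the 4.5 target, with the AI handling 4 of the 10 tickets. The 60–70% figure is an estimate about overall ticket volume, which a ten-ticket test cannot confirm.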
AI handles (Tier 1):
- Simple how-to and feature questions
- Policy lookups (refund windows, pricing tiers, limits)
- Order/account status checks (within clear rules)
- High-volume repetitive tickets with low emotional stake
- Drafting suggested responses for a human to review

Route directly to a human (Tier 2):
- Emotional distress is detectable in the ticket
- Data loss, billing errors, or account compromise involved
- Churn risk: the customer is threatening to cancel
- Multi-turn troubleshooting requiring diagnosis
- Any ticket where the outcome is "stay or leave"
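The routing rules above can be sketched as a small function. This is a minimal illustration, not a production router: the category names, keyword lists, and the `Ticket` and `route` names are all hypothetical, and a real system would likely replace the keyword matching with a tuned classifier. The one design choice worth keeping is the order of checks: human-only signals are tested first, and anything unrecognised defaults to a human.

```python
from dataclasses import dataclass

# Hypothetical category labels, loosely based on the ticket types in the test
AI_CATEGORIES = {"how_to", "policy_lookup", "status_check", "simple_info"}
HUMAN_CATEGORIES = {"data_loss", "billing_error", "account_compromise", "troubleshooting"}

# Crude illustrative signals (assumptions, not a validated word list)
DISTRESS_WORDS = ("furious", "angry", "unacceptable", "devastated")
CHURN_WORDS = ("cancel", "switch", "leaving")


@dataclass
class Ticket:
    text: str
    category: str


def route(ticket: Ticket) -> str:
    """Return "human" or "ai", checking the human-only rules first."""
    text = ticket.text.lower()
    if any(word in text for word in DISTRESS_WORDS):
        return "human"  # emotional distress detected
    if any(word in text for word in CHURN_WORDS):
        return "human"  # churn risk
    if ticket.category in HUMAN_CATEGORIES:
        return "human"  # data loss, billing errors, compromise, diagnosis
    if ticket.category in AI_CATEGORIES:
        return "ai"  # simple, rule-bound tickets
    return "human"  # unknown category: take the safer path


tickets = [
    Ticket("How do I export my data?", "how_to"),
    Ticket("I lost 3 months of data. I am furious.", "data_loss"),
    Ticket("I want to cancel my subscription", "account"),
]
for t in tickets:
    print(f"{route(t):5s} <- {t.text}")
```

Defaulting to a human costs some response time on tickets the AI could have handled, but it avoids repeating the T06 failure on a ticket the rules did not anticipate.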