AI Customer Support vs Human Agent: 10 Real Tickets

The Brief

The exact conditions

Ticket set

10 real tickets from a SaaS product (anonymised). Mix: 3 simple how-to questions, 2 return/refund requests, 2 billing disputes, 2 technical troubleshooting issues, 1 emotionally charged complaint about a data loss incident.

AI setup

ChatGPT-4o with a custom system prompt defining the company persona, refund policy, and product knowledge base, plus Intercom's Fin AI Copilot surfacing knowledge base articles. Both were given identical knowledge access.

Human agent

Asha, 4-year support veteran, SaaS background. Working from the same knowledge base. No AI assist allowed for this test — pure human response.

Scoring panel

5 evaluators rated each response blind (no labels) on a 1–5 CSAT scale. First-contact resolution recorded. Escalation flagged where the response required follow-up or a human override.

Success metrics

CSAT (1–5), First-Contact Resolution rate, response time, escalation rate, and tone/empathy score (evaluator-rated 1–10).

🎫 The 10 Tickets — Summary

#	Ticket	Type	🤖 AI CSAT	🧑 Human CSAT	Winner
T01	How do I export my data?	Simple how-to	4.8	4.6	AI
T02	Why was I charged twice?	Billing duplicate	3.2	4.9	Human
T03	App crashes on iOS 18.3	Tech troubleshooting	3.5	4.7	Human
T04	Can I get a refund?	Policy question, in-window	4.9	4.7	AI
T05	How do I add team members?	Feature how-to	4.9	4.8	AI
T06	I lost 3 months of data. I am furious.	Emotional complaint	1.6	5.0	Human
T07	Upgrade didn't apply, still on Free plan	Account issue	3.8	4.8	Human
T08	I want to cancel my subscription	Churn risk	3.4	4.5	Human
T09	What integrations do you support?	Simple info	5.0	4.6	AI
T10	API rate limits: unclear docs	Technical + frustrated	3.1	4.6	Human
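
As a quick check on the table above, the per-side means and the widest single-ticket gap can be recomputed from the published scores. The sketch below is illustrative only: the `TICKETS` list and the `summarise` helper are names introduced here, not part of the original test setup.

```python
from statistics import mean

# (ticket id, AI CSAT, human CSAT), copied from the summary table.
TICKETS = [
    ("T01", 4.8, 4.6),
    ("T02", 3.2, 4.9),
    ("T03", 3.5, 4.7),
    ("T04", 4.9, 4.7),
    ("T05", 4.9, 4.8),
    ("T06", 1.6, 5.0),
    ("T07", 3.8, 4.8),
    ("T08", 3.4, 4.5),
    ("T09", 5.0, 4.6),
    ("T10", 3.1, 4.6),
]


def summarise(tickets):
    """Return mean CSAT per side, per-ticket winners, and the widest gap."""
    widest = max(tickets, key=lambda t: abs(t[2] - t[1]))
    return {
        "ai_mean": round(mean(ai for _, ai, _ in tickets), 2),
        "human_mean": round(mean(human for _, _, human in tickets), 2),
        "ai_wins": [tid for tid, ai, human in tickets if ai > human],
        "human_wins": [tid for tid, ai, human in tickets if human > ai],
        "widest_gap_ticket": widest[0],
        "widest_gap": round(abs(widest[2] - widest[1]), 1),
    }


summary = summarise(TICKETS)
print(summary)
```

On these numbers the AI takes the four informational tickets and the human takes the other six, with T06 showing the largest gap.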

🤖 The AI Performance

AI dominated the informational tickets. T01, T04, T05, and T09 were handled with near-perfect CSAT scores — responses were instant (under 3 seconds), accurate, clearly formatted, and complete. For policy-based answers (refund window, integrations list, team member limits), AI recalled the exact policy text from the knowledge base every time without hesitation. No misquoting, no "let me check on that."

Where AI fell apart: anything requiring emotional intelligence or multi-turn diagnosis. T06 — the data loss complaint — was catastrophic. The AI opened with "I'm sorry to hear about this!" and immediately listed three troubleshooting steps. The evaluator panel gave it 1.6/5. One reviewer wrote: "This person lost 3 months of work. You gave them a checkbox list. This response would make me switch products." The human response started with two full paragraphs of acknowledgment before mentioning any resolution path.

🤖 AI — ChatGPT + Intercom Fin

Response time: 2–8 seconds per ticket

FCR rate: 7/10 (70%) — 3 tickets needed follow-up or escalation

Mean CSAT: 3.82/5

Escalation rate: 40% of complex tickets flagged for human review

Empathy score: 4.1/10 — panel consistently noted robotic phrasing

✓ Instant on simple tickets ✓ Policy recall perfect ✗ Emotional failure ⚠ Flat tone throughout

🧑 Human — Asha (4-year SaaS support)

Response time: 4–18 minutes per ticket

FCR rate: 9/10 (90%) — only 1 ticket required follow-up

Mean CSAT: 4.72/5

Escalation rate: 10% — handled most complex cases herself

Empathy score: 8.9/10 — panel praised natural, personalised responses

✓ Emotional intelligence ✓ Contextual diagnosis ✓ Churn prevention instinct ⚠ Slower on simple tickets

📌 The T06 gap — what AI genuinely cannot do yet

AI's response to the data loss ticket: "I'm sorry to hear you're experiencing this! Here are the steps to check your data recovery options: 1. Go to Settings → Backup... 2. Check the restore point..." Human Asha's response opened with: "I just read your message twice. Losing three months of work is not a minor inconvenience — it's devastating, and I want to be completely honest with you about what I know happened and what we're doing about it." The CSAT gap was 3.4 points on this single ticket. Asha's response triggered a follow-up email from the customer saying they would stay. The AI response would have triggered a chargeback.

📊 The Scorecard

Battle 06 · Customer Support Scorecard

10 real SaaS support tickets · ChatGPT + Intercom Fin vs 4-year support agent · Scored 1–10

Speed: 2–8s AI vs 4–18min human

CSAT (Mean): 3.82 AI vs 4.72 human · Winner: Human

Empathy & Tone: Emotional intelligence, personalisation · Winner: Human

FCR Rate: First-contact resolution · Winner: Human

Policy Recall: Accuracy of info, no hallucinations

Total (out of 50): 🤖 AI 36/50 · 🧑 Human 38/50 · Winner: Human

🏆 Verdict

🏆 Verdict — Battle 06 · Customer Support

Human wins — but the tier split is the real takeaway

Human wins overall 38/50 vs 36/50, but the numbers hide the actual story. On simple informational tickets, AI is better — faster and just as accurate. On complex, emotionally loaded, or ambiguous tickets, the gap is enormous. The data loss ticket alone represents what matters most in support: the moment a customer decides whether to stay or leave. AI catastrophically failed that moment.

The business decision should not be "AI or human support." It should be "which tickets should AI handle, and which should route directly to a human?" The Tier 1 / Tier 2 model has existed in enterprise support for 20 years — AI is the best Tier 1 agent ever built. But it should not be answering T06.

🔀 The Hybrid Workflow

⚡ The support model that wins on both CSAT and cost

Tier-based routing, not replacement

AI handles Tier 1 (simple, policy, info): How-to questions, refund status checks, feature FAQs, integration lists, account lookups. In this test, AI answered the informational tickets in under 8 seconds with ≥4.8 CSAT. Routing these to AI frees human time for complex cases.

Sentiment detection triggers human routing: Train the AI to flag tickets containing emotional signals — "furious", "lost", "completely unacceptable", "cancel immediately" — and route them directly to a human with AI-generated context summary pre-loaded. No AI response sent.

AI assists the human on Tier 2 tickets: For billing disputes and technical issues, AI can surface the relevant knowledge base sections, summarise the account history, and draft a response for the human to edit. This is estimated to cut human response time by ~35% while keeping the human in charge of the response.

Estimated outcome: 60–70% of volume handled by AI at Tier 1 (freeing human for complex tickets). Tier 2 human response time reduced ~35%. CSAT across the board would likely land at 4.5+/5.
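
One rough way to sanity-check the 4.5+ estimate is to replay this test's scores under the proposed routing. The sketch below rests on assumptions that are not part of the test itself: Tier 1 is taken to be the four informational tickets (T01, T04, T05, T09), each ticket keeps the score of whoever would handle it, and AI assistance leaves the human's scores unchanged. `SCORES`, `TIER1`, `blended_csat`, and `ai_share` are illustrative names.

```python
from statistics import mean

# ticket id -> (AI CSAT, human CSAT), from the ticket summary table.
SCORES = {
    "T01": (4.8, 4.6), "T02": (3.2, 4.9), "T03": (3.5, 4.7),
    "T04": (4.9, 4.7), "T05": (4.9, 4.8), "T06": (1.6, 5.0),
    "T07": (3.8, 4.8), "T08": (3.4, 4.5), "T09": (5.0, 4.6),
    "T10": (3.1, 4.6),
}

# Assumption: only the informational tickets are routed to AI.
TIER1 = {"T01", "T04", "T05", "T09"}


def blended_csat(scores, tier1):
    """Mean CSAT when Tier 1 tickets use the AI score and the rest use the human score."""
    handled = [ai if tid in tier1 else human for tid, (ai, human) in scores.items()]
    return round(mean(handled), 2)


def ai_share(scores, tier1):
    """Fraction of tickets that the AI would handle alone."""
    return sum(tid in tier1 for tid in scores) / len(scores)


print(blended_csat(SCORES, TIER1))  # 4.81
print(ai_share(SCORES, TIER1))      # 0.4
```

Under those assumptions the blended mean for this 10-ticket sample is 4.81, with AI handling 40% of tickets. That share is lower than the 60–70% volume estimate, so a real queue with a different ticket mix could land elsewhere.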

🤖 Use AI when…

Simple how-to and feature questions
Policy lookups (refund windows, pricing tiers, limits)
Order/account status checks (within clear rules)
High-volume repetitive tickets with low emotional stake
Drafting suggested responses for a human to review

🧑 Use a human when…

Emotional distress is detectable in the ticket
Data loss, billing errors, or account compromise involved
Churn risk — customer is threatening to cancel
Multi-turn troubleshooting requiring diagnosis
Any ticket where the outcome is "stay or leave"
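
The two lists above can be sketched as a simple routing rule. This is a minimal illustration, not the setup used in the test: plain keyword matching stands in for real sentiment detection, the category labels are hypothetical, and anything outside the Tier 1 categories defaults to a human as a conservative assumption. The signal phrases are the ones named in the hybrid workflow.

```python
from dataclasses import dataclass

# Emotional signals named in the hybrid workflow; matched as lowercase substrings.
EMOTIONAL_SIGNALS = ("furious", "lost", "completely unacceptable", "cancel immediately")

# Hypothetical category labels for tickets the AI may answer alone.
TIER1_CATEGORIES = {"how_to", "policy_lookup", "status_check", "feature_faq"}


@dataclass
class Routing:
    handler: str  # "ai" or "human"
    reason: str


def route_ticket(text: str, category: str) -> Routing:
    """Send emotionally charged or non-Tier-1 tickets to a human, the rest to AI."""
    lowered = text.lower()
    for signal in EMOTIONAL_SIGNALS:
        if signal in lowered:
            # Emotional signals override the category: no AI response is sent.
            return Routing("human", f"emotional signal: {signal!r}")
    if category in TIER1_CATEGORIES:
        return Routing("ai", "tier 1 category")
    return Routing("human", "outside tier 1")


print(route_ticket("I lost 3 months of data. I am furious.", "how_to"))
print(route_ticket("How do I export my data?", "how_to"))
print(route_ticket("Why was I charged twice?", "billing_dispute"))
```

Substring matching is deliberately crude: "lost" would also catch "I lost my password", so a production version would likely need a proper sentiment model or a human-reviewed keyword list.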
