Invisible AI · Article 6 of 11

Chapter 3 · Entertainment AI

How Spotify Knows What You Want to Hear Before You Do

📅 May 2026 ⏱ 7 min read ✍️ Prabhu Kumar 🎵 Spotify Recommendation AI

Spotify combines three models: collaborative filtering from 600M listeners, audio analysis of the track itself, and NLP on blog posts about the artist, all to predict whether you'll skip the next song.

Every Monday, Discover Weekly drops 30 songs — and at least a few of them feel impossibly accurate. Not just "you might like this genre" accurate. "This song sounds like it was made specifically for you at 11pm when you're slightly melancholy" accurate. I've asked friends in other cities; they get the same experience. It's not luck.

Spotify has 600 million users and a library of over 100 million tracks. Its recommendation system does something genuinely difficult: it understands not just what music you like, but what music you like right now, in this context, in this mood. And it does it using three very different AI approaches running simultaneously.

600M monthly active users

100M+ tracks in the library

30 songs in Discover Weekly

~30% of listening is AI-recommended content

The three systems working in parallel
Collaborative filtering: the genius and the limitation
Natural language processing: what music blogs know
Audio analysis: what the AI hears in the song
How Discover Weekly is actually made
What signals Spotify is collecting about you
Spotify in India — a different problem
The honest bit

The Three Systems Working in Parallel

Spotify's recommendation engine is built on three fundamentally different approaches to understanding music taste. No single approach works well alone; the magic is in how they combine.

Collaborative Filtering

People like you also listen to...

The oldest and most powerful approach: find users with similar listening patterns and recommend what they're listening to that you're not. If 10,000 people who love the same indie-folk artists as you also love a specific jazz pianist, you're likely to love that pianist too — even though nobody told the algorithm they were "similar" genres.

Natural Language Processing

What the internet says about the music

Spotify crawls music blogs, reviews, Reddit threads, playlist titles, and articles — then runs NLP to extract how people describe music. "Late-night driving vibes," "sounds like Radiohead if Thom Yorke was happy," "the kind of song you play when you need to cry in the shower." These cultural descriptors become part of each track's metadata, connecting it to moods and moments beyond its audio properties.

Audio Analysis

What the AI hears in the waveform

Every track uploaded to Spotify is processed by a convolutional neural network that extracts audio features: tempo, key, mode (major/minor), danceability, energy, acousticness, instrumentalness, loudness, valence (musical "positivity"), and more. These features let Spotify recommend tracks that have no listening history yet: a brand-new release can reach suitable listeners from day one based on audio similarity alone.

Collaborative Filtering: The Genius and the Limitation

Collaborative filtering does most of the heavy lifting in Spotify's system. The concept is simple. Your taste profile is the sum of your listening behaviour: which songs you played to completion, which you skipped after 10 seconds, which you added to playlists, which you liked. Spotify then finds other users whose profiles are statistically similar to yours.
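Here's a minimal sketch of what "statistically similar" could mean, assuming each listener is reduced to a dictionary of track scores and compared with cosine similarity. The user names, track names, and scores are invented for illustration, and cosine similarity is my choice of measure, not a confirmed detail of Spotify's system.

```python
import math

def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse track -> score profiles."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical profiles: positive scores for completed plays and saves,
# negative scores for quick skips.
profiles = {
    "you":   {"folk_a": 1.0, "folk_b": 0.8, "pop_x": -0.5},
    "user1": {"folk_a": 0.9, "folk_b": 1.0, "jazz_piano": 1.0},
    "user2": {"pop_x": 1.0, "pop_y": 0.9},
}

def most_similar(target: str, profiles: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank every other user by similarity to the target user, best first."""
    scores = [
        (other, cosine_similarity(profiles[target], profiles[other]))
        for other in profiles
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

neighbours = most_similar("you", profiles)
```

In this toy data, user1 comes out as the closest neighbour, so the jazz pianist they love (and you haven't heard) becomes a candidate recommendation for you.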

Here's the part most people don't realise: the groupings that emerge are not genre-based. They're taste-based. Spotify has discovered "taste communities" that cut across conventional genres — groups of people who listen to both 90s Bollywood and post-punk, or indie classical and Tamil hip-hop, because those combinations reveal something genuine about a type of listener that genre labels would never capture.

The limitation is what engineers call the "cold start" problem. A brand-new song by a completely unknown artist has no listening data. Nobody has played it, skipped it, or added it to a playlist. Collaborative filtering has nothing to work with. This is where audio analysis becomes critical — Spotify can map a new track's audio features to existing songs with similar profiles and make an educated guess about which users might like it.
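The cold-start workaround can be sketched as a nearest-neighbour lookup in audio-feature space. This assumes each track is a short vector of (energy, valence, acousticness) and that plain Euclidean distance is a good enough notion of "similar". Both are simplifications of mine, and the tracks and values are made up.

```python
import math

# Hypothetical catalogue: track -> (energy, valence, acousticness), each in [0, 1].
catalogue = {
    "quiet_folk_song": (0.25, 0.40, 0.90),
    "club_banger":     (0.95, 0.80, 0.05),
    "sad_piano_piece": (0.15, 0.20, 0.95),
}

def nearest_tracks(new_features, catalogue, k=2):
    """Return the k catalogue tracks closest to the new track (Euclidean distance)."""
    distances = [
        (name, math.dist(new_features, features))
        for name, features in catalogue.items()
    ]
    return [name for name, _ in sorted(distances, key=lambda pair: pair[1])[:k]]

# A brand-new acoustic track with no plays yet.
new_track = (0.20, 0.35, 0.92)
similar = nearest_tracks(new_track, catalogue)
```

The people who already play the nearest tracks become the first audience for the new one. Once they start playing or skipping it, collaborative filtering has data to work with.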

Common myth

"Spotify recommends songs because you liked similar-sounding songs"

Audio similarity is just one layer. The most powerful signal is behavioural similarity across users. If 50,000 people who love Carnatic classical also listen to ambient electronic music, Spotify will recommend Aphex Twin to heavy Carnatic listeners — not because the music sounds similar, but because the audience overlaps. The recommendation is based on people, not just sound.
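Audience overlap is easy to put a number on. This small sketch uses Jaccard overlap (shared listeners divided by all listeners of either artist), which is one common way to measure it and an assumption on my part. The listener IDs are invented.

```python
def audience_overlap(listeners_a: set[str], listeners_b: set[str]) -> float:
    """Jaccard overlap: shared listeners divided by all listeners of either artist."""
    union = listeners_a | listeners_b
    if not union:
        return 0.0
    return len(listeners_a & listeners_b) / len(union)

# Hypothetical listener sets.
carnatic_fans = {"u1", "u2", "u3", "u4"}
ambient_fans = {"u2", "u3", "u4", "u5"}
metal_fans = {"u6", "u7"}

ambient_overlap = audience_overlap(carnatic_fans, ambient_fans)  # 3 shared out of 5
metal_overlap = audience_overlap(carnatic_fans, metal_fans)      # no shared listeners
```

Nothing in this calculation looks at the audio. Two artists end up "close" purely because the same people play both.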

Natural Language Processing: What Music Blogs Know

This is the most underappreciated part of Spotify's system — and the part that explains the weirdly specific recommendations. Spotify doesn't just listen to music; it reads what the internet says about music.

The NLP system crawls blog posts, review sites, playlist descriptions, and forum threads. When a music blog writes "this album sounds like what would happen if Ennio Morricone scored a Studio Ghibli film," that description becomes part of the track's metadata — not literally, but as a vector in a high-dimensional space. Other tracks described similarly get linked, even if nobody ever directly connected the artists.

This is also how Spotify handles completely new music that has no listening data yet. If a new artist gets a review saying "fans of Phoebe Bridgers and Elliott Smith will feel at home here," the NLP system can immediately place that artist in a neighbourhood of similar artists in the taste map — before a single Spotify user has played the track.
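To make "a vector in a high-dimensional space" concrete, here's a deliberately crude sketch that treats each description as a bag of words and compares them with cosine similarity. Real systems would use learned embeddings rather than word counts. The review snippets and track names are invented.

```python
import math
import re
from collections import Counter

def describe(text: str) -> Counter:
    """Turn a description into word counts (a crude stand-in for learned embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical review snippets keyed by made-up track names.
reviews = {
    "track_a": "melancholy late night driving music with hushed vocals",
    "track_b": "hushed melancholy vocals for a late night drive",
    "track_c": "euphoric festival anthem with huge drops",
}
vectors = {track: describe(text) for track, text in reviews.items()}

sim_ab = cosine(vectors["track_a"], vectors["track_b"])
sim_ac = cosine(vectors["track_a"], vectors["track_c"])
```

Tracks A and B land close together because writers reach for the same words, even though neither review mentions the other artist.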

The AI isn't just listening to the music. It's reading everything humans have written about music, extracting the cultural meaning, and using that to connect songs that pure audio analysis would never find.

Audio Analysis: What the AI Hears in the Song

Every track on Spotify is run through a deep learning model that extracts a detailed set of audio features. Some of these Spotify makes public through its API:

Valence (0.0 – 1.0)

Musical "positivity"

High valence sounds happy, cheerful, euphoric. Low valence sounds sad, depressed, angry. This is probably the single most predictive audio feature for mood-matching. Spotify's "Daily Mixes" use this heavily — your "Evening Wind Down" mix will have lower average valence than your "Morning Energy" mix.

Energy (0.0 – 1.0)

Intensity and activity level

A death metal track and an intense orchestral piece can both have high energy despite sounding nothing alike. Energy captures intensity, not genre. Workout playlists tend to have high energy; study playlists tend to have low energy, regardless of genre.

Danceability (0.0 – 1.0)

Rhythm stability + beat strength + tempo regularity

How suitable is this track for dancing? Not "is it a dance song" — but do its structural elements (tempo, rhythm strength, beat regularity) support the physical act of dancing? A fast jazz piece can have low danceability; a slow trap beat can have high danceability.

Acousticness, Instrumentalness, Speechiness

Track texture signals

Acousticness: is this track primarily acoustic instruments? Instrumentalness: is it free of vocals? Speechiness: does it contain a lot of spoken words (like a podcast or rap)? These three together help separate ambient instrumental study music from similar-tempo pop songs with lyrics.
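The features above can be combined into simple filters. This sketch picks out "study music" candidates using the feature names from this section. The tracks, values, and thresholds are illustrative guesses of mine, not numbers Spotify publishes.

```python
from dataclasses import dataclass

@dataclass
class AudioFeatures:
    name: str
    energy: float
    valence: float
    instrumentalness: float
    speechiness: float

# Hypothetical tracks with made-up feature values in the 0.0 to 1.0 range.
tracks = [
    AudioFeatures("ambient_piano", energy=0.20, valence=0.30, instrumentalness=0.95, speechiness=0.03),
    AudioFeatures("mid_tempo_pop", energy=0.60, valence=0.70, instrumentalness=0.00, speechiness=0.08),
    AudioFeatures("spoken_podcast", energy=0.30, valence=0.50, instrumentalness=0.00, speechiness=0.90),
]

def study_candidates(tracks, max_energy=0.4, min_instrumentalness=0.7, max_speechiness=0.2):
    """Keep calm, mostly vocal-free tracks; the thresholds are illustrative guesses."""
    return [
        t.name
        for t in tracks
        if t.energy <= max_energy
        and t.instrumentalness >= min_instrumentalness
        and t.speechiness <= max_speechiness
    ]

picked = study_candidates(tracks)
```

The podcast is calm enough on energy alone, but speechiness and instrumentalness rule it out. That's the point of using the texture signals together.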

How Discover Weekly Is Actually Made

Every Monday, Spotify's system builds your Discover Weekly from scratch. Here's what happens in approximate order:

Step 1: Build your taste vector. Your listening history, with roughly the past few weeks weighted more heavily than older plays, is processed into a mathematical representation of your current taste, not your all-time taste. This matters: if you've been going through a jazz phase, your Discover Weekly reflects that this week, even if you mostly listen to hip-hop.

Step 2: Find your taste neighbourhood. Spotify identifies users whose taste vectors are statistically similar to yours. This isn't a small group: it might be hundreds of thousands of people who share your particular combination of listening patterns.

Step 3: Find what they listen to that you don't. From that neighbourhood, Spotify identifies tracks with high "play rate" that you haven't played, haven't actively skipped, and haven't already added to your library. The goal is genuine discovery — not recommending songs you already know you like.

Step 4: Apply audio and NLP filters. The candidate tracks from step 3 are then ranked by audio similarity and NLP cultural proximity to your listening profile. Songs that are behaviourally discovered AND audio-compatible AND culturally-described-as-similar rank highest.

Step 5: Diversity injection. Spotify deliberately injects some "wild cards" — tracks slightly outside your predicted taste zone. This prevents the playlist from becoming a closed loop. Without it, Discover Weekly would just keep recommending songs more and more similar to what you've already heard, converging to a narrow slice of music. The occasional weird recommendation is intentional.
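The steps above can be sketched in miniature. This is a toy under stated assumptions, not Spotify's pipeline: the 14-day half-life, the "play rate times closeness" score, the two-dimensional features, and all track names are mine, and step 2 (finding the neighbours) is taken as already done.

```python
import math
import random

def taste_vector(plays, half_life_days=14.0):
    """Step 1: recency-weighted average of feature vectors.

    plays is a list of (days_ago, feature_vector) tuples. A play from
    half_life_days ago counts half as much as a play from today.
    """
    dims = len(plays[0][1])
    totals = [0.0] * dims
    weight_sum = 0.0
    for days_ago, features in plays:
        weight = 0.5 ** (days_ago / half_life_days)
        weight_sum += weight
        for i, value in enumerate(features):
            totals[i] += weight * value
    return [total / weight_sum for total in totals]

def build_playlist(taste, known_tracks, neighbour_histories, track_features,
                   wild_pool, size=3, wild_cards=1, seed=0):
    """Steps 3 to 5: collect unheard tracks, rank them, add wild cards."""
    # Step 3: play rate among neighbours, skipping tracks the user already knows.
    play_rate = {}
    for history in neighbour_histories:
        for track in history - known_tracks:
            play_rate[track] = play_rate.get(track, 0.0) + 1 / len(neighbour_histories)

    # Step 4: weight behavioural popularity by closeness to the taste vector.
    def score(track):
        closeness = 1 / (1 + math.dist(taste, track_features[track]))
        return play_rate[track] * closeness

    ranked = sorted(play_rate, key=lambda t: (-score(t), t))

    # Step 5: reserve slots for tracks from outside the neighbourhood.
    rng = random.Random(seed)
    eligible = [t for t in wild_pool if t not in known_tracks and t not in play_rate]
    extras = rng.sample(eligible, min(wild_cards, len(eligible)))
    return ranked[:size - len(extras)] + extras

# Hypothetical features: (how "jazz", how "hip-hop").
plays = [(1, (1.0, 0.0)), (3, (1.0, 0.0)), (90, (0.0, 1.0))]
taste = taste_vector(plays)

playlist = build_playlist(
    taste,
    known_tracks={"a", "b"},
    neighbour_histories=[{"a", "c", "d"}, {"b", "c"}, {"c", "d", "e"}],
    track_features={"c": (0.9, 0.1), "d": (0.8, 0.2), "e": (0.1, 0.9)},
    wild_pool=["z"],
)
```

With this data the recent jazz plays dominate the taste vector, the two jazz-leaning unheard tracks fill the first slots, and the last slot goes to a track none of the neighbours have played.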

What Signals Spotify Is Collecting About You

Spotify's listening model is built from more than just what you play. Here's what it actually tracks:

Play completion: Did you listen to 90%+ of the track, or skip after 15 seconds? A skip within 30 seconds is the strongest "I don't like this" signal in the system. Completion is the strongest positive signal — stronger than liking or saving.

Repeat plays: Playing a song more than once in a session is a very strong positive signal. Playing it more than 5 times in a week puts it on a fast track to becoming part of your core taste profile.

Time of day and context: Spotify infers context from when and how you listen. Monday morning listening has different energy characteristics than Friday night listening. If you always play high-energy music during commute hours, the algorithm learns this.

Playlist adds: Adding a track to a playlist is a deliberate, intentional act. It's weighted more heavily than passive listening. If you create a playlist called "Late Night Writing Music," Spotify can infer both what that context means and that you valued those specific tracks enough to curate them.

What you don't do: Songs you consistently skip are negative training data. Songs you hear on your Discover Weekly but never play again are weak negative signals. The system learns from omission as much as from action.
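One way to picture how these signals combine is as a weighted sum per track. The weights below are hypothetical: I only know the rough ordering described above, not any real values, and a plain sum is my simplification.

```python
# Hypothetical weights, chosen only to respect the rough ordering described above.
SIGNAL_WEIGHTS = {
    "completed": 1.0,      # listened to most of the track
    "playlist_add": 0.9,   # deliberate curation
    "repeat": 0.8,         # played again in the same session
    "not_replayed": -0.1,  # heard once in Discover Weekly, never again
    "early_skip": -1.0,    # skipped within the first 30 seconds
}

def track_score(events: list[str]) -> float:
    """Sum the weights of every event recorded for one track."""
    return sum(SIGNAL_WEIGHTS.get(event, 0.0) for event in events)

# Made-up listening log for three tracks.
listening_log = {
    "track_a": ["completed", "repeat", "playlist_add"],
    "track_b": ["early_skip", "early_skip"],
    "track_c": ["not_replayed"],
}
scores = {track: track_score(events) for track, events in listening_log.items()}
```

Track A ends up strongly positive, track B strongly negative, and track C only slightly negative. That is the "weak negative signal" from omission.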

Spotify in India — A Different Problem

Spotify launched in India in 2019 and immediately faced a challenge that its global recommendation engine wasn't designed for: a music landscape where languages, regional scenes, and Bollywood's sheer dominance create recommendation problems unlike anything in Western markets.

The collaborative filtering problem in India is intense. Tamil, Telugu, Kannada, Marathi, and Bengali music scenes each have their own ecosystems, communities, and listening patterns. A Tamil listener's taste neighbourhood in the global model might be very sparse: most users who listen to Ilaiyaraaja also listen to a set of English-language artists quite different from what the global model predicts.

Spotify has invested in building regional taste models — essentially collaborative filtering sub-networks that operate within language communities rather than across the entire global user base. This means that if 10,000 Tamil music listeners in Chennai also tend to discover a particular new indie Tamil artist, that signal stays within the Tamil listener community rather than being diluted into a global model that doesn't understand the cultural context.
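The dilution effect is easy to show with toy numbers. This sketch compares a track's play rate across all users with its play rate inside one language community. The users, the language tags, and the idea of tagging each user with one primary language are all assumptions of mine.

```python
def play_rate(track, histories):
    """Share of listening histories that contain the track."""
    if not histories:
        return 0.0
    return sum(track in history for history in histories) / len(histories)

# Hypothetical users tagged with a primary listening language.
users = [
    {"language": "tamil", "history": {"new_tamil_indie", "ilaiyaraaja_classic"}},
    {"language": "tamil", "history": {"new_tamil_indie"}},
    {"language": "english", "history": {"pop_hit"}},
    {"language": "english", "history": {"pop_hit", "rock_hit"}},
    {"language": "english", "history": {"rock_hit"}},
    {"language": "english", "history": {"pop_hit"}},
]

global_rate = play_rate("new_tamil_indie", [u["history"] for u in users])
tamil_rate = play_rate(
    "new_tamil_indie",
    [u["history"] for u in users if u["language"] == "tamil"],
)
```

Across everyone, the new track looks like a minority interest. Inside the Tamil community it is something every listener plays, which is the signal a regional model can act on.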

The NLP layer is still catching up in Indian languages. There are far fewer music blogs, reviews, and forum discussions in Tamil or Marathi than in English — so the cultural metadata that powers so much of the English-language recommendation engine is thinner for regional Indian music. Spotify has tried to address this partly through editorial curation — human-curated playlists in regional languages that train the model on cultural associations it can't yet learn automatically.

For most Indians using Spotify, the practical experience is: excellent recommendations within Bollywood and Hindi music (dense data, well-modelled), good recommendations if you blend Hindi and English listening (the global model handles this well), and more patchy recommendations if you primarily listen to regional language music (the models are improving but still maturing).

The Honest Bit

Spotify's recommendation system is genuinely impressive. It regularly surfaces music I wouldn't have found otherwise — including music in genres I wouldn't have thought to explore. That's real value.

But there's a tension worth naming. Spotify's business interests and your musical interests aren't always the same thing. Through its Discovery Mode programme, Spotify lets artists and labels accept a lower royalty rate in exchange for extra algorithmic exposure, and there have been press reports that Spotify has experimented with promoting tracks where it has more favourable licensing terms. Spotify has denied that this systematically affects recommendations, but the incentive exists.

More concretely: the recommendation system optimises for engagement — streams, saves, session duration. Engagement isn't the same as satisfaction or musical growth. A system that keeps you in a comfortable loop of slightly-varied versions of what you already like is engaging. It is not necessarily broadening your musical world. The wild card injection in Discover Weekly is a partial solution to this — but only partial.

I use Spotify's recommendations as a starting point, not a final word. If something from Discover Weekly catches my attention, I follow that artist's Bandcamp page, look at who they cite as influences, and explore from there — outside the algorithm. The algorithm got me through the door. What I do inside is still my choice.


My experiment

What happened when I played only one genre for 30 days

Last year I went through an instrumental jazz phase — for about a month, almost everything I played was piano trio or guitar jazz. My Discover Weekly became uncanny: it found obscure Japanese jazz records from the 70s, ECM artists I'd never heard of, and one specific album from a Bangalore-based jazz collective that genuinely changed how I thought about the genre.

It also narrowed my recommendations significantly. After 30 days, my Daily Mixes had collapsed to three versions of the same thing: jazz, jazz, and more jazz. When I went back to listening to a mix of things, it took two weeks for the recommendations to loosen up again.

The lesson I took: the system is very good at depth, not so good at breadth. If you want to explore a new genre, lean into it. If you want variety, you have to actively cultivate variety — the algorithm, left to its own devices, converges.

Next in the series: Netflix's personalisation AI — why the thumbnail you see for the same show is different from what your partner sees, and how Netflix decides which 40 shows to put on your homepage out of 15,000 available.
