Using machine learning to find songs that are identical to Steven Wilson's style.
I'll get straight to the point: I'm obsessed with Steven Wilson. I can't help it, I love his music. And I need more music in a similar (almost identical) style. So the question I'm trying to answer here is: how do I find songs that match SW's style with almost zero error?
I'm aware that Spotify gives you recommendations, like similar artists and such. But that's not enough -- Spotify always gives you varied music. Progressive rock is a very broad genre, and I just want those songs that sound very, very similar to Steven Wilson or Porcupine Tree.
BTW, Porcupine Tree was Steven Wilson's band, and they both sound practically the same. I made an analysis where I checked their musical similarities.
I'm using the Spotify web API to get the data. They have an amazingly rich amount of information, especially the audio features.
This repository has 5 datasets:
- StevenWilson.csv: contains Steven Wilson's discography (65 songs)
- PorcupineTree.csv: 65 Porcupine Tree songs
- Complete Steven Wilson.csv: a merge of the previous two datasets (Steven Wilson + Porcupine Tree)
- Train.csv: 200 songs used to train KNN. 100 are Steven Wilson songs and the rest are totally different songs
- Test.csv: 100 songs that may or may not be like Steven Wilson's. I picked these songs from various prog rock playlists and my Spotify Discover Weekly.
Also, so far I've made two kernels:
- Comparing Steven Wilson and Porcupine Tree
- Finding songs that match SW's style using K-Nearest Neighbors
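To give an idea of what the K-Nearest Neighbors approach looks like, here is a minimal sketch in plain Python. It is not the code from the kernel: the feature rows are made up for illustration, the choice of three features (energy, valence, acousticness) is an assumption, and the names `TRAIN`, `euclidean` and `knn_predict` are hypothetical. A label of 1 means "Steven Wilson style" and 0 means "totally different", mirroring how Train.csv is described.

```python
import math
from collections import Counter

# Made-up example rows, NOT taken from Train.csv.
# Each row: ((energy, valence, acousticness), label), all features on a 0.0-1.0 scale.
# label 1 = "Steven Wilson style", label 0 = "totally different".
TRAIN = [
    ((0.62, 0.21, 0.10), 1),
    ((0.55, 0.18, 0.25), 1),
    ((0.70, 0.25, 0.05), 1),
    ((0.90, 0.85, 0.02), 0),
    ((0.30, 0.90, 0.80), 0),
    ((0.85, 0.75, 0.10), 0),
]


def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A dark, mid-energy song lands next to the label-1 rows.
print(knn_predict(TRAIN, (0.60, 0.20, 0.15)))
```

With real data you would probably reach for a library implementation instead, but the idea is the same: a song from Test.csv gets the label that most of its closest neighbors in Train.csv have.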
There are 21 columns in the datasets.
Numerical: these columns were scraped using get_audio_features from the Spotify API.
- acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic; 1.0 represents high confidence the track is acoustic
- danceability: describes how suitable a track is for dancing; a value of 0.0 is least danceable and 1.0 is most danceable
- duration_ms: the duration of the track in milliseconds
- energy: a perceptual measure of intensity and activity, from 0.0 to 1.0
- instrumentalness: predicts whether a track contains no vocals; values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0
- liveness: detects the presence of an audience in the recording; 1.0 represents high confidence that the track was performed live
- loudness: the overall loudness of a track in decibels (dB)
- speechiness: detects the presence of spoken words in a track; the more exclusively speech-like the recording (e.g. talk show), the closer the value is to 1.0
- tempo: the overall estimated tempo of a track in beats per minute (BPM)
- valence: a measure from 0.0 to 1.0 describing the musical positiveness; tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)
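Most of these features live between 0.0 and 1.0, but duration_ms, loudness and tempo do not. A distance-based method like KNN would let those big-range columns dominate, so it usually helps to rescale them first. Below is a minimal sketch of min-max scaling in plain Python. The rows are made-up values, the function name `min_max_scale` is hypothetical, and I'm not claiming this is the exact preprocessing used in the kernels.

```python
def min_max_scale(rows):
    """Rescale every column of rows to the 0.0-1.0 range (min-max scaling)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Made-up rows: (duration_ms, loudness in dB, tempo in BPM).
rows = [
    (240000, -8.0, 120.0),
    (480000, -12.0, 90.0),
    (360000, -6.0, 150.0),
]

scaled = min_max_scale(rows)
for original, rescaled in zip(rows, scaled):
    print(original, "->", rescaled)
```

After scaling, the longest song has duration 1.0 and the shortest 0.0, so a difference in length weighs about as much as a difference in energy or valence.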
Categorical: these features are categories represented as numbers.
- key: the musical key the track is in, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on
- mode: indicates the modality (major or minor); major is represented by 1 and minor by 0
- time_signature: an estimated overall time signature of a track; it is a notational convention to specify how many beats are in each bar (or measure), e.g. 3/4, 4/4, 5/4 etc.
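Since key and mode are stored as numbers, a tiny lookup makes them readable. This is just a sketch: the names `PITCH_CLASSES` and `describe_key` are hypothetical, and the mapping follows the standard pitch class numbering (0 = C up to 11 = B). As far as I know, Spotify uses -1 when no key was detected, so anything outside 0-11 is treated as unknown here.

```python
# Standard pitch class notation: index 0 is C, index 11 is B.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def describe_key(key, mode):
    """Turn the numeric key and mode columns into a label like 'D minor'."""
    if not 0 <= key <= 11:
        # Assumption: values outside 0-11 (e.g. -1) mean no key was detected.
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


print(describe_key(2, 0))
print(describe_key(0, 1))
print(describe_key(-1, 1))
```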
Strings: these fields are mostly useless (except for name, album, artist and lyrics)
- id: the Spotify ID of the song
- name: name of the song
- album: album of the song
- artist: artist of the song
- uri: the Spotify URI of the song
- type: the type of the Spotify object
- track_href: the Spotify API link of the song
- analysis_url: the URL used for getting the audio features
- lyrics: lyrics of the song in lower case
I made a Kaggle repository. The datasets are there and you can create an IPython/R notebook easily.
Ever been obsessed with a song? An album? An artist? I'm planning on building a web app that solves this. It will help you find music that is extremely similar to the music you already love.