A multi-label classifier algorithm to predict motifs/themes in musical compositions.
This project is the individual dissertation for my Bachelor's degree.
Python-related dependencies can be installed using:
pip install -r requirements.txt
Clone the repo:
git clone https://github.com/chuangcaleb/music-theme-recognition
The project is subdivided into four modules, to decouple the project workflow.
There are two stages to collecting the required dataset.
1a. scraping_midi
Different scripts download MIDI files from various sources into a bin directory. Manually downloaded MIDI files can be added here as well.
1b. building_dataset
- create_db.py goes through the bin directory and builds a database containing the ids of the samples.
- From here, I have manually added the theme labels as columns, as well as metadata columns such as duplicate.
- Then, the samples are labelled gradually. Each song that I've looked through gets a 1 in the recognizable column if I have labelled it, and a 0 if not. An empty/null field therefore means the song has not been reviewed yet.
- process_db.py converts all 'p's in the database into '1's.[1]
- db_stats.py is a convenience script that returns some statistics about the label dataset so far.
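The labelling conventions above can be illustrated with a minimal, standard-library sketch. It is not the code in process_db.py or db_stats.py. The column names theme_a and theme_b, the id column name, and the sample rows are hypothetical placeholders, and the real database layout may differ.

```python
import csv
import io
from collections import Counter


def convert_shorthand(rows, id_column="id"):
    """Return copies of the rows with the shorthand 'p' replaced by '1'.

    Every column except the id column is converted, mirroring the
    description "converts all 'p's in the database into '1's".
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for column, value in row.items():
            if column != id_column and (value or "").strip().lower() == "p":
                new_row[column] = "1"
        cleaned.append(new_row)
    return cleaned


def label_counts(rows, label_columns):
    """Count how many samples carry a '1' for each label column."""
    counts = Counter()
    for row in rows:
        for column in label_columns:
            if row.get(column) == "1":
                counts[column] += 1
    return counts


# Hypothetical placeholder data: an empty 'recognizable' field means
# the song has not been reviewed yet.
SAMPLE_CSV = """id,recognizable,theme_a,theme_b
song_a,1,p,0
song_b,1,0,p
song_c,,,
song_d,0,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labels = ["theme_a", "theme_b"]

cleaned = convert_shorthand(rows)
stats = label_counts(cleaned, labels)
reviewed = sum(1 for row in cleaned if row["recognizable"] != "")

print("positive labels per theme:", dict(stats))
print("reviewed samples:", reviewed, "of", len(cleaned))
```

Keeping the conversion and the statistics as separate functions mirrors the split between the two scripts, so the statistics can be re-run at any point during labelling.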
- generate_jsymbolic_config.py builds a configuration script based on the MIDI files found in the bin directory.
- Run jSymbolic with theme_jsymb_config.txt as the configuration script.
- Finally, run clean_db.py to clean up the database for use.
- definition_dump.py dumps the definition data from XML to CSV.
These three steps can (and should) be automatically executed.
Here is the script file that I've used; modify it to point to your jSymbolic2.jar.
python3 calculating_dataset/generate_jsymbolic_config.py
java -Xmx3072m -jar [PATH_TO_YOUR_JSYMBOLIC]/jSymbolic2.jar -configrun data/features/theme_jsymb_config.txt
python3 calculating_dataset/clean_db.py
python3 calculating_dataset/definition_dump.py
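For anyone who prefers to drive these commands from Python, the same sequence could be wrapped as below. This is only a sketch: the function names build_pipeline and run_pipeline are hypothetical, the jar location is a placeholder that must be replaced with your own path, and the commands are assumed to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline(jsymbolic_jar, python=sys.executable):
    """Return the four commands of the script above as argument lists."""
    return [
        [python, "calculating_dataset/generate_jsymbolic_config.py"],
        ["java", "-Xmx3072m", "-jar", str(jsymbolic_jar),
         "-configrun", "data/features/theme_jsymb_config.txt"],
        [python, "calculating_dataset/clean_db.py"],
        [python, "calculating_dataset/definition_dump.py"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(command, check=True)


# Placeholder path: replace with the folder that holds your jSymbolic2.jar
jar_path = Path("PATH_TO_YOUR_JSYMBOLIC") / "jSymbolic2.jar"
commands = build_pipeline(jar_path)
run_pipeline(commands, dry_run=True)
```

With dry_run=True the commands are only printed, which is a cheap way to check the paths before starting the jSymbolic run.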
From here, it is mostly automated.
model.py is the main script to run. You should never need to fiddle with it because the parameters can all be configured in config.py.
As in the building_model module, a config.py file handles the configuration for this module.
- results_stats.py calculates relevant statistics about the result set.
- flatten_json.py is a temporary utility script that converts the JSON results into a flat CSV table.
- plot_trees.py draws the best Decision Trees from the specified run.
- feature_importances.py plots, for each theme label, the value distributions of the 10 most important features, as ranked by the best Random Forest from the specified run.
- graph.py plots results from the specified run.
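To illustrate the kind of conversion flatten_json.py performs, here is a minimal sketch that flattens nested JSON records into a CSV table using only the standard library. It is not the script itself: the structure of the results file, the key names, and the numbers are made-up placeholders, not real results.

```python
import csv
import io
import json


def flatten(nested, parent_key="", sep="."):
    """Flatten a nested dict, joining nested keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Placeholder input: hypothetical keys and dummy values
RESULTS_JSON = """
[
  {"label": "theme_a", "model": "RandomForest",
   "scores": {"f1": 0.5, "precision": 0.75}},
  {"label": "theme_b", "model": "DecisionTree",
   "scores": {"f1": 0.25}}
]
"""

records = [flatten(record) for record in json.loads(RESULTS_JSON)]

# Collect column names in first-seen order so the header is stable
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Records that lack a key simply get an empty cell (restval=""), so runs with different sets of metrics can still share one table.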
See the kanban for active tasks.
Distributed under the MIT License. See LICENSE for more information.
20204134 Chuang Caleb hcycc2
Footnotes
[1] This is because I sped up the hand-labelling process by marking fields with '0' or 'p', since those keys are close together on the keyboard. The script later turns the 'p's into '1's.