When listening to music, our attention is drawn back and forth between different elements. Often this is guided by following the main, most prominent melodic line: the Hauptstimme. This repo is about that effect, providing:
- a corpus of orchestral scores, with annotations by human analysts marking where they consider the "main theme" to be,
- code for processing these data, e.g., for creating a summative "melody score".
Please see this explanation for more details on the annotation method and FAQs.
Apart from the annotations, the scores themselves are introduced here for the first time. As such, we also explain the stylistic design criteria here (Note: This content may move to the OpenScore GitHub).
The full, core corpus consists of over 100 movements:
- Bach, JS:
  - B Minor Mass,
    - 27 movements (ish ... depending on how you count it).
    - the movements are numbered according to NBAII (1–23) and are split by movement where possible (e.g., 7a from 7b), but not in the case of dovetail (e.g., 4a and 4b are one with double bar line and editorial tempo marking).
  - Brandenburg Concerto No.3 (BWV 1048),
    - 3 movements.
  - Brandenburg Concerto No.4 (BWV 1049),
    - 3 movements.
- Beach, Amy:
  - 1 symphony, the 'Gaelic',
    - 4 movements.
- Beethoven:
  - 9 symphonies,
    - 37 movements.
- Brahms, Johannes:
  - Ein Deutsches Requiem,
    - 1 movement, the 1st.
  - 4 symphonies,
    - 16 movements.
- Bruckner, Anton:
  - 1 symphony, the 5th,
    - 4 movements.
All of these cases include the files in the format `<identifier>` plus:
- `.mscz`: The annotated MuseScore file. Edit this file.
- `.mxl`: A conversion of the `.mscz` file.
- `_annotations.csv`: Information about each annotation including the qstamp, theme label, and instrument.
- `_melody.mxl`: The annotated melody segments stitched together to form a single-stave 'melody score'.
- `.mm.json`: The compressed 'measure map' - a lightweight representation of the bar information to enable alignment with other corpora.
- `.csv`: A 'lightweight' .csv file extracted from the full score (with repeats expanded), indicating the highest pitch being played by each instrument part at every timestamp in which a change occurs in the score.
- `_part_relations.csv`: A derived analysis of the interplay between the score parts in each Hauptstimme annotation block.
- `_alignment.csv`: An alignment table containing timestamps for each score note onset in a set of public domain / open license audio recordings obtained from IMSLP. (These files only exist for scores where such recordings are available.)
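The suffix list above lends itself to a quick completeness check for one movement folder. The following is a minimal sketch, not part of the repository's code: the function name `missing_files` and the split into required and optional suffixes are assumptions made for illustration (the alignment table is treated as optional because it only exists where recordings are available).

```python
from pathlib import Path

# Suffixes taken from the list above.
REQUIRED_SUFFIXES = [
    ".mscz",
    ".mxl",
    "_annotations.csv",
    "_melody.mxl",
    ".mm.json",
    ".csv",
    "_part_relations.csv",
]
# Only present for scores with available recordings (assumed optional here).
OPTIONAL_SUFFIXES = ["_alignment.csv"]


def missing_files(movement_dir: Path, identifier: str) -> list[str]:
    """Return the required file names that are absent from movement_dir.

    Hypothetical helper: the repository may organise this check differently.
    """
    return [
        identifier + suffix
        for suffix in REQUIRED_SUFFIXES
        if not (movement_dir / (identifier + suffix)).is_file()
    ]
```

Calling `missing_files(Path("path/to/movement"), "<identifier>")` with a real identifier returns an empty list when every required file is present.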
The filename structure is as follows:
<composer>/<symphony>/<movement>/<files>
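Given that layout, the corpus can be traversed with a fixed-depth glob. This is a sketch under the assumption that every movement folder sits exactly three levels below the corpus root and contains one `.mscz` file; `MovementRef` and `find_movements` are hypothetical names, not functions from the `hauptstimme` package.

```python
from pathlib import Path
from typing import NamedTuple


class MovementRef(NamedTuple):
    composer: str
    work: str  # the <symphony> level of the path
    movement: str
    score: Path


def find_movements(corpus_root: Path) -> list[MovementRef]:
    """Collect one entry per MuseScore file found three levels below corpus_root."""
    refs = []
    for score in sorted(corpus_root.glob("*/*/*/*.mscz")):
        # The first three path components mirror <composer>/<symphony>/<movement>.
        composer, work, movement = score.relative_to(corpus_root).parts[:3]
        refs.append(MovementRef(composer, work, movement, score))
    return refs
```

Each returned entry carries the three path components alongside the path to the score, which is enough to locate the sibling files by suffix.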
We provide the `hauptstimme` package as well as the following code:
- `main.py`: Take a score's MuseScore file and produce the rest of the files specified above (except the alignment table).
- `build_corpus.py`: Produce all corpus files from each score's MuseScore file.
- `get_part_relations.py`: Take a score's MusicXML file and produce a part relationships summary.
- `compare_segmentations.py`: Take a score's MusicXML file and compare the Hauptstimme annotation points with three sets of automatic segmentation points: novelty-based (tempogram features), novelty-based (chromagram features), and changepoint detection-based.
- `align_score_audios.py`: Take a score's MuseScore/MusicXML file and a set of audio files, then align the audio files to the score, producing an alignment table.
- `demo.ipynb`: A demonstration of how functions in the `hauptstimme` package can be used.
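To give a sense of what comparing annotation points with automatic segmentation points can involve, here is a generic sketch of tolerance-window boundary matching. It is not the method implemented in `compare_segmentations.py`: the function name `match_boundaries`, the greedy pairing, and the default tolerance are all assumptions for illustration. Both inputs are lists of timestamps in any shared unit.

```python
def match_boundaries(reference, estimated, tolerance=2.0):
    """Greedily pair each reference point with the nearest unused estimate.

    Returns (precision, recall, f1). A pair counts as a hit when the two
    points lie within `tolerance` of each other; each estimate is used once.
    """
    unused = sorted(estimated)
    hits = 0
    for ref in sorted(reference):
        candidates = [e for e in unused if abs(e - ref) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda e: abs(e - ref))
            unused.remove(best)  # an estimate cannot match two references
            hits += 1
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

Greedy pairing is simple but can undercount hits when boundaries are dense relative to the tolerance; an optimal bipartite matching avoids that at the cost of more code.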
Possible future TODOs:
- Renumbering scores based on measure maps.
- Analysis of hypotheses such as 'loud dynamic and lots of unison = camera on whole orchestra'.
- Distance between melodic blocks and clustering.
- Extracting all instances of a particular theme.
- Allowing multiple annotations at the same timestamp (currently causes issues with the part relationship summaries, but would be useful in the case of polyphony).
- Produce instrument range data from our corpus (see `range_data_adler_chon_huron_devlieger.csv`).
Development was done in Python 3.11.
Many thanks to:
- Deutsche Telekom for funding part of this work in the context of the 'Beethoven X' project.
- Fellow 'Beethoven X' project team members for discussions.
- Annotators:
  - On the 'Beethoven X' project, including Nicolai Böhlefeld and many others.
  - At Cornell, Eastman, TU Dortmund, Durham, and elsewhere.
- Transcribers, both:
  - in our immediate team, and
  - more widely across the MuseScore community: members who made transcriptions freely available under the CC0 licence and named their source edition.
- Scores: CC0 1.0 Universal
- Annotations: CC BY-SA
- Code: CC BY-SA
All scores have been copied from clearly identified and unequivocally public source editions on IMSLP. Transcribers have committed to making these transcriptions using that public source edition, and working from scratch. We have confidence in our team and their work but obviously cannot make any guarantees. If you see anything that we ought to review, please let us know.
To follow ;). Provisionally Martins et al. 2024.