A custom datatype called HVO_Sequence
is used for representing a drum pattern as a piano roll,
using a sequence of equally (or in some cases unequally) fixed grid lines, velocities and offsets relative to the gridlines.
When dealing with transformer architectures, commonly the input/output space of possibilities is quantized, and then tokenized so as to learn a meaningful representation space in which each of the input/output tokens is embedded. However, for this work, we purposefully decided to replace the tokenized representation of events with that of a direct representation. To this end, we used the same representation as proposed by Gillick et. al. . In this representation (from now on, called HVO, denoting hits, velocities, and offsets), the input and output sequences are directly represented by three stacked T × M matrices, where T corresponds to the number of time-steps, in this case, 32 (2 bars with 16 sub-divisions each), and M corresponds to the number of instruments, in this case, 9. The three matrices are defined as follows:
-
Hits: Binary-valued matrix that indicates the presence (1) or absence (0) of a drum hit.
-
Velocity: Continuous-valued matrix of velocity levels in the range [0, 1]
-
Offsets: Continuous-valued matrix of offset deviations from the nearest 16th note grid line, in the range [−0.5, 0.5] where ±0.5 implies mid-way between a grid line and the following/preceding gridline
This results in an HVO matrix of dimension 32 × 27. An example of an HVO representation (with 4 voices and 4 timesteps) is shown in the following image
An example of HVO derivation from the piano rollsNote All the code examples in this section are available here
The following code snippet demonstrates the basic usage of the HVO_Sequence
class. This is
a brief summary of how to use the HVO_Sequence
class to create, modify, analyze a score.
Note For a detailed guide on how to use the
HVO_Sequence
class, please refer to the documentation
from hvo_sequence import HVO_Sequence
from hvo_sequence import ROLAND_REDUCED_MAPPING
beat_div_factor = [4] # divide each quarter note in 4 divisions
hvo_seq = HVO_Sequence(beat_division_factors=beat_div_factor,
drum_mapping=ROLAND_REDUCED_MAPPING)
# ----------------------------------------------------------------
# ----------- CREATE A SCORE --------------
# ----------------------------------------------------------------
# Add two time_signatures
hvo_seq.add_time_signature(time_step=0, numerator=4, denominator=4)
# Add two tempos
hvo_seq.add_tempo(time_step=0, qpm=120)
# Create a random hvo for 32 time steps and 9 voices
hvo_seq.random(32)
# -------------------------------------------------------------------
# ----------- saving --------------
# -------------------------------------------------------------------
hvo_seq.save("demos/HVO_Sequence/misc/empty.hvo")
# -------------------------------------------------------------------
# ----------- Loading --------------
# -------------------------------------------------------------------
from pickle import load
hvo_seq_loaded = load(open("demos/HVO_Sequence/misc/empty.hvo", "rb"))
if hvo_seq_loaded == hvo_seq:
print ("Loaded sequence is equal to the saved one")
# ----------------------------------------------------------------
# ----------- Access Data --------------
# ----------------------------------------------------------------
hvo_seq.get("h") # get hits
hvo_seq.get("v") # get vel
hvo_seq.get("o") # get offsets
hvo_seq.get("vo") # get vel with offsets
hvo_seq.get("hv0") # get hv with offsets replaced as 0
hvo_seq.get("ovhhv0") # use h v o and 0 to create any tensor
# ----------------------------------------------------------------
# ----------- Plot PianoRoll --------------
# ----------------------------------------------------------------
hvo_seq.to_html_plot(
filename="test.html",
save_figure=False,
show_figure=True)
# ----------------------------------------------------------------
# ----------- Synthesize/Export --------------
# ----------------------------------------------------------------
# Export to midi
hvo_seq.save_hvo_to_midi(filename="misc/test.mid")
# Export to note_sequece
hvo_seq.to_note_sequence(midi_track_n=10)
# Synthesize to audio
audio = hvo_seq.synthesize(sr=44100, sf_path="hvo_sequence/soundfonts/Standard_Drum_Kit.sf2")
# Synthesize to audio and auto save
hvo_seq.save_audio(filename="misc/temp.wav", sr=44100,
sf_path="hvo_sequence/soundfonts/Standard_Drum_Kit.sf2")
# ----------------------------------------------------------------
# ----------- Load from Midi --------------
# ----------------------------------------------------------------
from hvo_sequence import midi_to_hvo_sequence
hvo_seq = midi_to_hvo_sequence(
filename='misc/test.mid',
drum_mapping=ROLAND_REDUCED_MAPPING,
beat_division_factors=[4])
from hvo_sequence import HVO_Sequence
from hvo_sequence import ROLAND_REDUCED_MAPPING
beat_div_factor = [4] # divide each quarter note in 4 divisions
hvo_seq = HVO_Sequence(beat_division_factors=beat_div_factor,
drum_mapping=ROLAND_REDUCED_MAPPING)
# ----------------------------------------------------------------
# ----------- CREATE A SCORE --------------
# ----------------------------------------------------------------
# Add two time_signatures
hvo_seq.add_time_signature(time_step=0, numerator=4, denominator=4)
# Add two tempos
hvo_seq.add_tempo(time_step=0, qpm=120)
# Create a random hvo for 32 time steps and 9 voices
hvo_seq.random(32)
# -------------------------------------------------------------------
# ----------- saving --------------
# -------------------------------------------------------------------
hvo_seq.save("demos/HVO_Sequence/misc/empty.hvo")
# -------------------------------------------------------------------
# ----------- Loading --------------
# -------------------------------------------------------------------
from pickle import load
hvo_seq_loaded = load(open("demos/HVO_Sequence/misc/empty.hvo", "rb"))
if hvo_seq_loaded == hvo_seq:
print ("Loaded sequence is equal to the saved one")
# ----------------------------------------------------------------
# ----------- Access Data --------------
# ----------------------------------------------------------------
hvo_seq.get("h") # get hits
hvo_seq.get("v") # get vel
hvo_seq.get("o") # get offsets
hvo_seq.get("vo") # get vel with offsets
hvo_seq.get("hv0") # get hv with offsets replaced as 0
hvo_seq.get("ovhhv0") # use h v o and 0 to create any tensor
# ----------------------------------------------------------------
# ----------- Plot PianoRoll --------------
# ----------------------------------------------------------------
hvo_seq.to_html_plot(
filename="test.html",
save_figure=False,
show_figure=True)
# ----------------------------------------------------------------
# ----------- Synthesize/Export --------------
# ----------------------------------------------------------------
# Export to midi
hvo_seq.save_hvo_to_midi(filename="demos/HVO_Sequence/misc/test.mid")
# Export to note_sequece
hvo_seq.to_note_sequence(midi_track_n=10)
# Synthesize to audio
audio = hvo_seq.synthesize(sr=44100, sf_path="hvo_sequence/soundfonts/Standard_Drum_Kit.sf2")
# Synthesize to audio and auto save
hvo_seq.save_audio(filename="demos/HVO_Sequence/misc/temp.wav", sr=44100,
sf_path="hvo_sequence/soundfonts/Standard_Drum_Kit.sf2")
# ----------------------------------------------------------------
# ----------- Load from Midi --------------
# ----------------------------------------------------------------
from hvo_sequence import midi_to_hvo_sequence
hvo_seq = midi_to_hvo_sequence(
filename='misc/test.mid',
drum_mapping=ROLAND_REDUCED_MAPPING,
beat_division_factors=[4])
checks if two hvo_sequence scores are the same use
if hvo_seq_a == hvo_seq_b:
...
Magenta's Groove MIDI Dataset (GMD), a dataset containing roughly 13.6 hours of drum performances in the format of beats and fills, classified by genre and mostly in 4/4 time signature.
Genre distribution in GMD (beats in 4-4 meter)The dataset in midi format is available using the link provided by Magenta here. The dataset can also be found as a Groove TFDS in TensorFlow Datasets (TFDS)
Note The midi files for
complete
performances provided by Magenta and also the2bar
and4bar
midi files exported from the versions provided on tfds can be found here (the script for generating these midi files is available here)
In order to avoid installing tensorflow, the groove TFDS subsets have been downloaded and stored as dictionaries of metadatas and midi files. Access these pickled dictionaries here:
1. groove_full-midionly.bz2pickle
2. groove_2bar-midionly.bz2pickle
3. groove_4bar-midionly.bz2pickle
These files are simply dictionaries of the following format
{
'train', 'test' or 'validation':
{
'drummer': ...,
'session': ...,
'loop_id': ...,
'master_id': ...,
'style_primary': ...,
'style_secondary': ...,
'bpm': ...,
'beat_type': ...,
'time_signature': ...,
'full_midi_filename': ...,
'full_audio_filename': ...,
'midi': bytes,
'note_sequence': ...,
}
}
Note this dataset is quite unbalanced, with a lot of songs in the
rock
genre. Below are a number of plots showing the distribution of the dataset.
Some Examples of the visualizations are below. We highly recommend using the html bokeh
figures available
here.
These plots can be generated using the script here
Many more analysis of the dataset can be found here
Note All the code examples in this section are available here
from data import load_original_gmd_dataset_pickle
# Load 2bar gmd dataset as a dictionary
gmd_dict = load_original_gmd_dataset_pickle(
gmd_pickle_path="data/gmd/resources/storedDicts/groove_2bar-midionly.bz2pickle")
gmd_dict.keys()
# dict_keys(['train', 'test', 'validation'])
gmd_dict['train'].keys()
#dict_keys(['drummer', 'session', 'loop_id', 'master_id', 'style_primary', 'style_secondary',
'bpm', 'beat_type', 'time_signature', 'full_midi_filename', 'full_audio_filename',
'midi', 'note_sequence'])
Note All the code examples in this section are available here
# Extract HVO_Sequences from the dictionaries
from data import extract_hvo_sequences_dict, get_drum_mapping_using_label
hvo_dict = extract_hvo_sequences_dict (
gmd_dict=gmd_dict,
beat_division_factor=[4],
drum_mapping=get_drum_mapping_using_label("ROLAND_REDUCED_MAPPING"))
The resulting hvo_dict
is a dictionary of the following format
{
'train':
list of HVO_Seqs corresponding to gmd_dict['train']['midi'][idx]
'test':
list of HVO_Seqs corresponding to gmd_dict['test']['midi'][idx]
'validation':
list of HVO_Seqs corresponding to gmd_dict['validation']['midi'][idx]
}
Note All the code examples in this section are available here
from data import load_gmd_hvo_sequences
train_set = load_gmd_hvo_sequences(
dataset_setting_json_path = "data/dataset_json_settings/4_4_Beats_gmd.json",
subset_tag = "train",
force_regenerate=False)
The loader requires a specific json file to be passed as an argument. The content of the json file is as follows.
{
"global":
{
"beat_division_factor": [4],
"drum_mapping_label": "ROLAND_REDUCED_MAPPING"
},
"raw_data_pickle_path":
{
"gmd": "data/gmd/resources/storedDicts/groove_2bar-midionly.bz2pickle"
},
"settings":
{
"gmd":
{
"drummer": null,
"session": null,
"loop_id": null,
"master_id": null,
"style_primary": null,
"style_secondary": null,
"bpm": null,
"beat_type": ["beat"],
"time_signature": ["4-4"],
"full_midi_filename": null,
"full_audio_filename": null,
"number_of_instruments": null
}
}
}
Warning It is highly recommended to place your custom dataset setting json files in the
data/dataset_json_settings
folder.
# Load dataset as torch.utils.data.Dataset
from data import MonotonicGrooveDataset
# load dataset as torch.utils.data.Dataset
training_dataset = MonotonicGrooveDataset(
dataset_setting_json_path="data/dataset_json_settings/4_4_Beats_gmd.json",
subset_tag="train",
max_len=32,
tapped_voice_idx=2,
collapse_tapped_sequence=False,
load_as_tensor=True)
Note To batchify the dataset, wrap the
MonotonicGrooveDataset
intotorch.utils.data.DataLoader
iterator.from torch.utils.data import DataLoader train_dataloader = DataLoader(training_dataset, batch_size=32, shuffle=True) epochs = 10 for epoch in range(epochs): # in each epoch we iterate over the entire dataset for batch_count, (inputs, outputs, indices) in enumerate(train_dataloader): print(f"Epoch {epoch} - Batch #{batch_count} - inputs.shape {inputs.shape} - " f"outputs.shape {outputs.shape} - indices.shape {indices.shape} ")