Machine Learning Generative Classical Music using RNN LSTMs with MIDI music dataset and Magenta Tensorflow

lucylow/Stochastic_SoundCloud

Stochastic SoundCloud : Lucy’s New Mozart Mixtape 🔥

Machine Learning Generative Music using RNN LSTMs.

CATS ARE SO CUTE Album Cover. LUCY's New_Mozart_Mixtape now available on Stochastic_SoundCloud for the LOW price of $3.99. Featuring new up coming rapper TensorFlow_AI


Motivation

  • Calculations for stochastic music take a long time when done by hand. Stochastic SoundCloud uses machine learning to make generating melodies easier while reviewing basic math concepts such as the law of large numbers, probability theory, game theory, Boolean algebra, Markov chains, the Poisson law, and group theory
  • “If I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.” – Albert Einstein
  • In 1958 Iannis Xenakis used Markov chains, a stochastic process that predicts the future from the present state, to compose "Analogique" (pictured below), the first musical composition that models the probability of a note occurring after a sequence of notes:

Stochastic Process

  • In probability theory, a stochastic process is a mathematical object defined as a collection of random variables

  • "Stochastic" == "an asymptotic evolution towards a stable state, towards a kind of goal, of stochos"

  • Pragmatic examples:

    • Bernoulli process to study repeated flips of a coin, where a head (value one) occurs with probability p and a tail (value zero) with probability 1 − p
    • Wiener (Brownian motion) process to study the diffusion of tiny particles suspended in fluid (also used as a solution to the Schrödinger equation)
    • Poisson process to study the number of phone calls occurring in a certain period of time
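The Bernoulli example above can be simulated in a few lines. This is a minimal sketch using only the Python standard library. The function name `bernoulli_process`, the fixed seed, and the choice of p = 0.5 are illustrative assumptions, not part of this repository.

```python
import random

def bernoulli_process(p, n, seed=0):
    """Return n coin flips: 1 (head) with probability p, otherwise 0 (tail)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [1 if rng.random() < p else 0 for _ in range(n)]

flips = bernoulli_process(p=0.5, n=10000)

# Law of large numbers: the sample mean approaches p as n grows.
sample_mean = sum(flips) / len(flips)
print(round(sample_mean, 3))
```

With 10,000 flips the printed mean should land close to 0.5, which is the law of large numbers listed in the Motivation section at work.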

Stochastic SoundCloud : Music is Sequential

  • Music is older than language - automatic music became "algorithmic" where piano compositions can be broken down into fragments
  • Each musical state is only partially determined by the preceding one: the concrete musical state n+2 follows the state n+1 only with some probability
  • The pitch, duration, timbre, dynamics, and amplitude of music waveforms can be changed through parameters, which changes their effect in the spectral domain
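The idea that the next state follows the current one only with some probability can be sketched as a first-order Markov chain over MIDI pitches. This is an illustrative toy, not the model used in this repository: the helper names `fit_transitions` and `sample_melody` are hypothetical, and the training melody is the ten-note example from the MIDI section of this README.

```python
import random
from collections import defaultdict

def fit_transitions(notes):
    """Count how often each pitch is followed by each other pitch."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(notes, notes[1:]):
        counts[current][following] += 1
    return counts

def sample_melody(counts, start, length, seed=0):
    """Walk the chain: pick each next pitch in proportion to its count."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        followers = counts.get(melody[-1])
        if not followers:  # dead end: this pitch was never followed by anything
            break
        pitches = list(followers)
        weights = [followers[p] for p in pitches]
        melody.append(rng.choices(pitches, weights=weights)[0])
    return melody

training = [60, 62, 64, 64, 65, 62, 67, 64, 62, 60]
transitions = fit_transitions(training)
melody = sample_melody(transitions, start=60, length=8)
print(melody)
```

Every consecutive pair in the sampled melody already occurs somewhere in the training melody, which is why such chains capture local contour but no long-term structure.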

Theory Classical Music Concepts

In order to generate classical music for Lucy’s New Mozart Mixtape, we need to understand how a computer interprets musical notes. Reading sheet music is like learning a new language in which the symbols represent the pitch, speed, and rhythm of the melody. A melody is a succession of musical notes read in linear order. How would you abstract a musical melody into numerical data that can be used to train a neural network?

Piano scales

A scale is made of eight consecutive notes. The C major scale is composed of C, D, E, F, G, A, B, C. This is important because we can transpose all of the musical pieces to key C, while maintaining the relative relationship between notes. The generated pieces can be transposed to any key.
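Transposition can be sketched with MIDI pitch numbers, where shifting every note by the same number of semitones keeps the relative relationship between notes. This is a minimal sketch; `major_scale`, `transpose`, and `to_key_of_c` are hypothetical helper names, and it assumes the tonic of the piece is already known.

```python
# Semitone offsets of the eight notes of a major scale: C D E F G A B C
MAJOR_SCALE_OFFSETS = [0, 2, 4, 5, 7, 9, 11, 12]

def major_scale(tonic):
    """Return the eight MIDI pitches of the major scale starting on tonic."""
    return [tonic + offset for offset in MAJOR_SCALE_OFFSETS]

def transpose(pitches, semitones):
    """Shift every pitch by the same interval."""
    return [pitch + semitones for pitch in pitches]

def to_key_of_c(pitches, tonic_pitch_class):
    """Shift down so the tonic (pitch class 0..11, C = 0) lands on C."""
    return transpose(pitches, -tonic_pitch_class)

g_major = major_scale(67)                          # G4 major scale
in_c = to_key_of_c(g_major, tonic_pitch_class=7)   # G is pitch class 7
print(in_c)  # [60, 62, 64, 65, 67, 69, 71, 72]
```

Transposing a generated piece back to its original key is the same operation with the opposite sign.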

Piano Roll Representation

  • Piano roll representation is a music storage format in which a piece is represented by a score-like binary-valued (0 or 1) matrix of music notes over different time steps. Let M be the number of tracks in a multi-track music piece, which is stored as a set of piano rolls.

  • M-track piano roll representation:

    • M-track musical piece will be converted into a set of M piano rolls
    • One bar is represented as a tensor x ∈ {0, 1}^(R×S×M), where R is the number of time steps in a bar and S is the number of note candidates
    • T bars are represented as the sequence {x(t)} for t = 1 to t = T
  • The piano roll of each bar, each track, for both the real and the generated data is represented as a fixed-size matrix. For example, the piano roll for one bar in 4/4 time with one track can be represented as a 96 x 128 matrix (96 time steps by 128 pitches). Stacking the tracks, a bar in 4/4 time with M tracks can be represented as a 96 x 128 x M tensor.

    The green bars next to the piano represent the piano roll of the score sheet
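The 96 x 128 x M bar tensor described above can be sketched with nested Python lists. This is an illustration only: the `(pitch, start, end)` note tuples and the helper names `track_roll` and `bar_tensor` are assumptions, and a real pipeline would more likely use an array library.

```python
STEPS_PER_BAR = 96   # R: time steps in one 4/4 bar
NUM_PITCHES = 128    # S: note candidates (MIDI pitches 0..127)

def track_roll(notes):
    """Build one 96 x 128 binary piano roll from (pitch, start, end) notes."""
    roll = [[0] * NUM_PITCHES for _ in range(STEPS_PER_BAR)]
    for pitch, start, end in notes:
        for step in range(start, min(end, STEPS_PER_BAR)):
            roll[step][pitch] = 1  # note is sounding at this time step
    return roll

def bar_tensor(tracks):
    """Stack M track rolls into a nested list indexed [step][pitch][track]."""
    rolls = [track_roll(notes) for notes in tracks]
    return [[[roll[step][pitch] for roll in rolls]
             for pitch in range(NUM_PITCHES)]
            for step in range(STEPS_PER_BAR)]

tracks = [
    [(60, 0, 24), (64, 24, 48)],  # track 0: C4 then E4, one quarter note each
    [(48, 0, 96)],                # track 1: C3 held for the whole bar
]
x = bar_tensor(tracks)
print(len(x), len(x[0]), len(x[0][0]))  # 96 128 2
```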


Theory Musical Instrument Digital Interface (MIDI) Representation

Musical Instrument Digital Interface (MIDI) maps musical note names to numbers making it easier for engineers to play, edit and record music. An example would be C4 key on piano == "60" MIDI. The data is then fed into the neural network as piano roll representation where:

  • X axis = Time sequence
    • Absolute timing where we use the actual timing of each note occurrence
    • Symbolic timing where the tempo data is removed as a normalization factor such that each beat has the same length
  • Y axis = Notes on a piano keyboard (pitch or velocities of the notes)

Example of piano C scale with ten notes C4, D4, E4, E4, F4, D4, G4, E4, D4, and C4 with corresponding MIDI numbers 60, 62, 64, 64, 65, 62, 67, 64, 62, and 60.
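The mapping from note names to MIDI numbers follows a simple formula under the convention used here, where C4 is 60: each octave spans 12 semitones. The sketch below is illustrative; `note_to_midi` is a hypothetical helper and it only handles single-digit octaves (0 to 9).

```python
# Semitone position of each natural note within an octave
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_midi(name):
    """Convert a name such as 'C4' or 'F#3' to a MIDI number (C4 == 60)."""
    letter, octave = name[0], int(name[-1])
    accidental = name[1:-1]  # '' for naturals, '#' or 'b' otherwise
    semitone = PITCH_CLASSES[letter] + accidental.count("#") - accidental.count("b")
    return 12 * (octave + 1) + semitone

melody_names = ["C4", "D4", "E4", "E4", "F4", "D4", "G4", "E4", "D4", "C4"]
melody_numbers = [note_to_midi(name) for name in melody_names]
print(melody_numbers)  # [60, 62, 64, 64, 65, 62, 67, 64, 62, 60]
```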

One Hot Encoding of MIDI numbers

How do the MIDI numbers fit as an input to our RNN-LSTM neural network? One Hot Encoding. One Hot Vectors are a categorical binary representation of data where each row has exactly one feature with a value of 1 (music note is on) and all other features with a value of 0 (music note is off).

Example of one hot encoding:

  • MIDI file #1 [Note 1, Note 2, Note3] ==> {[1,0,0], [0,1,0], [0,0,1] } One Hot Encoding
  • MIDI file #2 [Note 1, Note 2, Note3] ==> {[1,0,0], [0,1,0], [0,0,1]} One Hot Encoding

Each song is an ordered list of pseudo-notes, and the final tensor has dimensions Number of samples (nb) x Length of sequence (timesteps) x One-Hot Encoding of pseudo-notes. The melody at each timestep is transformed into a 38-dimensional one-hot vector. There are 38 kinds of events in total: 36 note-on events, 1 note-off event, and 1 no-event:

Matrix of one-hot encoded MIDI data for the first four piano bars. The default setting is 96 ticks per beat, but we set it to 4 ticks per beat, a resolution of one 1/16th note per time step. Each row represents one quantized step of the time dimension.
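The 38-class one-hot encoding can be sketched as follows. The index layout (0 for no-event, 1 for note-off, 2 to 37 for note-on) and the marker values `NO_EVENT` and `NOTE_OFF` are assumptions made for illustration. The 36 note-on classes are taken to be pitches 48 to 83, matching the pitch range quoted in the results section.

```python
MIN_PITCH, MAX_PITCH = 48, 84       # 36 note-on pitches, upper bound exclusive
NO_EVENT, NOTE_OFF = -2, -1         # assumed markers for the two special events
NUM_CLASSES = (MAX_PITCH - MIN_PITCH) + 2   # 36 + 2 = 38

def event_index(event):
    """Map a melody event to its class index in the one-hot vector."""
    if event == NO_EVENT:
        return 0
    if event == NOTE_OFF:
        return 1
    if not MIN_PITCH <= event < MAX_PITCH:
        raise ValueError("pitch %d is outside the supported range" % event)
    return event - MIN_PITCH + 2

def one_hot(event):
    """Return a 38-dimensional vector with a single 1 at the event's index."""
    vector = [0] * NUM_CLASSES
    vector[event_index(event)] = 1
    return vector

# One entry per 1/16th-note step: C4 held for four steps, then D4, then silence
melody = [60, NO_EVENT, NO_EVENT, NO_EVENT, 62, NO_EVENT, NOTE_OFF, NO_EVENT]
encoded = [one_hot(event) for event in melody]
print(len(encoded), len(encoded[0]))  # 8 38
```

Stacking `encoded` for many songs gives the samples x timesteps x 38 tensor described above.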


Theory Hierarchical RNN LSTM Architecture

Why choose a RNN LSTM for Stochastic SoundCloud music generation?

  • Music is an art of time with a temporal structure and has hierarchical structure with higher-level building blocks (phrases) made up of smaller recurrent patterns (bars)
  • Recurrent Neural Networks are able to capture time dependencies between inputs.
  • Mozer observed that in RNN-composed music the “local contours made sense but the pieces were not musically coherent”; in 2002 Eck and Schmidhuber suggested using long short-term memory Recurrent Neural Networks (RNN LSTM), which avoid the rapid decay of "backpropagated error"
  • Audio classification tasks include extracting audio features and representations, training, tuning, and evaluating classifiers of audio segments, performing supervised and unsupervised audio segmentation (ex. joint segmentation classification, emotion recognition, or speaker diarization), and applying dimensionality reduction to visualize audio data and content similarities

Stochastic SoundCloud Machine Learning and Architecture

Train the RNN LSTM to compose a melody. Lookback and Attention RNNs are proposed to tackle the problem of creating a melody’s long-term structure. The network is fed a chord sequence and outputs a Prediction Matrix, which can be transformed into a piano roll matrix and finally into a melody MIDI file. The number of samples is given by the difference between the number of timesteps of the piano roll matrix and the sequence length, that is, number of samples = number of timesteps (piano roll) − sequence length:

  • Input 3-dimensional Matrix of size: Number of samples, timesteps, input dimension
  • Target 2-dimensional Matrix of size: Number of samples, output dimension

Input to Target Matrix:

  • Network Input Matrix: One network input sample consists of a 2-dimensional input matrix.
  • Prediction Target Matrix: One target sample consists of a 1-dimensional target vector
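The sample count and matrix shapes above can be checked with a sliding-window sketch. This is a simplification under stated assumptions: `make_samples` is a hypothetical helper, and the same toy roll supplies both inputs and targets, whereas in the chord-to-melody setup described above the targets would come from the melody roll.

```python
def make_samples(roll, sequence_length):
    """Slide a window over the roll: each window predicts the next timestep."""
    inputs, targets = [], []
    for start in range(len(roll) - sequence_length):
        inputs.append(roll[start:start + sequence_length])   # 2-D input sample
        targets.append(roll[start + sequence_length])        # 1-D target vector
    return inputs, targets

# Toy piano roll: 10 timesteps, 3 note candidates per timestep
roll = [[1 if step % 3 == pitch else 0 for pitch in range(3)]
        for step in range(10)]

inputs, targets = make_samples(roll, sequence_length=4)

# number of samples = number of timesteps - sequence length = 10 - 4
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 6 4 3
print(len(targets), len(targets[0]))                   # 6 3
```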

The RNN LSTM consists of an input layer, an output layer, and optionally hidden layers between the input and output layers. The chord sequences need to be within one octave and the corresponding melodies within two octaves


Technical Music Dataset


Technical Tools

  • Python 3 (>= 3.5)
    • MIDI libraries for Python
  • Magenta for Tensorflow with the 3 pre-trained RNN LSTM models:
    1. Basic RNN (basic one hot encoding)
    2. Lookback RNN
    3. Attention RNN (looks at a bunch of previous steps)
  • GarageBand for Mac

Technical Installation

Use Anaconda python packages:

curl https://raw.githubusercontent.com/tensorflow/magenta/master/magenta/tools/magenta-install.sh > /tmp/magenta-install.sh
bash /tmp/magenta-install.sh

Run Magenta using Python programs or Jupyter Notebook

source activate magenta

Clone this repository

git clone https://github.com/lucylow/Stochastic_SoundCloud.git

Install the dependencies

pip install -e .

Run the melody_rnn_generate script from the base directory

python Stochastic_SoundCloud/melody_rnn/melody_rnn_generate --config=...


Technical RNN LSTM Machine Learning Model parameters

Stochastic SoundCloud RNN LSTM networks used in experiments had two hidden layers and each hidden layer had 256 hidden neurons with initial learning rate of 0.001. The minibatch size was 64 and to avoid over-fitting the dropout rate was set to a ratio of 0.5:

  • Config = (choose options between basic_rnn, mono_rnn, lookback_rnn, attention_rnn)
  • Bundle_file = (choose .mag file)
  • Output_dir = output directory within Stochastic_SoundCloud folder
  • Num_outputs = 10 (number of music files you want to output)
  • Num_steps = 128
  • Primer_melody = 60 (middle C on piano)
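The parameters listed above can be assembled into a command line for the generate script. The sketch below only builds and prints the command; it does not run it. The lowercase flag spellings follow Magenta's `melody_rnn_generate` and are assumed to carry over to this repository's copy of the script. The bundle path `lookback_rnn.mag`, the output directory `output`, and the helper name `build_generate_command` are placeholders chosen for illustration.

```python
import shlex

SCRIPT = "Stochastic_SoundCloud/melody_rnn/melody_rnn_generate"

def build_generate_command(config, bundle_file, output_dir,
                           num_outputs=10, num_steps=128, primer_melody=(60,)):
    """Return the argument list for one generation run."""
    return [
        "python", SCRIPT,
        "--config=" + config,
        "--bundle_file=" + bundle_file,
        "--output_dir=" + output_dir,
        "--num_outputs=%d" % num_outputs,
        "--num_steps=%d" % num_steps,
        "--primer_melody=" + str(list(primer_melody)),  # e.g. [60] for middle C
    ]

command = build_generate_command(
    config="lookback_rnn",           # one of the options listed above
    bundle_file="lookback_rnn.mag",  # placeholder path to a downloaded bundle
    output_dir="output",             # placeholder directory
)
print(" ".join(shlex.quote(arg) for arg in command))
```

Quoting each argument matters here because the square brackets in the primer melody would otherwise be interpreted by some shells.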

Lucy’s New Mozart Mixtape Results

Basic RNN

  • Basic two to three notes generated on a note-by-note basis (monophonic)

  • One-hot encoding == melody

  • Pitch range [48 to 84]

Lookback RNN

  • Patterns that occur over one or two measures/bars in a song resulting in more "repetitive" beats
  • Less basic than the Basic RNN (allows custom inputs and labels), with more musical structure and actual melodies, since the Lookback feature makes it easier for the model to repeat sequences
  • Lookback RNN outperformed the Attention RNN

Attention RNN

  • Looks at a bunch of previous steps to figure out what note to play next (longer-term dependencies)

  • More mathematically complicated

  • Notes more complex (polytonic)
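The idea of looking at a bunch of previous steps can be sketched as a weighted average over past hidden states. This is a simplified dot-product attention written for illustration, not Magenta's implementation, which uses learned projections. The names `softmax` and `attend` and the toy vectors are assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def attend(query, past_states):
    """Weight each past state by its similarity to the current query."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in past_states]
    weights = softmax(scores)
    context = [sum(w * state[i] for w, state in zip(weights, past_states))
               for i in range(len(query))]
    return weights, context

past_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy hidden states
query = [2.0, 0.0]                                   # toy current state
weights, context = attend(query, past_states)
print([round(w, 3) for w in weights])
```

The first past state aligns best with the query, so it receives the largest weight and dominates the context vector used to choose the next note.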


What's next for Lucy’s New Mozart Mixtape?

  • Preprocess the MIDI files - dataset was quite noisy (remove super high/low notes by discarding notes below C1 or above C8, decrease the ratio of empty bars)
  • Apply the Music Turing Test to compare outputs to human generated music. Can the discriminator tell if the generated music is real or fake?
  • Compare quantitative results of the three models using binarization testing strategies: Bernoulli sampling (BS) or hard thresholding (HT)
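The two binarization strategies named above turn a matrix of predicted note probabilities into a binary piano roll. This is a minimal sketch: the function names, the 0.5 threshold, and the fixed seed are illustrative assumptions rather than settings from this repository.

```python
import random

def hard_threshold(probabilities, threshold=0.5):
    """HT: a note is on when its probability exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]

def bernoulli_sample(probabilities, seed=0):
    """BS: each note is switched on at random with its own probability."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in row] for row in probabilities]

predicted = [
    [0.9, 0.2, 0.0],
    [0.5, 0.51, 1.0],
]
print(hard_threshold(predicted))    # [[1, 0, 0], [0, 1, 1]]
print(bernoulli_sample(predicted))  # varies with the seed
```

Hard thresholding is deterministic, while Bernoulli sampling keeps some of the randomness of the model's predictions, so the two can produce different piano rolls from the same output.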

Conclusion

Stochastic SoundCloud presenting Lucy’s New Mozart Mixtape uses three Recurrent Neural Networks (RNNs) to generate symbolic melodies. The music is generated with machine learning techniques using Magenta, built on Google's TensorFlow. It uses a long short-term memory (LSTM) model with three specific RNN examples: Basic RNN, Lookback RNN, and Attention RNN. It outputs ~10 randomly generated output.mid music files that can be opened in GarageBand on Mac.

