Hey, been playing with PedalNetRT - kudos for all of the hard work. After getting set up and working on some models, I wanted to play with the hyperparameters but noticed there's no documentation. I'm peripherally aware of things like dilation depth, but is it possible to get some info on how adjusting these will affect the actual training process? Thanks!
Better documentation is a good idea; in the meantime I'll try to give a high-level summary here. There are four main hyperparameter settings that change the WaveNet network structure:

- `num_channels` = the number of inputs to each network layer. The more channels, the more features the network may be able to learn, at the cost of processing power. I've tested 2 to about 16; above 16 I usually can't run the network in real time. Generally speaking, more channels means a more accurate model.
- `dilation_depth` = the number of dilated layers. I have only experimented with 8, 9, and 10. A dilation_depth of 10 gives 10 layers with the following dilations: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]. Basically, the more dilated layers, the farther back in time the network can reach to predict the next sample. Accuracy typically improves up to a point, then levels off; the more layers, the slower the network runs.
- `num_repeat` = repeats the dilated layers, so 2 repeats with a dilation_depth of 9 give 18 layers with the following dilations: [1, 2, 4, 8, 16, 32, 64, 128, 256, 1, 2, 4, 8, 16, 32, 64, 128, 256]. Combinations like dilation_depth=9 with num_repeat=2, or dilation_depth=8 with num_repeat=3, were chosen to balance real-time performance with decent accuracy; there are graphs in the paper plotting real-time performance vs. accuracy.
- `kernel_size` (also called filter_size) = the size of the filter used in the convolution calculations. I've tested 2 and 3; anything larger takes a big hit on real-time performance, and 1 doesn't perform well on accuracy. 3 gives a significantly more accurate result than 2 in most cases.

The general theme: higher numbers create a bigger network, which gives better accuracy but slower real-time performance. I've mostly stuck to the following settings as a decent balance between the two:

- Network 1 (10 layer): num_channels=10, dilation_depth=10, num_repeat=1, kernel_size=3
- Network 2 (18 layer): num_channels=6, dilation_depth=9, num_repeat=2, kernel_size=3

From there you can push `num_channels` higher or lower depending on your CPU needs.
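To make these interactions concrete, here's a small standalone Python sketch (my own illustration, not PedalNetRT's actual code; it assumes the standard receptive-field formula for a plain stack of dilated convolutions) that computes the dilation pattern and how far back in time a given configuration can see:

```python
def dilations(dilation_depth, num_repeat):
    """One dilation per layer: powers of two, repeated num_repeat times."""
    return [2 ** d for d in range(dilation_depth)] * num_repeat

def receptive_field(dilation_depth, num_repeat, kernel_size):
    """How many samples of context the network sees when predicting one sample."""
    return sum((kernel_size - 1) * d for d in dilations(dilation_depth, num_repeat)) + 1

# Network 1 (10 layer): dilation_depth=10, num_repeat=1, kernel_size=3
print(dilations(10, 1))           # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(receptive_field(10, 1, 3))  # 2047 samples, roughly 46 ms at 44.1 kHz

# Network 2 (18 layer): dilation_depth=9, num_repeat=2, kernel_size=3
print(dilations(9, 2))            # [1, 2, ..., 256, 1, 2, ..., 256] (18 entries)
print(receptive_field(9, 2, 3))   # 2045 samples, essentially the same reach
```

Note how the two recommended networks end up with nearly identical receptive fields; the 18-layer variant trades channel width for depth rather than extending its reach back in time.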
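If it helps to see where each setting enters the model, below is a deliberately simplified, WaveNet-style PyTorch stack. The hyperparameter names mirror the ones above, but the architecture itself (plain tanh convolutions, no gated activations or skip connections) is a stand-in for illustration, not PedalNetRT's real implementation:

```python
import torch
import torch.nn as nn

class TinyWaveNet(nn.Module):
    """Illustrative dilated-conv stack; not PedalNetRT's actual architecture."""

    def __init__(self, num_channels=10, dilation_depth=10, num_repeat=1, kernel_size=3):
        super().__init__()
        dilations = [2 ** d for d in range(dilation_depth)] * num_repeat
        # num_channels sets the width of every hidden layer
        self.input = nn.Conv1d(1, num_channels, kernel_size=1)
        # one dilated convolution per entry in the dilation pattern
        self.hidden = nn.ModuleList([
            nn.Conv1d(num_channels, num_channels, kernel_size,
                      dilation=d, padding=(kernel_size - 1) * d)
            for d in dilations
        ])
        self.output = nn.Conv1d(num_channels, 1, kernel_size=1)

    def forward(self, x):
        # x: (batch, 1, samples); trimming the right edge after symmetric
        # padding keeps each convolution causal (no look-ahead).
        x = self.input(x)
        for conv in self.hidden:
            x = torch.tanh(conv(x))[..., : x.shape[-1]]
        return self.output(x)

# "Network 2 (18 layer)" on 0.1 s of dummy audio at 44.1 kHz
net = TinyWaveNet(num_channels=6, dilation_depth=9, num_repeat=2, kernel_size=3)
print(net(torch.randn(1, 1, 4410)).shape)  # torch.Size([1, 1, 4410])
```

A bigger num_channels widens every Conv1d, and a longer dilation list deepens the loop, which is where the accuracy-versus-CPU trade-off described above comes from.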