Preparing Train Dataset (mixing strategy) #81

Open
kthworks opened this issue Mar 5, 2024 · 0 comments
Labels
question Further information is requested

Comments


kthworks commented Mar 5, 2024

Thank you for your excellent work.

I am in the process of training the EnCodec model and have some questions regarding the mixing strategy.

I would like to understand how the full training dataset is composed. The paper describes four sampling strategies for the training/validation set:
(s1) Sampling a single source from Jamendo with a probability of 0.32;
(s2) Sampling a single source from other datasets with the same probability;
(s3) Mixing two sources from all datasets with a probability of 0.24;
(s4) Mixing three sources from all datasets except music with a probability of 0.12.

Does this mean that the training/validation dataset is composed of segments in the ratio s1/s2/s3/s4 = 32%/32%/24%/12%? In the appendix, Table 1 lists the Jamendo dataset at 919 hours but Common Voice at 9,096 hours. Did you not use all of the samples from Common Voice?
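One possible reading of the four probabilities is that they are per-draw sampling weights: each training example first picks a strategy at random, then picks sources for it. Under that reading the dataset sizes do not have to match the ratios, since a large dataset is simply sampled less densely. The sketch below only illustrates this interpretation; it is an assumption, not something confirmed by the paper or the maintainers, and the strategy names and helper functions are hypothetical.

```python
import random
from collections import Counter

# Probabilities quoted from the paper (s1..s4); the key names are
# hypothetical labels chosen for this sketch.
STRATEGIES = {
    "s1_single_jamendo": 0.32,
    "s2_single_other": 0.32,
    "s3_mix_two_all": 0.24,
    "s4_mix_three_non_music": 0.12,
}


def draw_strategy(rng: random.Random) -> str:
    """Pick one mixing strategy for a single training example."""
    names = list(STRATEGIES)
    weights = list(STRATEGIES.values())
    return rng.choices(names, weights=weights, k=1)[0]


def strategy_proportions(num_draws: int, seed: int = 0) -> dict:
    """Empirical share of each strategy over `num_draws` independent draws."""
    rng = random.Random(seed)
    counts = Counter(draw_strategy(rng) for _ in range(num_draws))
    return {name: counts[name] / num_draws for name in STRATEGIES}


if __name__ == "__main__":
    for name, share in strategy_proportions(100_000).items():
        print(f"{name}: {share:.3f}")
```

Over many draws the empirical shares approach 32%/32%/24%/12% regardless of how many hours each underlying dataset contains.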

I would also like to know more about how reverberation is applied. Apart from the samples available in DNS, how do you apply reverberation to samples from other datasets? Is there a way to calculate the room impulse response? I would appreciate it if you could point me to any related implementations.
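For context, a common general technique (not necessarily what the EnCodec authors did) is to convolve the dry signal with a room impulse response (RIR), which can be either a measured recording or a simulated one. The sketch below uses a crude synthetic RIR, exponentially decaying white noise parameterised by a reverberation time `rt60`, as a stand-in for a real one. The function names, the decay model, and the peak normalisation are assumptions made for illustration.

```python
import numpy as np


def synthetic_rir(sample_rate: int, rt60: float, rng: np.random.Generator) -> np.ndarray:
    """Crude stand-in RIR: white noise whose envelope drops 60 dB over `rt60` seconds."""
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)  # amplitude factor 1e-3 (= -60 dB) at t = rt60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))  # peak-normalise to 1


def apply_reverb(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a dry mono signal with an RIR, keeping the original length."""
    wet = np.convolve(dry, rir)[: len(dry)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet  # avoid clipping; leave silence untouched


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 8000
    dry = rng.standard_normal(sr)  # one second of noise as a placeholder signal
    wet = apply_reverb(dry, synthetic_rir(sr, rt60=0.3, rng=rng))
    print(wet.shape)
```

For long signals, `scipy.signal.fftconvolve` is a faster drop-in for `np.convolve`. A measured RIR loaded from a file can be passed to `apply_reverb` in place of the synthetic one.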

If anyone can help with this, please leave a comment. Thank you :)

kthworks added the "question" label (Further information is requested) on Mar 5, 2024