Request for PR review #3111

Closed
longRookie opened this issue Mar 29, 2023 · 6 comments

@longRookie
Contributor

Could you please review my PR and give feedback? I would greatly appreciate it.
#3006

@lizezheng
Collaborator

Thanks for contributing the code; it is already under review.

@lym0302
Contributor

lym0302 commented Apr 3, 2023

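> overlap_ratio: the overlap ratio (also called the hop size ratio), equal to n_fft/hop_size. Defaults to 4 (in experiments, 4 gave better synthesis quality than 1 or 2); see paddle istft hop_length.

Is setting overlap_ratio to 4 a conclusion drawn from experiments? Does a model trained with this whole set of parameters work correctly? This differs somewhat from the algorithm in the paper, so I need to confirm this point with you.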

@lym0302
Contributor

lym0302 commented Apr 3, 2023

I left two comments on the PR; please take a look when you have time.

@longRookie
Contributor Author

longRookie commented Apr 4, 2023

OK, thanks.

@longRookie
Contributor Author

longRookie commented Apr 5, 2023

> I left two comments on the PR; please take a look when you have time.

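Both PR comments have been addressed. The CI CodeStyle check currently fails, but when I force-pushed the earlier code that had previously passed CI, CodeStyle still failed. I don't know why.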

@longRookie
Contributor Author

longRookie commented Apr 5, 2023

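> overlap_ratio: the overlap ratio (also called the hop size ratio), equal to n_fft/hop_size. Defaults to 4 (in experiments, 4 gave better synthesis quality than 1 or 2); see paddle istft hop_length. Is setting overlap_ratio to 4 a conclusion drawn from experiments? Does a model trained with this whole set of parameters work correctly? This differs somewhat from the algorithm in the paper, so I need to confirm this point with you.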

image

The model we reproduced is C8C8I.
https://github.com/rishikksh20/iSTFTNet-pytorch/blob/ecbf0f635b36432bd3e432790326591bc86cadbc/config_v1.json#L21 likewise uses hop_size=4, with "n_fft": 1024 and "hop_size": 256.

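Since these three parameters should be derived in the code, we compute hop_size and n_fft in the code and set overlap_ratio to 4, which keeps it consistent with the setting in the paper.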

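We also tried overlap_ratio = 1 and 2, and the synthesized audio quality degraded. We believe this is because the smaller n_fft lowers the frequency resolution, which hurts synthesis quality.
With overlap_ratio=4, the resulting hop_size and n_fft differ from the paper. This is because the upsample_rates used by hifigan in paddlespeech differ from those in the original hifigan: paddlespeech uses 5,5,4,3, while hifigan uses 8,8,2,2. If the product of the last two elements were 4, our hop_size and n_fft would match the original paper.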
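
The relationship described above can be sketched in Python. This is only an illustration, not the code from the PR: the helper name `istft_params` and its `num_kept_layers` argument are hypothetical, and it assumes that hop_size is the product of the upsample rates replaced by the iSTFT stage (the last two for C8C8I) and that n_fft = overlap_ratio * hop_size, following the overlap_ratio definition quoted earlier.

```python
from math import prod


def istft_params(upsample_rates, num_kept_layers=2, overlap_ratio=4):
    """Return (hop_size, n_fft) for the iSTFT output stage.

    Assumption: the upsample layers after the first `num_kept_layers`
    are replaced by the iSTFT, so hop_size is the product of their rates.
    """
    replaced = upsample_rates[num_kept_layers:]
    hop_size = prod(replaced)
    n_fft = overlap_ratio * hop_size  # overlap_ratio = n_fft / hop_size
    return hop_size, n_fft


# Original hifigan rates vs. the paddlespeech rates mentioned in this thread.
print(istft_params([8, 8, 2, 2]))  # (4, 16)
print(istft_params([5, 5, 4, 3]))  # (12, 48)

# A smaller overlap_ratio shrinks n_fft, so fewer frequency bins (n_fft // 2 + 1).
for ratio in (1, 2, 4):
    hop_size, n_fft = istft_params([5, 5, 4, 3], overlap_ratio=ratio)
    print(ratio, hop_size, n_fft, n_fft // 2 + 1)
```

Under these assumptions the 8,8,2,2 rates give hop_size 4 and n_fft 16, while 5,5,4,3 give 12 and 48. Lowering overlap_ratio to 1 or 2 keeps hop_size at 12 but reduces n_fft to 12 or 24, leaving 7 or 13 frequency bins instead of 25.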

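Training the model works fine. In the corresponding iSTFTNet.md we provide Baidu Netdisk links to the hifigan and istftNet pretrained models at 50000 steps, together with a comparison of the experimental results.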
