The processing efficiency and sampling rate problem of OPUS files #1149
Hmm, I remember disabling it because I found the reverse to be true on some systems. I think the best way forward would be to expose control over this to the user. I'll aim to make a PR to enable this later; since I was recently refactoring some of this code, it should be easily doable.
Regarding 48kHz vs 16kHz, I'm not sure I got your point. OPUS is always decoded to 48kHz even if the original audio had a smaller sampling rate, unless I missed something.
For example, I have a .opus file in my dataset; if I use torchaudio.info() to get the sampling rate, it shows 16kHz. Also, if I use ffmpeg to read it, the reported input sampling rate is 16kHz. If the param force_opus_sampling_rate is not passed to read_opus_ffmpeg, the samples will be read at the actual 16kHz while the recording stores the default 48kHz.
This causes a mismatch in the subsequent computations.
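To make the mismatch concrete, here is a minimal check with a hypothetical file path; torchaudio.info() reads the container metadata, so what it reports depends on the backend and the file's Ogg/OPUS header:

```python
import torchaudio

# Hypothetical path to one of the .opus files in the dataset.
meta = torchaudio.info("data/utt0001.opus")
print(meta.sample_rate)  # e.g. 16000 for this file, not the 48000 that the recording assumes
print(meta.num_frames)   # number of samples at that reported rate
```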
If the file has 16kHz, that makes sense. I just never encountered an OPUS file that actually has a sampling rate other than 48kHz, even when I encoded WAV data into OPUS that had a smaller SR... I think your proposed changes make sense; could you make a PR?
OK.
I've been trying to process a large dataset with .wav and .opus files recently, and found that processing .wav files is nearly 6 times faster than processing .opus files, specifically during the generation of recordings and supervisions. After debugging, I found the difference is that .wav files are processed with torchaudio while .opus files are processed with ffmpeg.
The read_opus function in lhotse/audio/backend.py is:
Although the note says ffmpeg is faster, in my case torchaudio is better. I just use read_opus_torchaudio in the above code, and then the speedup appears.
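A rough sketch of what I mean (the actual read_opus in lhotse/audio/backend.py may differ in signature and defaults; I'm assuming here that read_opus_torchaudio lives in the same module and accepts the same arguments):

```python
from typing import Optional, Tuple

import numpy as np

# Assumed import path, based on where read_opus is defined.
from lhotse.audio.backend import read_opus_torchaudio


def read_opus(
    path: str,
    offset: float = 0.0,
    duration: Optional[float] = None,
    force_opus_sampling_rate: Optional[int] = None,
) -> Tuple[np.ndarray, int]:
    # Dispatch to the torchaudio-based reader instead of the ffmpeg one;
    # on my setup this is the faster path for .opus files.
    return read_opus_torchaudio(
        path,
        offset=offset,
        duration=duration,
        force_opus_sampling_rate=force_opus_sampling_rate,
    )
```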
[screenshots of the profiling results attached]
pytorch: 1.13
ffmpeg:
torchaudio:
Also, there is another problem when using the read_opus_ffmpeg function:
It assumes all .opus files have a sampling_rate of 48000, which is a problem if the dataset is not typical; in my case, for example, it can be 16000. Then, if force_opus_sampling_rate is not specified, the recorded sampling_rate will be 48000 while the file is actually read at 16000, which affects the subsequent computation of num_samples and features.
I think just setting the cmd with '-ar sampling_rate' will solve the problem, for example:
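Something along these lines (only a sketch of the idea; the real read_opus_ffmpeg command construction in Lhotse handles offset, duration, and channels differently):

```python
import subprocess
from typing import Tuple

import numpy as np


def read_opus_ffmpeg_sketch(path: str, sampling_rate: int = 48000) -> Tuple[np.ndarray, int]:
    # Ask ffmpeg to resample with '-ar', so the decoded audio always matches
    # the sampling_rate stored in the Recording manifest, even when the OPUS
    # file's header advertises e.g. 16 kHz.
    cmd = [
        "ffmpeg", "-i", path,
        "-ar", str(sampling_rate),  # <-- the proposed addition
        "-f", "f32le",              # raw 32-bit float PCM on stdout
        "pipe:1",
    ]
    proc = subprocess.run(cmd, capture_output=True, check=True)
    audio = np.frombuffer(proc.stdout, dtype=np.float32)
    return audio, sampling_rate
```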