Add TorchScript-based SoX I/O backend #726
This is part of a series of PRs to add the new "sox_io" backend (#726). This PR adds the `SignalInfo` structure, which is the data exchange interface between Python and C++ in the upcoming TorchScript-based sox IO backend. For the case where the C++ extension is not available (i.e. Windows), this PR also adds a dummy class and module that are substituted in its place. This logic is implemented in the `torchaudio.extension` module.
This is part of a series of PRs to add the new "sox_io" backend (#726), and depends on #718 and #728.

This PR adds the `load` function to the "sox_io" backend, which is tested on the following audio formats:

- `wav`
- `mp3`
- `flac`
- `ogg/vorbis` *

By default, the "sox_io" backend returns a Tensor with `float32` dtype and the shape `[channel, time]`. The samples are normalized to fit in the range `[-1.0, 1.0]`.

Unlike the existing "sox" backend, the new `load` function can handle WAV files natively. When the input format is WAV with an integer type (such as 32-bit signed integer, 16-bit signed integer and 8-bit unsigned integer), providing `normalize=False` makes this function return an integer Tensor, where the samples are expressed within the whole range of the corresponding dtype, that is, an `int32` Tensor for `32-bit PCM`, `int16` for `16-bit PCM` and `uint8` for `8-bit PCM`. This behavior follows [scipy.io.wavfile.read](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html).

The `normalize` parameter has no effect for other formats, and the `load` function always returns normalized values as a `float32` Tensor.

__* Note__ The current binary distribution of torchaudio does not contain `ogg/vorbis` and `opus` codecs. To handle these files, one needs to build torchaudio from source with the proper codecs installed on the system.

__Note 2__ As of this PR, `scipy` becomes a required module for running the tests.
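As a rough illustration of what mapping integer PCM samples into `[-1.0, 1.0]` involves, here is a minimal pure-Python sketch. The offsets and scale factors are assumptions chosen for illustration, and `normalize_pcm` is a hypothetical helper; the actual conversion is done inside the C++ extension and may differ in detail.

```python
# (offset, scale) per integer dtype. These values are assumptions for
# illustration, not taken from the torchaudio source.
_PCM_SPECS = {
    "uint8": (128, 128.0),        # unsigned samples, centered at 128
    "int16": (0, 32768.0),        # 2 ** 15
    "int32": (0, 2147483648.0),   # 2 ** 31
}


def normalize_pcm(samples, dtype):
    """Map integer PCM samples of the given dtype to floats in [-1.0, 1.0]."""
    offset, scale = _PCM_SPECS[dtype]
    return [(sample - offset) / scale for sample in samples]


print(normalize_pcm([-32768, 0, 16384], "int16"))  # [-1.0, 0.0, 0.5]
```

With `normalize=False`, no such scaling would be applied to integer WAV data, and the raw integer values would be returned unchanged.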
Codecov Report
| Coverage Diff | master | #726 | +/- |
| --- | --- | --- | --- |
| Coverage | 89.14% | 89.16% | +0.02% |
| Files | 32 | 32 | |
| Lines | 2561 | 2566 | +5 |
| Hits | 2283 | 2288 | +5 |
| Misses | 278 | 278 | |
    if backend == 'sox_io':
        continue
Why is `sox_io` made a special case and skipped here?
The test cases in this class depend on the global state left by the previously run test, which breaks the principle of unit testing, and including `sox_io` breaks them.
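For context, one common way to keep tests independent of such global state is to save and restore it around each test. The sketch below is illustrative only: `set_backend`, `get_backend`, and the module-level `_current_backend` are hypothetical stand-ins, not the code under review.

```python
import unittest

_current_backend = "sox"  # hypothetical module-level global state


def set_backend(name):
    global _current_backend
    _current_backend = name


def get_backend():
    return _current_backend


class BackendSwitchTest(unittest.TestCase):
    def setUp(self):
        # Restore whatever backend was active, even if the test fails.
        self.addCleanup(set_backend, get_backend())

    def test_switch_to_sox_io(self):
        set_backend("sox_io")
        self.assertEqual(get_backend(), "sox_io")

    def test_default_is_untouched(self):
        # Passes regardless of test order because of the cleanup above.
        self.assertEqual(get_backend(), "sox")
```

The tests can be run with `python -m unittest`; each one starts from the same backend no matter which test ran before it.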
LGTM, thanks!
…ated against (pytorch#726) Co-authored-by: holly1238 <[email protected]>
This PR (and its dependent PRs) adds a new "sox_io" backend.
(This lets users opt in; we do not intend to change the default yet.)
The new "sox_io" backend has the following advantages:

- The data processing pipeline written using the new backend can be dumped and used from C++.
- The original "sox" backend had a number of issues, e.g. the `load` function cannot handle WAV files correctly. I have added a bunch of tests to make sure that the new backend does not have the same issue. This includes read/write operations on `wav`, `flac`, `mp3` and `ogg/vorbis` formats.* This backend can also read `opus`, though it is not covered by the unit tests (Add opus support to binary distribution #755).
- `dtype` is picked depending on the internal representation of the WAV format. This behavior is the same as how SciPy handles WAV files.
- `normalize` option, which correctly maps the integer value range to `[-1.0, 1.0]` with `float32`.
- `sox_signalinfo_t` and `sox_encodinginfo_t` structs directly, but TorchScript does not allow this. Also, setting the correct parameters for these structs is not easy. In the new backend, options related to sox internals are handled in C++, and users only need to provide the `compression` option, which corresponds to `sox`'s `-C` option.

Note: The current binary distribution of torchaudio does not contain `ogg/vorbis` codecs (#750). To handle these files, you need to build torchaudio from source. Refer to the README for instructions.
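To make the `dtype` selection concrete, the sketch below uses only the Python standard library to read the sample width of a WAV file and look up the dtype name given in this thread (`uint8` for 8-bit PCM, `int16` for 16-bit, `int32` for 32-bit). The helpers `make_silent_wav` and `pcm_dtype_name` are hypothetical and are not part of torchaudio.

```python
import io
import wave

# Bytes per sample -> dtype name, following the mapping described above.
_DTYPE_BY_SAMPLE_WIDTH = {1: "uint8", 2: "int16", 4: "int32"}


def make_silent_wav(sample_width, num_frames=4, sample_rate=8000):
    """Build a tiny mono PCM WAV file in memory (all-zero sample bytes)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(bytes(sample_width * num_frames))
    return buffer.getvalue()


def pcm_dtype_name(wav_bytes):
    """Return the dtype name matching the WAV file's sample width."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return _DTYPE_BY_SAMPLE_WIDTH[reader.getsampwidth()]


for width in (1, 2, 4):
    print(width, pcm_dtype_name(make_silent_wav(width)))
```

The point of the design is that the returned dtype is derived from the file itself rather than chosen by the caller, which is also how `scipy.io.wavfile.read` behaves.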