Add TorchScript-based SoX I/O backend #726

mthrok · 2020-06-17T17:20:22Z

This PR (and its dependent PRs) adds a new "sox_io" backend.

- Add SignalInfo typedef, and extension module #718
- Add TorchScript-able "info" func to sox_io backend #728
- Fix SignalInfo member name to frame #734
- Add TorchScript-able "load" func to sox_io backend #731
- Add TorchScript-able "save" func to sox_io backend #732
- This PR: add "sox_io" to the list of available backends.
  (Users can opt in; we do not intend to change the default yet.)

The new "sox_io" backend has the following advantages:

TorchScript-able
A data processing pipeline written with the new backend can be dumped (serialized) and used from C++.
Correct
The original "sox" backend had a number of issues:
- Info length and rate returns different values for different backends #618, Length Inaccurate on Multi-Channel Audio Files #236: info length and rate return different values.
- Save / Load Bugs #430, Saving and loading the downsampled audio results in a tensor with zeros. #252: saving then loading degrades the data.
- Fbank features are different from Kaldi Fbank #400 (comment): the load function cannot handle WAV files correctly.
  I have added a bunch of tests to make sure that the new backend does not have the same issues.
  These cover read/write operations on wav, flac, mp3 and ogg/vorbis formats. *
  This backend can also read opus, ~~though it's not in unit test.~~ Add opus support to binary distribution #755
Clean interface
- When loading a WAV file, the correct dtype is picked depending on the internal representation of the WAV format. This is the same as how SciPy handles WAV files.
  - The load function also provides a normalize option, which correctly maps the integer value range to [-1.0, 1.0] with float32.
- The existing "sox" backend exposes the sox_signalinfo_t and sox_encodinginfo_t structs directly, but TorchScript does not allow this. Setting the correct parameters for these structs is also not easy. In the new backend, sox-internal options are handled in C++, and users only need to provide a compression option that corresponds to sox's -C option.

~~Note The current binary distribution of torchaudio does not contain ogg/vorbis codecs. To handle these files, you need to build torchaudio from the source. Refer to README for the instruction.~~ #750

This is part of a series of PRs to add the new "sox_io" backend (#726). This PR adds the `SignalInfo` structure, which is the data exchange interface between Python and C++ in the upcoming TorchScript-based sox IO backend. For cases where the C++ extension is not available (i.e. Windows), this PR also adds a dummy class and module that are substituted instead. This logic is implemented in the `torchaudio.extension` module.
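
The fallback idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `torchaudio.extension`: the field names (`sample_rate`, `num_channels`, `num_frames`), the class names and the helper functions are all hypothetical, and in the real backend the structure is provided by the C++ extension rather than defined in Python.

```python
import importlib.util


class SignalInfo:
    """Hypothetical stand-in for the struct exchanged between Python and C++."""

    def __init__(self, sample_rate: int, num_channels: int, num_frames: int):
        self.sample_rate = sample_rate
        self.num_channels = num_channels
        self.num_frames = num_frames


class DummySignalInfo:
    """Placeholder substituted when the compiled extension cannot be loaded."""

    def __init__(self, *args, **kwargs):
        raise RuntimeError("SignalInfo requires the compiled C++ extension.")


def is_extension_available(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


def select_signal_info(module_name: str):
    """Return the real class if the extension exists, otherwise the dummy."""
    if is_extension_available(module_name):
        return SignalInfo
    return DummySignalInfo
```

With this shape, importing the package still succeeds on platforms without the extension, and the error only surfaces when someone actually tries to construct the object.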

This is part of a series of PRs to add the new "sox_io" backend (#726), and depends on #718 and #728. This PR adds a `load` function to the "sox_io" backend, which is tested on the following audio formats:
- `wav`
- `mp3`
- `flac`
- `ogg/vorbis` *

By default, the "sox_io" backend returns a Tensor with `float32` dtype and shape `[channel, time]`. The samples are normalized to fit in the range `[-1.0, 1.0]`.

Unlike the existing "sox" backend, the new `load` function can handle WAV files natively. When the input is a WAV file with an integer type (such as 32-bit signed integer, 16-bit signed integer or 8-bit unsigned integer), passing `normalize=False` makes this function return an integer Tensor whose samples span the whole range of the corresponding dtype: `int32` for `32-bit PCM`, `int16` for `16-bit PCM` and `uint8` for `8-bit PCM`. This behavior follows [scipy.io.wavfile.read](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html). The `normalize` parameter has no effect for other formats; the load function always returns normalized values as a `float32` Tensor.

__* Note__ The current binary distribution of torchaudio does not contain the `ogg/vorbis` and `opus` codecs. To handle these files, one needs to build torchaudio from source with the proper codecs available on the system.

__Note 2__ Since this PR, `scipy` is a required module for running the tests.
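
To make the dtype and `normalize` behavior concrete, here is a minimal pure-Python sketch. It is not the backend's C++ implementation: the function name `decode_pcm`, the lookup table, and the divisors (128, 32768, 2^31) are assumptions that follow one common PCM convention, and the returned dtype is only a label because Python floats are not actually `float32`.

```python
import struct

# Sample width in bytes -> (struct format char, dtype label, offset, scale).
# 8-bit WAV PCM is unsigned, so it is re-centred around 128 before scaling.
_PCM_LAYOUT = {
    1: ("B", "uint8", 128, 128.0),
    2: ("h", "int16", 0, 32768.0),
    4: ("i", "int32", 0, 2147483648.0),
}


def decode_pcm(raw: bytes, sample_width: int, num_channels: int, normalize: bool = True):
    """Decode interleaved little-endian PCM bytes into [channel][time] lists."""
    fmt, dtype, offset, scale = _PCM_LAYOUT[sample_width]
    count = len(raw) // sample_width
    values = struct.unpack("<%d%s" % (count, fmt), raw)
    if normalize:
        # Map the full integer range onto [-1.0, 1.0).
        values = [(v - offset) / scale for v in values]
        dtype = "float32"
    # De-interleave: sample i of channel c sits at index i * num_channels + c.
    channels = [list(values[c::num_channels]) for c in range(num_channels)]
    return channels, dtype
```

With `normalize=False` the integers come back untouched, so a 16-bit file yields values in `[-32768, 32767]` labelled `int16`; with the default, the same bytes yield floats in `[-1.0, 1.0)`.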

codecov · 2020-06-25T23:34:05Z

Codecov Report

Merging #726 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #726      +/-   ##
==========================================
+ Coverage   89.14%   89.16%   +0.02%     
==========================================
  Files          32       32              
  Lines        2561     2566       +5     
==========================================
+ Hits         2283     2288       +5     
  Misses        278      278

Impacted Files	Coverage Δ
torchaudio/backend/utils.py	`89.13% <100.00%> (+1.32%)`	⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a20da5e...a4c824a. Read the comment docs.

This is part of a series of PRs to add the new "sox_io" backend (#726), and depends on #718, #728 and #731. This PR adds a `save` function to the "sox_io" backend, which can save a Tensor to a file in the following audio formats:
- `wav`
- `mp3`
- `flac`
- `ogg/vorbis`
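
A save-then-load round trip is the natural check for a `save` function. The sketch below shows the shape of such a check using only the Python standard library `wave` module in place of sox. The helper names (`save_wav16`, `load_wav16`, `roundtrip`) are hypothetical, only 16-bit WAV is covered, and the scaling by 32768 is an assumed convention rather than the backend's confirmed behavior.

```python
import os
import struct
import tempfile
import wave


def save_wav16(path, channels, sample_rate):
    """Write [channel][time] floats in [-1.0, 1.0] as 16-bit PCM WAV."""
    num_frames = len(channels[0])
    frames = bytearray()
    for i in range(num_frames):
        for ch in channels:
            # Scale to the signed 16-bit range and clamp the upper edge.
            scaled = round(ch[i] * 32768.0)
            frames += struct.pack("<h", max(-32768, min(32767, scaled)))
    with wave.open(path, "wb") as f:
        f.setnchannels(len(channels))
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))


def load_wav16(path):
    """Read a 16-bit PCM WAV back into [channel][time] floats."""
    with wave.open(path, "rb") as f:
        n_ch = f.getnchannels()
        rate = f.getframerate()
        n = f.getnframes()
        raw = f.readframes(n)
    values = struct.unpack("<%dh" % (n * n_ch), raw)
    return [[v / 32768.0 for v in values[c::n_ch]] for c in range(n_ch)], rate


def roundtrip(channels, sample_rate):
    """Save to a temporary file, load it back, and clean up."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        save_wav16(path, channels, sample_rate)
        return load_wav16(path)
    finally:
        os.remove(path)
```

Values that are exact multiples of 1/32768 survive the trip unchanged, which makes them convenient test inputs; lossy formats such as `mp3` would need a tolerance-based comparison instead.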

vincentqb · 2020-07-01T21:34:45Z

test/test_io.py

+            if backend == 'sox_io':
+                continue


Why is sox_io made a special case and skipped here?

The test cases in this class depend on the global state left by the previously run test, which breaks the principle of unit testing, and adding sox_io breaks them.
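
One way to remove that kind of coupling, shown here only as a sketch and not as what test/test_io.py does, is to save and restore the shared state around every test. The `_STATE` dictionary and the `set_backend` / `get_backend` helpers are hypothetical stand-ins for a module-level backend setting.

```python
import unittest

# Hypothetical module-level state shared by every test in the process.
_STATE = {"backend": "sox"}


def set_backend(name):
    _STATE["backend"] = name


def get_backend():
    return _STATE["backend"]


class BackendIsolatedTest(unittest.TestCase):
    def setUp(self):
        # Restore whatever backend was active before this test, even if the
        # test fails, so the next test never sees leftover state.
        self.addCleanup(set_backend, get_backend())

    def test_switch(self):
        set_backend("sox_io")
        self.assertEqual(get_backend(), "sox_io")

    def test_default(self):
        # Passes regardless of whether test_switch ran first.
        self.assertEqual(get_backend(), "sox")
```

Because the cleanup is registered in `setUp`, the tests pass in any execution order, which is the property the skipped test class lacks.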

vincentqb

LGTM

mthrok · 2020-07-01T23:42:53Z

thanks!

mthrok force-pushed the sox_io_backend branch 7 times, most recently from 54fbcc2 to cb85a45 Compare June 18, 2020 14:57

mthrok changed the title ~~Add TorchScript-able SoX I/O backend~~ Add TorchScript-based SoX I/O backend Jun 18, 2020

mthrok force-pushed the sox_io_backend branch 12 times, most recently from f85d969 to 167cc55 Compare June 18, 2020 20:57

This was referenced Jun 18, 2020

Add SignalInfo typedef, and extension module #718

Merged

Add TorchScript-able "info" func to sox_io backend #728

Merged

Add TorchScript-able "load" func to sox_io backend #731

Merged

Add TorchScript-able "save" func to sox_io backend #732

Merged

mthrok force-pushed the sox_io_backend branch from 167cc55 to ac3852d Compare June 18, 2020 21:30

mthrok force-pushed the sox_io_backend branch from ac3852d to f400b83 Compare June 18, 2020 22:05

mthrok mentioned this pull request Jun 18, 2020

Info length and rate returns different values for different backends #618

Closed

mthrok force-pushed the sox_io_backend branch 2 times, most recently from 4bf7016 to bfee816 Compare June 19, 2020 12:12

mthrok force-pushed the sox_io_backend branch 9 times, most recently from d33a6ff to 3ffc88e Compare June 25, 2020 23:08

mthrok force-pushed the sox_io_backend branch from 3ffc88e to 13999b0 Compare June 25, 2020 23:19

mthrok force-pushed the sox_io_backend branch 3 times, most recently from 8ae9ebd to a46c34f Compare June 30, 2020 01:46

mthrok force-pushed the sox_io_backend branch from a46c34f to 7302fef Compare July 1, 2020 18:44

Add sox_io_backend

a4c824a

mthrok force-pushed the sox_io_backend branch from bbf0766 to a4c824a Compare July 1, 2020 20:57

mthrok marked this pull request as ready for review July 1, 2020 21:16

mthrok requested a review from vincentqb July 1, 2020 21:16

vincentqb reviewed Jul 1, 2020

vincentqb approved these changes Jul 1, 2020

mthrok merged commit 4b583ea into pytorch:master Jul 1, 2020

mthrok deleted the sox_io_backend branch July 1, 2020 23:42

mthrok mentioned this pull request Jul 16, 2020

Update documentation and fix docstrings #788

Merged

mthrok mentioned this pull request Sep 10, 2020

[Announcement] Improving I/O for correct and consistent experience #903

Closed

mthrok mentioned this pull request Jun 16, 2021

Length Inaccurate on Multi-Channel Audio Files #236

Closed

mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022

beginner/blitz/nn: Fix misleading typo on which term to be differenti…

ff0cfa1

…ated against (pytorch#726) Co-authored-by: holly1238 <[email protected]>
