-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TorchScript-able "load" func to sox_io backend #731
Conversation
eeaa2c7
to
6da4ff4
Compare
cae9321
to
df42c5f
Compare
Codecov Report
@@ Coverage Diff @@
## master #731 +/- ##
==========================================
+ Coverage 89.21% 89.24% +0.02%
==========================================
Files 32 32
Lines 2513 2519 +6
==========================================
+ Hits 2242 2248 +6
Misses 271 271
Continue to review full report at Codecov.
|
99462ec
to
500733f
Compare
6b1134b
to
1c7af65
Compare
|
||
script_path = self.get_temp_path('load_func') | ||
torch.jit.script(py_load_func).save(script_path) | ||
ts_load_func = torch.jit.load(script_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woohoo!! :D
return tensor, signal_info.get_sample_rate() | ||
|
||
|
||
load_wav = load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woohoo!! :D
pass | ||
elif tensor.dtype == torch.int32: | ||
tensor = tensor.to(torch.float32) | ||
tensor[tensor > 0] /= 2147483647. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there a way of fetching these numbers instead of hard coding them? It's in the tests, so I don't worry much about it though.
// So make sure to create a new copy after processing samples. | ||
if (normalize || dtype == torch::kFloat32) { | ||
t = t.to(torch::kFloat32); | ||
t *= (t > 0) / 2147483647. + (t < 0) / 2147483648.; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this one is not in the tests though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? The combination of float32
and normalize=true|false
are tested everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the 2147483647
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant something different: I have seen this hardcoded values appear in a few places in this PR (e.g. above). Most of them are in tests, so I don't worry much about them. This hardcoded value is not in a test file though, so I would have preferred if it were not hardcoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM!
nit: there's a C++ lint error :)
@@ -51,4 +54,17 @@ def gen_audio_file( | |||
command += ['vol', f'-{attenuation}dB'] | |||
print(' '.join(command)) | |||
subprocess.run(command, check=True) | |||
subprocess.run(['soxi', path], check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why is this removed at this time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command above it, sox -V
prints the same information, so it turned out redundant.
Yeah I was waiting for the approval first as I have six branches depend on this and had to rebase all of them. |
thanks! |
This reverts commit 391be73.
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718 and #728 .
This PR adds
load
function to "sox_io" backend, which is tested on the following audio formats;wav
mp3
flac
ogg/vorbis
*By default, "sox_io" backend returns Tensor with
float32
dtype and the shape of[channel, time]
. The samples are normalized to fit in the range of[-1.0, 1.0]
.Unlike existing "sox" backend, the new
load
function can handle WAV file natively, when the input format is WAV with integer type, (such as 32-bit signed integer, 16-bit signed integer and 8-bit unsigned integer) by providingnormalize=False
, this function can return integer Tensor, where the samples are expressed within the whole range of the corresponding dtype, that is,int32
tensor for32-bit PCM
,int16
for16-bit PCM
anduint8
for8-bit PCM
. This behavior follows scipy.io.wavfile.read.normalize
parameter has no effect for other formats and the load function always return normalized value withfloat32
Tensor.* Note The current binary distribution of torchaudio does not contain
ogg/vorbis
andopus
codecs. To handle these files, one needs to build torchaudio from the source with proper codecs in the system.Note 2 Since this PR,
scipy
becomes required module for running test.