Support for multi-channel audio data #8728

yunbin · 2024-03-22T16:56:44Z

Describe the bug

NeMo training and decoding scripts do not support multi-channel audio data.

Steps/Code to reproduce bug

The scripts do not support specifying which channel to use for each audio file, that is, per line in the train.manifest.json or test.manifest.json file.

I was able to run ./examples/asr/speech_to_text_eval.py with "channel_selector=" to select the channel for all of the audio in a manifest.json file, but I can't find a way to specify the channel for each audio file inside the manifest.json file.
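Since the global "channel_selector=" option already works, one possible workaround is to split a mixed manifest into one manifest per channel and evaluate each separately. The sketch below is only an illustration under stated assumptions: the per-entry `channel_selector` field is a hypothetical key that you would add to your own manifest lines (it is not a documented NeMo manifest field), its value is assumed to be a single integer channel index, and the helper name `split_manifest_by_channel` is made up for this example.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_manifest_by_channel(manifest_path, out_dir, key="channel_selector", default=0):
    """Write one JSON-lines manifest per channel value; return {channel: output path}.

    Assumption: `key` is a hypothetical per-entry field holding an integer
    channel index. Entries without it fall back to `default`.
    """
    groups = defaultdict(list)
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            # Remove the hypothetical field so the output is a plain manifest entry.
            channel = entry.pop(key, default)
            groups[channel].append(entry)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(manifest_path).stem  # "test.manifest.json" -> "test.manifest"
    outputs = {}
    for channel, entries in groups.items():
        out_path = out_dir / f"{stem}.ch{channel}.json"
        with open(out_path, "w", encoding="utf-8") as f:
            for entry in entries:
                f.write(json.dumps(entry) + "\n")
        outputs[channel] = out_path
    return outputs
```

Each output manifest could then be passed to ./examples/asr/speech_to_text_eval.py with the matching "channel_selector=" value. This only covers evaluation. For training, it does not allow mixing channels within a single manifest, which is the point of this request.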

Expected behavior

Could the NeMo team add this feature? It would make it possible to work with a diverse set of multi-channel training and testing audio data, so that data from different channels can be mixed within a single manifest.json file.

Environment overview (please complete the following information)

NeMo was installed with pip in a conda environment. It works for single-channel audio data.

anteju · 2024-03-22T21:29:23Z

Thanks @yunbin, we'll take a look at adding this functionality over the next couple weeks. We'll post future updates here.

yunbin · 2024-04-04T17:19:23Z

@anteju Any update on getting the channel feature implemented in NeMo training scripts?

anteju · 2024-04-23T00:21:34Z

@yunbin, please check #9007.
The change there enables using the channel selector from the manifest for the nvidia/canary-1b model.

yunbin added the bug label Mar 22, 2024

anteju added the feature request/PR for a new feature label Mar 22, 2024

anteju self-assigned this Mar 22, 2024

anteju removed the bug label Mar 22, 2024

anteju mentioned this issue Apr 23, 2024

[ASR] Support for transcription of multi-channel audio for AED models #9007

Merged

nithinraok closed this as completed May 8, 2024
