Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASR] Support for transcription of multi-channel audio for AED models #9007

Merged
merged 3 commits into from
Apr 30, 2024

Conversation

anteju
Copy link
Collaborator

@anteju anteju commented Apr 23, 2024

What does this PR do ?

Currently, AED models do not use channel selector.
If the input manifest is pointing to multi-channel audio files, transcription will fail with

AssertionError: Expected MonoCut.

This PR adds support for

  1. setting channel_selector to an integer value, e.g., 0/1 to always select the first/second channel in the input file
  2. setting channel_selector to a string value to use a field from the input manifest to select the channel

Collection: ASR

Changelog

  • Propagate channel_selector in _setup_transcribe_dataloader
  • Added map with _select_channel in nemo/collections/common/data/lhotse/dataloader.py

Usage

Example of use, assuming Canary model is loaded

channel_selector = 0 # use the first channel
channel_selector = 1 # use the second channel
channel_selector = 'reference_channel' # use 'reference_channel' field from the manifest file

predicted_text = canary_model.transcribe(
      mc_manifest, # manifest with multi-channel audio
      batch_size=1,  # batch size to run the inference with
      channel_selector=channel_selector,  # channel
)

Similarly, it can be used with transcribe_speech.py

CHANNEL_SELECTOR=0
CHANNEL_SELECTOR=1
CHANNEL_SELECTOR=reference_channel # use 'reference_channel' field from the manifest file

python ${NEMO_ROOT}/examples/asr/transcribe_speech.py \
    pretrained_name=nvidia/canary-1b \
    dataset_manifest=./mc_manifest.json \
    channel_selector=${CHANNEL_SELECTOR}

Toy Example:

Scripts & toy dataset:

Run as

# test transcribe_speech.py
bash test_transcribe.sh 0
bash test_transcribe.sh 1
bash test_transcribe.sh reference_channel

# test python API
python test_transcribe.py --channel-selector 0
python test_transcribe.py --channel-selector 1
python test_transcribe.py --channel-selector reference_channel

Jenkins CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

There's no need to comment jenkins on the PR to trigger Jenkins CI.
The GitHub Actions CI will run automatically when the PR is opened.
To run CI on an untrusted fork, a NeMo user with write access must click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

@anteju anteju requested a review from pzelasko April 23, 2024 00:20
@anteju anteju force-pushed the feature/asr-channel-selector-from-manifest branch 3 times, most recently from 1e7a297 to 5a7bc84 Compare April 23, 2024 17:19
@anteju
Copy link
Collaborator Author

anteju commented Apr 23, 2024

jenkins

@anteju anteju force-pushed the feature/asr-channel-selector-from-manifest branch from 5a7bc84 to 043bf0c Compare April 24, 2024 21:31
Copy link
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left two comments.

# Apply channel selector
if config.channel_selector is not None:
logging.info('Using channel selector %s.', config.channel_selector)
cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should leave the default behavior of .map here, otherwise it might try to apply this to text-only examples.

Suggested change
cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
)

return cut.with_channels(channel_idx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm .with_channels is only defined on MultiCut, perhaps we should add a check like:

if cut.num_channels == 1: 
    return cut
else:
    return cut.with_channels(channel_idx)

WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, added this (will push in a bit).

@anteju anteju force-pushed the feature/asr-channel-selector-from-manifest branch from 9ed7e9b to 6fc9b7f Compare April 29, 2024 18:32
@anteju anteju requested a review from pzelasko April 29, 2024 18:32
@anteju anteju force-pushed the feature/asr-channel-selector-from-manifest branch 2 times, most recently from 8337553 to 8463836 Compare April 29, 2024 20:48
Copy link
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anteju
Copy link
Collaborator Author

anteju commented Apr 29, 2024

jenkins

anteju added 3 commits April 30, 2024 09:36
…t_lhotse_dataloader_from config

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Ante Jukić <[email protected]>
@anteju anteju force-pushed the feature/asr-channel-selector-from-manifest branch from 8463836 to a8cfda6 Compare April 30, 2024 16:36
@anteju anteju merged commit fe4b291 into NVIDIA:main Apr 30, 2024
126 checks passed
suiyoubi pushed a commit that referenced this pull request May 2, 2024
…#9007)

* Propagate channel selector for AED model + add channel selector to get_lhotse_dataloader_from config

Signed-off-by: Ante Jukić <[email protected]>

* Included comments

Signed-off-by: Ante Jukić <[email protected]>

* Added unit test

Signed-off-by: Ante Jukić <[email protected]>

---------

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
…NVIDIA#9007)

* Propagate channel selector for AED model + add channel selector to get_lhotse_dataloader_from config

Signed-off-by: Ante Jukić <[email protected]>

* Included comments

Signed-off-by: Ante Jukić <[email protected]>

* Added unit test

Signed-off-by: Ante Jukić <[email protected]>

---------

Signed-off-by: Ante Jukić <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants