
Integrate faster version of whisper (batched faster whisper) to Aana SDK #41

Closed
ashwinnair14 opened this issue Jan 24, 2024 · 3 comments · Fixed by #53
Labels: enhancement (New feature or request), wip (Work In Progress)

ashwinnair14 (Contributor) commented Jan 24, 2024

Feature Summary

  • Concise description of the feature
    Integrate the faster batched version of Whisper into the Aana SDK.

Justification/Rationale

  • Why is the feature beneficial?
    This feature enables a faster version of Whisper that uses VAD (voice activity detection) and batching to improve throughput by approximately 4x.

Proposed Implementation (if any)

  • How do you envision the implementation of this feature?
    There are two options for the implementation:

1. A separate endpoint for the batched Whisper.
2. A flag/parameter on the existing endpoint to enable batched inference, with a trade-off on WER. Some parameters usually familiar to the user are not supported in batched mode, e.g. without_timestamps (no word-level timestamps).

VAD would be introduced as a separate deployment.
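
The second option could be sketched as a single entry point that dispatches on a flag. This is an illustrative sketch only; the function name, parameters, and return shape are assumptions, not the actual Aana SDK API.

```python
# Hypothetical sketch of option 2: one endpoint with a flag that switches
# between sequential and batched inference. Names are illustrative.

def transcribe(audio_path: str, batched: bool = False, batch_size: int = 16) -> dict:
    if batched:
        # Batched path: VAD splits the audio first, then speech chunks are
        # transcribed in batches. Word-level timestamps are unavailable here.
        return {"mode": "batched", "batch_size": batch_size, "word_timestamps": False}
    # Sequential path keeps the full parameter set of the original endpoint.
    return {"mode": "sequential", "batch_size": 1, "word_timestamps": True}
```

A separate endpoint (option 1) avoids overloading one signature with mode-dependent parameters, which is likely why the discussion below settled on it.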

@ashwinnair14 ashwinnair14 added the enhancement New feature or request label Jan 24, 2024
@Jiltseb Jiltseb changed the title Integrate faster version wispher (batched faster wispher) to Aana SDK Integrate faster version of whisper (batched faster whisper) to Aana SDK Jan 25, 2024

Jiltseb (Contributor) commented Jan 25, 2024

As per the discussion, we will create a separate endpoint for batched faster-whisper. We could even consider it as a separate target in the future.

ashwinnair14 (Contributor, Author) commented:

Comments from Jilt

Below is the benchmarking result:
https://docs.google.com/spreadsheets/d/1XMVbwDnVissogqf5MHptal29tUV0VAlvbVDPxrzAX5U/edit?pli=1#gid=2029644071
Observations:
Video-to-audio extraction as an initial step improves speed, especially with threading.
The difference between single and multiple deployments shrinks if we perform audio extraction first.
Best results are obtained with a single deployment for both the ASR and VAD models (only 3% and 2% difference).
With separate deployments, we can decouple stages and reuse the VAD stage for multiple models.
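
The decoupled design described above hinges on the VAD stage handing the ASR model batches of speech segments. A minimal sketch of that grouping step, under the assumption that VAD emits (start, end) timestamps in seconds and that each batch should fit one forward pass of the ASR model:

```python
from typing import Iterator

def batch_vad_segments(
    segments: list[tuple[float, float]],
    max_duration: float = 30.0,
) -> Iterator[list[tuple[float, float]]]:
    """Group VAD speech segments (start, end in seconds) into batches whose
    total speech duration stays under max_duration. Illustrative sketch, not
    the SDK's actual batching logic."""
    batch: list[tuple[float, float]] = []
    total = 0.0
    for start, end in segments:
        duration = end - start
        if batch and total + duration > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append((start, end))
        total += duration
    if batch:
        yield batch
```

Because the batching operates on VAD output alone, the same VAD deployment can feed any downstream ASR model, which is the reuse benefit noted in the last observation.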


Jiltseb (Contributor) commented Feb 7, 2024

Steps:

  1. Add VAD deployment, VAD parameters, and deployments.py updates: Done

  2. Add a batched_inference method to the whisper deployment, wrapping the whisper model in a batched inference pipeline: Done

  3. Define nodes, endpoints, and initial API calls for testing: Done

  4. Implement and compare different pipelines for batched inference: Done
    i. Video input with VAD+whisper as a single deployment.
    ii. Video input with VAD and whisper as separate deployments.
    iii. Video-to-audio conversion with VAD+whisper as a single deployment.
    iv. Video-to-audio conversion with VAD and whisper as separate deployments.
    v. Apply the above to direct audio input as well.
    vi. Add multiple model replicas and tests for speed.

  5. Handle audio conversion, loading, and deletion: Done
    i. Unlike the Image and Video objects, there was no dedicated dataclass for Audio. Created one with basic i/o functionality consistent with the other dataclasses.
    ii. Handle cleanup for Audio objects.

  6. Handle video files without audio (pass through the nodes with empty content and write an empty transcription): Done

  7. Change/add default values of the initial whisper implementation (e.g. vad_filter to True): new issue

  8. Testing: Done
    i. Modify old tests based on the input-type change: Done
    ii. Add a test for vad_deployment: Done
    iii. Add a test for whisper_deployment (the newly added methods), with input taken from the expected vad_deployment output: Done
    iv. Integration test with the new transcribe_batch endpoint: Done

  9. Change all whisper functions to accept the Audio input format instead of Video: Done

  10. Update the pipeline nodes and endpoints for the audio type: Done
