
Add support for distil-large-v3 #755

Merged · 3 commits merged into SYSTRAN:master on Mar 26, 2024

Conversation

@sanchit-gandhi (Contributor) commented Mar 21, 2024

The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.

This PR adds support for this checkpoint.
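For context, here is a minimal usage sketch for once this PR is merged (assuming "distil-large-v3" is registered as a downloadable model name in faster-whisper; the file name and device settings are illustrative):

from faster_whisper import WhisperModel

# distil-large-v3 is designed for the sequential (OpenAI-style) long-form algorithm,
# which is the default transcription path in faster-whisper.
model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", language="en", condition_on_previous_text=False)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")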

@BBC-Esq (Contributor) commented Mar 21, 2024

Nice to see some camaraderie.

@sanchit-gandhi sanchit-gandhi marked this pull request as ready for review March 21, 2024 17:32
@sanchit-gandhi (Contributor, Author)

cc @trungkienbkhn - this one's ready for review!

@sanchit-gandhi changed the title from "[WIP] Add support for distil-large-v3" to "Add support for distil-large-v3" on Mar 21, 2024
@sanchit-gandhi (Contributor, Author) commented Mar 21, 2024

Hey @trungkienbkhn! The model is now live under https://huggingface.co/distil-whisper/distil-large-v3-ct2

Feel free to merge this PR at your convenience to enable Faster-Whisper support!

Also cc @nguyendc-systran @metame-none @Purfview

@trungkienbkhn (Collaborator) commented Mar 22, 2024

@sanchit-gandhi, thanks for your contribution.
After testing with the provided example, I found that the transcription time for the fw-distil-large-v3 model is significantly reduced while the quality stays almost the same as the fw-large-v3 model.

However, when I added the option word_timestamps=True, I encountered an error:

segments, info = model.transcribe("audio.mp3", condition_on_previous_text=False, language="en", word_timestamps=True)
Processing segment at 00:00.000
[1]    7829 segmentation fault (core dumped)

This is caused by a wrong alignment_heads field (a list of [layer, head] pairs) in the config.json file of the fw-distil-large-v3 model.
Since distil-whisper only has 2 decoder layers, fw is unable to extract word-level timestamps from any layer index > 1.
[Screenshot from 2024-03-22 omitted]

So we should use the default alignment_heads for the fw-distil-large-v3 model, the same as for the fw-distil-large-v2 model. For more details on this logic, you can refer to the implementation in ctranslate2.
Could you re-verify and update this field? Thanks.
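To make the failure mode concrete, here is an illustrative sketch (the alignment_heads values below are made up, not the actual contents of the config): every [layer, head] pair must reference an existing decoder layer, and distil-large-v3 only has decoder layers 0 and 1.

num_decoder_layers = 2  # distil-large-v3 decoder depth

# Hypothetical alignment_heads list with layer indices carried over from a deeper model.
alignment_heads = [[7, 0], [10, 17], [12, 18], [30, 5]]

invalid = [pair for pair in alignment_heads if pair[0] >= num_decoder_layers]
if invalid:
    print(f"alignment_heads references non-existent decoder layers: {invalid}")
    # Indexing these layers during word-timestamp extraction is what leads to the crash.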

@Purfview (Contributor)

@trungkienbkhn
Could you make the usual conversion to float16 and place it at https://huggingface.co/Systran/faster-distil-whisper-large-v3?

@BBC-Esq (Contributor) commented Mar 22, 2024

Oh yeah, almost forgot: wouldn't that be necessary anyway, since the faster-whisper library automatically points to the Systran repository unless another one is explicitly specified?

@Vaibhavs10

> Could you make the usual conversion to float16 and place it at https://huggingface.co/Systran/faster-distil-whisper-large-v3?

Brilliant idea! That way all the faster-whisper checkpoints remain in one org.
We can remove the preliminary ct2 checkpoint in the distil-whisper org and point the README and documentation to SYSTRAN.

@sanchit-gandhi (Contributor, Author)

The alignment heads have been updated to match those in distil-large-v2: https://huggingface.co/distil-whisper/distil-large-v3-ct2/discussions/1

Note that we could get better alignment heads by inspecting the alignments produced by the DTW algorithm, e.g. as done here.

However, the default heuristic of using the last half of the decoder layers for alignment should suffice for a first version (see the sketch below).
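A sketch of that heuristic (assumed decoder shape for distil-large-v3: 2 layers with 20 attention heads each; the exact default behaviour lives in the ctranslate2 converter):

num_decoder_layers = 2
num_attention_heads = 20

# Use every head in the last half of the decoder layers as alignment heads.
default_alignment_heads = [
    [layer, head]
    for layer in range(num_decoder_layers // 2, num_decoder_layers)  # here: layer 1 only
    for head in range(num_attention_heads)
]
# -> [[1, 0], [1, 1], ..., [1, 19]]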

@trungkienbkhn (Collaborator)

> Could you make the usual conversion to float16 and place it at https://huggingface.co/Systran/faster-distil-whisper-large-v3?

FYI, we have released a new ct2 conversion model (using float16) for distil-large-v3: https://huggingface.co/Systran/faster-distil-whisper-large-v3
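A quick loading sketch for the released conversion (faster-whisper also accepts a Hugging Face repo id or a local path in place of a model size name; the audio file name here is illustrative):

from faster_whisper import WhisperModel

model = WhisperModel("Systran/faster-distil-whisper-large-v3", compute_type="float16")
segments, info = model.transcribe("audio.mp3", language="en")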

@sanchit-gandhi (Contributor, Author)

Awesome, thanks @trungkienbkhn. I've updated this PR to use these fp16 weights.

@sanchit-gandhi (Contributor, Author)

Feel free to merge this PR at your convenience - it would be awesome to unblock faster-whisper for the distil-whisper community.

@nguyendc-systran merged commit a67e0e4 into SYSTRAN:master on Mar 26, 2024. 3 checks passed.