
Add support for distil-large-v3 #755

Merged · 3 commits merged into SYSTRAN:master on Mar 26, 2024

Conversation

@sanchit-gandhi (Contributor) commented Mar 21, 2024

The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.

This PR adds support for this checkpoint.
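For context, here is a minimal usage sketch for once this PR is merged (assuming "distil-large-v3" is registered as a downloadable model name in faster-whisper; the file name and device settings are illustrative):

from faster_whisper import WhisperModel

# distil-large-v3 is designed for the sequential (OpenAI-style) long-form algorithm,
# which is the default transcription path in faster-whisper.
model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", language="en", condition_on_previous_text=False)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")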

@BBC-Esq (Contributor) commented Mar 21, 2024

Nice to see some camaraderie.

@sanchit-gandhi sanchit-gandhi marked this pull request as ready for review March 21, 2024 17:32
@sanchit-gandhi (Contributor, Author)

cc @trungkienbkhn - this one's ready for review!

@sanchit-gandhi changed the title from "[WIP] Add support for distil-large-v3" to "Add support for distil-large-v3" on Mar 21, 2024
@sanchit-gandhi (Contributor, Author) commented Mar 21, 2024

Hey @trungkienbkhn! The model is now live under https://huggingface.co/distil-whisper/distil-large-v3-ct2

Feel free to merge this PR at your convenience to enable Faster-Whisper support!

Also cc @nguyendc-systran @metame-none @Purfview

@trungkienbkhn (Collaborator) commented Mar 22, 2024

@sanchit-gandhi, thanks for your contribution.
After testing with the provided example, I found that the transcription time for the fw-distil-large-v3 model is significantly reduced while the quality stays almost the same as the fw-large-v3 model.

However, when I added the option word_timestamps=True, I encountered an error:

segments, info = model.transcribe("audio.mp3", condition_on_previous_text=False, language="en", word_timestamps=True)
Processing segment at 00:00.000
[1]    7829 segmentation fault (core dumped)

This is caused by a wrong alignment_heads field (a list of [layer, head] pairs) in the config.json file of the fw-distil-large-v3 model.
Since distil-whisper only has 2 decoder layers, fw is unable to extract word-level timestamps from any layer index > 1.
[Screenshot from 2024-03-22 omitted]

So we should use the default alignment_heads for the fw-distil-large-v3 model, the same as for the fw-distil-large-v2 model. For more details on this logic, you can refer to the implementation in ctranslate2.
Could you re-verify and update this field? Thanks.
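To make the failure mode concrete, here is an illustrative sketch (the alignment_heads values below are made up, not the actual contents of the config): every [layer, head] pair must reference an existing decoder layer, and distil-large-v3 only has decoder layers 0 and 1.

num_decoder_layers = 2  # distil-large-v3 decoder depth

# Hypothetical alignment_heads list with layer indices carried over from a deeper model.
alignment_heads = [[7, 0], [10, 17], [12, 18], [30, 5]]

invalid = [pair for pair in alignment_heads if pair[0] >= num_decoder_layers]
if invalid:
    print(f"alignment_heads references non-existent decoder layers: {invalid}")
    # Indexing these layers during word-timestamp extraction is what leads to the crash.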

@Purfview (Contributor)

@trungkienbkhn
Could you make the usual conversion to float16 and place it at https://huggingface.co/Systran/faster-distil-whisper-large-v3?

@BBC-Esq (Contributor) commented Mar 22, 2024

Oh yeah, almost forgot: wouldn't that be necessary anyway, since the faster-whisper library automatically points to the Systran repository unless another one is explicitly specified?

@Vaibhavs10

> Could you make the usual conversion to float16 and place it at https://huggingface.co/Systran/faster-distil-whisper-large-v3?

Brilliant idea! That way all the faster-whisper checkpoints remain in one org.
We can remove the preliminary ct2 checkpoint in the distil-whisper org and point the README and documentation to SYSTRAN.

@sanchit-gandhi (Contributor, Author)

The alignment heads have been updated to match those in distil-large-v2: https://huggingface.co/distil-whisper/distil-large-v3-ct2/discussions/1

Note that we could get better alignment heads by inspecting the alignments produced by the DTW algorithm, e.g. as done here.

However, the default heuristic of using the last half of the decoder layers for alignment should suffice for a first version (see the sketch below).
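A sketch of that heuristic (assumed decoder shape for distil-large-v3: 2 layers with 20 attention heads each; the exact default behaviour lives in the ctranslate2 converter):

num_decoder_layers = 2
num_attention_heads = 20

# Use every head in the last half of the decoder layers as alignment heads.
default_alignment_heads = [
    [layer, head]
    for layer in range(num_decoder_layers // 2, num_decoder_layers)  # here: layer 1 only
    for head in range(num_attention_heads)
]
# -> [[1, 0], [1, 1], ..., [1, 19]]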

@trungkienbkhn (Collaborator)

> Could you make the usual conversion to float16 and place it at https://huggingface.co/Systran/faster-distil-whisper-large-v3?

FYI, we have released a new ct2 conversion model (using float16) for distil-large-v3: https://huggingface.co/Systran/faster-distil-whisper-large-v3
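A quick loading sketch for the released conversion (faster-whisper also accepts a Hugging Face repo id or a local path in place of a model size name; the audio file name here is illustrative):

from faster_whisper import WhisperModel

model = WhisperModel("Systran/faster-distil-whisper-large-v3", compute_type="float16")
segments, info = model.transcribe("audio.mp3", language="en")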

@sanchit-gandhi (Contributor, Author)

Awesome, thanks @trungkienbkhn. I've updated this PR to use these fp16 weights.

@sanchit-gandhi (Contributor, Author)

Feel free to merge this PR at your convenience - it would be awesome to unblock faster-whisper for the distil-whisper community.

@nguyendc-systran merged commit a67e0e4 into SYSTRAN:master on Mar 26, 2024. 3 checks passed.