Revert the batched process updates for the future of the faster-whisper #940
Conversation
…rencePipeline` and fix word timestamps for batched inference (SYSTRAN#921)" This reverts commit d57c5b4.
…e. (SYSTRAN#923)" This reverts commit 83a368e.
…d Quality Enhancements (SYSTRAN#856)" This reverts commit eb83902.
I believe this is of the utmost importance. @guillaumekln We need to keep faster-whisper elegant and agile to make it more usable and maintainable. Otherwise, it will be transformed into something unmaintainable and unusable, such as whisper-x (some of the batched inference authors are also contributors to that project). I will keep this PR alive with all the other changes on the master branch. |
While I agree with @ozancaglayan that your approach was initially suboptimal, I support this PR, having spent the last 2 hours reversing the compatibility issues that this update caused. |
This is a really important point, as the community is not fully aware of the effect of these changes; users will only understand the effect once they use the new version. We can partition the whole PR into parts and let the community adopt it over time. That is really critical for this project to survive. |
@trungkienbkhn |
Yes, let's avoid faster-whisper becoming bulky-whisper. There are many other healthy ways to use the contributions from that PR without affecting the community or this repo.
|
Has anyone considered the approach used by WhisperS2T? There seems to be some kind of unexplained loyalty and/or camaraderie between faster-whisper and WhisperX when there's a completely viable alternative. Thanks, my 2 cents. |
It was already implemented in #921 |
I really don't like the current situation in this repository. Some contributors merged these changes to main, and there have been more and more negative concerns and comments about the disruptive nature of these changes. Then there is this total silence: the contributors are reluctant to revert those changes, and the organization hosting this repository does not comment at all in the various places where it has been tagged. I don't know what's going on, to be honest. |
@trungkienbkhn has greatly improved this repository. |
Thanks, I'll take a second look. My comment was based on a comment (perhaps in a different discussion, as I don't see it here) about whether WhisperX's batch approach or WhisperS2T's should be used, and WhisperX's was used...but I might have been mistaken. Consider my comment temporarily withdrawn... |
I don't think the people who maintain this repo do this as their primary job. I think they all have day jobs and work on this in their spare time. Unlike well-known repositories like |
That is unfortunate but true. I think if the organization did its job well, it would not allow such big and bulky changes to the main repository, turning it into something really different from "faster-whisper". What happens next is a big unknown until the organization steps up and listens to community feedback. |
Manpower is important, that is for sure. But spending effort and time does not always mean the result will be better. As is apparent in this scenario, bringing in something this big and blocking the community from using faster-whisper according to its design philosophy is something that will kill the project. @guillaumekln was right not to bring batch processing into faster-whisper; it is something that can be added as a separate project, like whisper-x. We do need manpower, but with design thinking and proper software engineering capabilities. |
@MahmoudAshraf97 Can you elaborate on what exactly is implemented in this PR in the WhisperS2T sense? As I see it, WhisperS2T's performance comes from its TensorRT-LLM backend implementation. https://github.com/shashikg/WhisperS2T/releases/tag/v1.3.0 |
WhisperS2T contains multiple backends, including CTranslate2 and TensorRT-LLM |
I see, thanks for the info. So if the same logic is implemented in WhisperS2T, why not use that one for batched inferencing? It is not right to spend effort duplicating the same batching logic across multiple projects. I think this is another reason to revert all this batched inferencing work and create it as a separate project. |
Because neither WhisperS2T nor WhisperX is maintained any longer, it doesn't make sense to implement every new feature in a separate package, especially since this is an independent feature that honours the old usage with absolutely no change to the behaviour (except for the requirements, which are halfway done). Bugs that appear in issues are fixed within a week, and bugs on the master branch are to be expected because this was not advertised as a stable release anywhere. You don't like the batching implementation? Don't use it, or keep using 1.0.3, and if the next stable release has issues, report them to be fixed, or even better, fix them yourself and open a PR. |
You will do the same to faster-whisper: make it unmaintained by making it complex and dependency-heavy. Why don't you fork the project and maintain all this stuff in your own repo? Why do you prefer to make the faster-whisper repo bulky and badly designed? Another question: who will mark this version stable, and how will that be done? Can you see and solve all the issues that the increased complexity injects into community usage? Will you fix those, or will you abandon that part of the community? Given the bulky design, will you be able to satisfy the community's demands for the original faster-whisper? Will you advise everyone to use 1.0.3, or to fork their own repo and live with the issues and incompatibilities all the time? As I said, you are not the owner of this repository, and spending time on it does not make you the owner either. I will pursue this and do my best to return faster-whisper to its origins. It would help if you did not direct the community like this, and I believe the organization should not allow such irresponsible behaviour. |
I don't think that's entirely accurate @aligokalppeker. I mean, I respect your opinion and your right to express it, but I actually think that @MahmoudAshraf97 is doing a hell of a job, especially considering this would be the biggest improvement to faster-whisper in a long time. IMHO, I personally am NOT a fan of 90k+ character long scripts like ...but I'll get into that later. For now, can someone/anyone summarize the current issues with the code, i.e., any and all issues? I'll go through and see if I can't contribute to a solution. If anyone has any sample scripts using the new API, and especially SIMPLE benchmarking scripts, I'd appreciate it. I use benchmarking scripts internally; I'm attaching one here that we can build off of. Also, @MahmoudAshraf97, you say that it's now "resembling" WhisperS2T? Can you explain a little more how your implementation works in this regard and the difference between "mirroring" WhisperS2T and "resembling" it? I'd like to start comparing...and possibly contributing. HERE'S THE SCRIPT
|
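As an illustrative stand-in (not the script attached above), a minimal timing sketch of the plain, non-batched faster-whisper API might look like the following; the model size and the "audio.wav" path are placeholder choices:
import time
from faster_whisper import WhisperModel

# Placeholder model size and audio path; adjust to the hardware being benchmarked.
model = WhisperModel("small", device="cuda", compute_type="float16")

start = time.perf_counter()
segments, info = model.transcribe("audio.wav", beam_size=5)
text = "".join(segment.text for segment in segments)  # segments is a generator; consuming it runs the decode
elapsed = time.perf_counter() - start

print(f"{info.duration:.1f}s of audio transcribed in {elapsed:.1f}s "
      f"({info.duration / elapsed:.1f}x real time)")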
@MahmoudAshraf97 Where does the benefit of batching come from? I'm doing some benchmarking (only encoding of 30-second chunks) for now. The benchmark below does not do any decoding; the numbers were obtained on a Tesla T4 GPU. You can see that dividing the measured times by the batch size consistently gives ~150 ms. Any idea what I'm missing?
In [107]: feats.shape
Out[107]: (80, 19620)
In [108]: f30 = feats[None, :, :3000] # 30sec
In [110]: for bs in (1, 2, 4, 8):
...: # repeat the same data multiple times to simulate batching
...: batch = f30.repeat(bs, axis=0)[..., :3000]
...: print(batch.shape)
...: batch = get_ctranslate2_storage(batch).to_device(ct2.Device.cuda)
...: %timeit model.model.encode(batch)
...:
(1, 80, 3000)
149 ms ± 761 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
(2, 80, 3000)
287 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(4, 80, 3000)
568 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(8, 80, 3000)
1.16 s ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
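A quick sketch of the arithmetic behind that observation, using the mean timings quoted above (values copied from the output, not re-measured):
# Mean encoder wall-clock times from the %timeit output above (Tesla T4), in seconds.
timings = {1: 0.149, 2: 0.287, 4: 0.568, 8: 1.16}

for batch_size, total_seconds in timings.items():
    per_chunk_ms = total_seconds / batch_size * 1000
    print(f"batch_size={batch_size}: {per_chunk_ms:.0f} ms per 30-second chunk")

# Prints roughly 145-150 ms per chunk at every batch size, i.e. the encoder alone
# scales almost linearly with batch size on this GPU and gains little from batching.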
I agree with @aligokalppeker's last points (although not with the tone; @aligokalppeker, no need to be angry, we can discuss this calmly). Lots of repos appeared and disappeared in the past year(s) around Whisper ASR, and many of them were later abandoned. Unless we have a clear plan and roadmap for how this repository will be maintained going forward, I don't see the benefit of merging a bulky PR and then not being cooperative given the amount of complaints from the community. In the original PR, @Purfview also wrote that the PR contains multiple unrelated things. Lots of people reacted with thumbs up, including the contributors themselves (@MahmoudAshraf97), but this comment was not addressed. Why did we add another VAD model? No answer and no clear justification, other than adding hundreds of new package dependencies, etc. The whole point of the PR was massive speed improvements, even claiming 100x speedups, which are unrealistic. Moreover, people running on CPUs experienced slowdowns because the number of CPU threads was defaulted to a non-portable value. I don't know why we can't just revert it and then properly review and merge it atomically. Nobody wants to block contributions here; we just want to ensure quality. Also, I don't understand who the person approving the PR is. They don't seem to be a contributor and/or maintainer of this repo. How could somebody who didn't contribute to this repo approve that PR? Please, we're not asking for rocket science here:
Thanks! |
That was before the PR was cleaned up. |
@BBC-Esq I guess it means the same thing. The main difference between the two is how the process is handled, and our implementation is now closer to WhisperS2T than to WhisperX. We can't exactly mirror WhisperS2T because it was designed with multiple backends in mind, while we only use one backend, so our approach is simpler and adheres more closely to the original design of faster-whisper.
@aligokalppeker
batch_size = 1
for i in range(0, len(features), batch_size):  # features shape: (75, 128, 3000)
    inputs = features[i : i + batch_size]
    outputs = ct2_model.model.generate(
        get_ctranslate2_storage(inputs),
        prompts=[[50258, 50259, 50360, 50364]] * len(inputs),
    )
torch.cuda.empty_cache()
These are the results on a single T4, showing a 2x speedup:
I used a large batch size to demonstrate that batching keeps yielding larger speedups until we become compute-bound; low batch sizes are mostly memory-bound. We already addressed most of the problems regarding VAD and the additional package dependencies in #936, where the number of dependencies is now almost back to the original count, and we are almost there regarding the correct implementation. I know the approach could have been better, but that has already happened; how about we try to correct the approach from here on instead of starting all over again? |
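For contrast with the batch_size = 1 loop above, a sketch of the same loop with a larger batch; features, ct2_model, and get_ctranslate2_storage are assumed to be the same objects used in that snippet, and the batch size here is an illustrative value:
import torch

batch_size = 16  # illustrative; larger batches amortize per-step decoder overhead
for i in range(0, len(features), batch_size):
    # Slice a batch of mel features, shape (batch, 128, 3000), and decode them together.
    inputs = features[i : i + batch_size]
    outputs = ct2_model.model.generate(
        get_ctranslate2_storage(inputs),
        prompts=[[50258, 50259, 50360, 50364]] * len(inputs),  # same prompt tokens as in the snippet above
    )
torch.cuda.empty_cache()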
Thanks @MahmoudAshraf97 so do you confirm that for encoding, batching wouldn't have much effect? |
It does have an effect, as you demonstrated; a 5% speedup is considerable in cloud deployment scenarios where time is literally money, but YMMV depending on the hardware specifications, as the T4 is considered old nowadays. The largest effect is in the decoding, but that doesn't mean the encoding doesn't benefit from it. |
Thanks for the comments @ozancaglayan and @BBC-Esq. I am furious about how this kind of PR was allowed to butcher the project despite all the community feedback. Performance is important, but cleanliness, maintainability, and compatibility take priority in software projects, especially open-source ones. This PR does not achieve a performance improvement in most scenarios (it requires specific hardware and data conditions), yet it makes the code complex relative to the gains. I am looking at patch #936, and it is still far from what needs to be done for the cleanup. One big example: the torch dependency is still there and should be removed; there is no place for the bulky torch dependency if you do not have a training pipeline. Also, most of the complex transcription code, with the batching stuff, remains in there, and it is still bulky. The effect on the code is massive, and everyone gets different performance benchmarks. One cannot disregard old hardware, low-end hardware, etc.; in the cloud, many different CPU/GPU configurations are used for power and efficiency reasons. @ozancaglayan I agree with your approach on the PR, as I think similarly. Starting over is critical and necessary for a change of this size: if development did not start in a clean, non-opinionated feature context with a proper modular design, it will be harder to get back to a clean state. |
We use |
We're removing the torch dependency already, it's just a matter of time, but the speedup I'm talking about here is due to batching, which is not dependent on torch |
Just to share my story: after the batch PR, my https://github.com/heimoshuiyu/whisper-fastapi docker image size grew from 5GB to 10GB (without the model file). Edit: and the VRAM usage increased from 4GB to 10GB (30 minutes of audio on a 4060ti, only using the |
Concerning issue #937, I would suggest reverting the related commits. The major PR is huge and can be split up and merged as independent PRs.
This is a serious concern and is important for the project's future.
The injected changes should be:
Even if the authors downvote this PR (I understand that it is harsh), let's revert the related commits, partition them into several sub-functional PRs, and merge them incrementally and in a timely fashion so the community can adapt iteratively.
Thanks, @trungkienbkhn