v1.3.0
Release Notes
- Support for TensorRT-LLM Backend
- Inclusion of Example Notebooks
TensorRT-LLM Backend
WhisperS2T now supports NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) backend, delivering a further twofold improvement in inference time over the CTranslate2 backend. With the current optimal configuration on an A30 GPU, a 1-hour audio file is transcribed in approximately 18 seconds. Updated benchmarks are detailed below:
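As a quick sanity check on the numbers above, the implied real-time factor and the implied CTranslate2 baseline can be worked out directly (a minimal arithmetic sketch using only the figures quoted in these notes; actual timings will vary with audio content, batch size, and hardware):

```python
# Benchmark figures quoted in the release notes:
# a 1-hour (3600 s) file transcribed in ~18 s on an A30 GPU.
audio_seconds = 3600
trt_llm_seconds = 18

# Real-time factor: how many seconds of audio are processed per second.
rtf = audio_seconds / trt_llm_seconds
print(f"TensorRT-LLM real-time factor: ~{rtf:.0f}x")  # ~200x

# A "further twofold improvement" over CTranslate2 implies the
# CTranslate2 backend took roughly twice as long on the same file.
ctranslate2_seconds = trt_llm_seconds * 2
print(f"Implied CTranslate2 time: ~{ctranslate2_seconds} s "
      f"(~{audio_seconds / ctranslate2_seconds:.0f}x real time)")
```

Selecting the backend itself is done at model-load time (in WhisperS2T this is the `backend` argument, e.g. `backend='TensorRT-LLM'` versus `backend='CTranslate2'`).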