🚀 Introducing Whisper-AT, a new joint audio tagging and speech recognition model. #1504
Thanks @YuanGongND. Is it possible to get audio tags when using the tiny model? Do you expect any differences in the audio tagging results between the large and tiny models, say for Japanese?
Hi @dgoryeo, thanks so much for your interest! Yes. Regarding the performance (see column
You can see that smaller models also have weaker audio tagging performance, but it is still reasonably good. In practice, you can use the following code for a quick try:
and then
import whisper_at as whisper
audio_tagging_time_resolution = 10
model = whisper.load_model("tiny")
result = model.transcribe("audio.mp3", at_time_res=audio_tagging_time_resolution)
# ASR Results
print(result["text"])
# Audio Tagging Results
audio_tag_result = whisper.parse_at_label(result, language='follow_asr', top_k=5, p_threshold=-1, include_class_list=list(range(527)))
print(audio_tag_result)
P.S. For more details, please check [Colab Demo] and our [Github Repo]. Cheers,
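The tagging output printed above can be post-processed in plain Python. The sketch below is only an illustration: it assumes `parse_at_label` returns a list of per-segment dicts with a `time` entry and an `audio tags` list of `(label, score)` pairs, which should be checked against the real output. The helper `filter_tags` and the sample data are hypothetical, not part of Whisper-AT.

```python
def filter_tags(segments, min_score=0.0):
    """Keep only the tags scoring above min_score in each time segment.

    Assumes each segment looks like:
    {"time": {"start": ..., "end": ...}, "audio tags": [(label, score), ...]}
    """
    filtered = []
    for seg in segments:
        kept = [(label, score) for label, score in seg["audio tags"] if score > min_score]
        filtered.append({"time": seg["time"], "audio tags": kept})
    return filtered


# Hypothetical sample in the assumed shape (made up, not real model output).
sample = [
    {"time": {"start": 0, "end": 10},
     "audio tags": [("Music", 1.8), ("Speech", 0.9), ("Dog", -2.5)]},
    {"time": {"start": 10, "end": 20},
     "audio tags": [("Speech", 0.4)]},
]

for seg in filter_tags(sample, min_score=0.5):
    labels = ", ".join(label for label, _ in seg["audio tags"]) or "(none)"
    print(f'{seg["time"]["start"]}-{seg["time"]["end"]}s: {labels}')
```

If the real output has a different shape, only the two key lookups inside `filter_tags` would need to change.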
[Paper]
[HuggingFace Space] (Try Whisper-AT without Coding!)
[Colab Demo]
[Source Code]
We are glad to introduce Whisper-AT, a new joint audio tagging and speech recognition model. It outputs background sound labels in addition to text.
Key features:
Whisper-AT inherits all APIs of Whisper, as well as its ASR performance. With minimal code changes, you get the same output as the original Whisper.
Whisper-AT outputs audio event tags for 527 classes (AudioSet ontology) at your desired time resolution. Its audio tagging performance is close to that of SOTA standalone audio tagging models.
Whisper-AT ...
Whisper-AT ... (e.g., set the threshold and the classes of interest) easily. Multiple languages are supported; audio tags follow the ASR language by default.
See the demo (please turn on the audio to listen to the sounds):
cooking.mp4
Code and pretrained models are released [here]. Please give it a try and let us know what you think!
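To make "desired time resolution" concrete, here is a minimal sketch of how a recording could be split into fixed-length tagging windows. It only illustrates the idea: the helper `tagging_windows` is hypothetical, and how Whisper-AT itself treats a final partial window is not stated in this post.

```python
import math


def tagging_windows(duration_s, time_res_s):
    """Return (start, end) pairs covering duration_s in steps of time_res_s.

    The last window is truncated at the end of the audio (an assumption
    made for illustration, not confirmed Whisper-AT behavior).
    """
    if time_res_s <= 0:
        raise ValueError("time_res_s must be positive")
    n_windows = math.ceil(duration_s / time_res_s)
    return [
        (i * time_res_s, min((i + 1) * time_res_s, duration_s))
        for i in range(n_windows)
    ]


# A 25-second clip tagged every 10 seconds gives three windows.
print(tagging_windows(25, 10))  # [(0, 10), (10, 20), (20, 25)]
```

A smaller resolution yields more windows and therefore finer-grained tags over time.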