Add hotwords #11
Conversation
swig/test/test_zh.py
Outdated
scorer = Scorer(0.5, 0.5, lm_path, vocab_list)
batchhotwords = BatchHotwords()
# In the first bad case, there is a big difference in score between the optimal path and the other paths.
hot_words = {'极点': 5, '换一': -3.40282e+38, '首歌': -100, '换歌': 3.40282e+38}
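(Aside, not part of the diff: 3.40282e+38 is approximately FLT_MAX, the largest finite float32 value, so these weights act as a hard promote/demote rather than a soft bias. A quick standalone check:)

import numpy as np
# 3.40282e+38 is (approximately) the float32 maximum, so a hotword weight of
# +/-3.40282e+38 effectively forces or forbids any path containing that word.
print(np.finfo(np.float32).max)  # 3.4028235e+38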
Could you explain why these words are set here? Why do you use a positive or negative score? What's the bad case? What are the results before and after setting it?
In test_zh.py, batch_log_ctc_probs contains two audio test samples.
1) For the first audio, the decoding result without hotwords is ('换一首歌', '', '换一', '一', '换', '换首歌', '一首歌', '一歌', '首歌', '歌'). I want to use hotwords to change the decoding result so that '换歌' ranks first, so I give '换歌' a positive weight of 3.40282e+38 and reduce the weights of other words like '换一' and '首歌'. The decoding result with hotwords is ('换歌', '换一首歌', '', '换一', '一', '换', '一首歌', '一歌', '首歌', '歌').
2) For the second audio, the decoding result without hotwords is ('几点了', '极点了', '几点啊', '几点啦', '几点儿', '几点吧', '急点了', '几点晚', '极点啊', '极点啦'). After setting '极点': 5, the decoding result is ('极点了', '几点了', '极点啊', '极点啦', '极点儿', '极点吧', '极点晚', '极点呀', '几点啊', '几点啦').
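As a rough illustration of why these weights reorder the n-best list, here is a post-hoc re-ranking sketch in pure Python (the PR itself applies the boost incrementally during beam search, so this is only an approximation, and the function name is made up):

def rerank(hypotheses, hotwords):
    # hypotheses: list of (text, log_score); hotwords: {word: weight}.
    # Add each matching hotword's weight to the hypothesis score, then sort.
    def boosted(item):
        text, score = item
        return score + sum(w for word, w in hotwords.items() if word in text)
    return sorted(hypotheses, key=boosted, reverse=True)

hots = {'换一': -3.40282e+38, '首歌': -100, '换歌': 3.40282e+38}
nbest = [('换一首歌', -1.0), ('首歌', -3.0), ('换歌', -5.0)]
print([t for t, _ in rerank(nbest, hots)])  # ['换歌', '首歌', '换一首歌']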
We tested hotwords on speech_asr_aishell1_hotwords_testsets.
Latency and CER
- offline model: https://github.com/wenet-e2e/wenet/tree/main/runtime/gpu/model_repo
- offline model with hotwords: (TODO)
Decoding result
Thanks. Would you mind uploading the decoding results with and without hotwords somewhere? (Maybe a Hugging Face repo for the hotword weights, ngram file, decoding results, and other essentials is a good choice.) Also, at https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary they use F1 score, recall, and precision to evaluate hotwords. Can we get these stats? I am also interested in the general test-set performance. Would you mind testing WER on the normal AISHELL-1 test set?
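For reference, a minimal sketch of how per-hotword precision/recall/F1 could be computed from paired reference/hypothesis transcripts (this is an assumption about the metric definition, not ModelScope's exact script):

def hotword_prf(refs, hyps, hotwords):
    # Count hotword occurrences per utterance: tp = matched occurrences,
    # fp = extra occurrences in the hypothesis, fn = missed occurrences.
    tp = fp = fn = 0
    for ref, hyp in zip(refs, hyps):
        for w in hotwords:
            r, h = ref.count(w), hyp.count(w)
            tp += min(r, h)
            fp += max(h - r, 0)
            fn += max(r - h, 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1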
Hotword weights and ngram file: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/models.
Hotword results on speech_asr_aishell1_hotwords_testsets.
Hotword results on the AISHELL-1 test dataset.
Test environment
Many thanks. The results look nice. I was wondering, for both the w/ and w/o hotwords cases, whether you use this as the default external LM: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/init_kenlm.arpa. Also, is the pretrained model from https://github.com/wenet-e2e/wenet/tree/main/examples/aishell/s0#u2-conformer-result ? It looks like the WERs with WFST decoding + attention rescoring for offline and chunk16 are 4.4 and 4.75, while pure attention rescoring without any ngram gives 4.63 and 5.05. I am not sure what the results would look like if you built the arpa from the AISHELL-1 train set; I thought they used this 3-gram arpa: https://huggingface.co/yuekai/aishell1_tlg_essentials/blob/main/3-gram.unpruned.arpa.
swig/ctc_beam_search_decoder.cpp
Outdated
for (size_t index = 0; index < ngram.size(); index++) {
  std::string word = "";
  // character-based language model: combine Chinese characters into words
  if (ext_scorer->is_character_based()) {
I don't fully understand why we need an external arpa file to launch the hotword scorer when we set use_ngram_score=False. If ext_scorer->is_character_based() is the reason, how about setting an independent flag?
- First, consider how the ngram score is added when decoding at frame t. 1) Get the words or Chinese characters in a fixed window (the window size is the language-model order). With a character-based language model, take the characters at steps t, t-1, t-2, t-3, giving [a, b, c, d]; with a word-level language model, take the words at those steps, giving [words1, words2, words3, words4]. This step is implemented by make_ngram. 2) Compute the conditional probability p(d | a, b, c) or p(words4 | words1, words2, words3). This step is implemented by get_log_cond_prob.
- Second, consider hotword boosting at frame t. 1) Find the characters at time steps t, t-1, t-2, t-3, t-4, ...; this can be obtained directly from make_ngram, which is why hotword boosting needs an external arpa file to launch ext_scorer. When ext_scorer is a character-based language model, combine the Chinese characters into words: [a, b, c, d] -> [abcd, bcd, cd]. When ext_scorer is a word-level language model, we get [words1, words2, words3, words4]. If such a word is in the hotwords dictionary, its hotword score is added.
- When we set use_ngram_score=False, conditional probability scores are not calculated; the external arpa file is only needed to launch ext_scorer, because hotword boosting relies on make_ngram. We could also write a function similar to make_ngram in hotwords.cpp, but our hotwords code directly reuses the already implemented make_ngram (see the sketch below).
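To make the bullets above concrete, here is a minimal Python sketch of the idea (the real make_ngram is C++ inside the decoder; this assumes a character-based LM, and the names are illustrative):

def hotword_bonus(prefix_chars, hotwords, order=4):
    # Take the last `order` characters (what make_ngram yields for a
    # character-based LM), form the suffixes [abcd, bcd, cd], and add the
    # weight of any suffix that appears in the hotwords dictionary.
    window = prefix_chars[-order:]
    bonus = 0.0
    for i in range(len(window) - 1):
        candidate = ''.join(window[i:])
        if candidate in hotwords:
            bonus += hotwords[candidate]
    return bonus

print(hotword_bonus(list('我要换歌'), {'换歌': 5.0}))  # 5.0: '换歌' matched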
I see. So, if we use a 4-gram arpa as ext_scorer, we can't handle hotwords longer than 4 characters, like this one: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/hotwords.yaml#L4, right?
What do you think about copying that make_ngram into the HotWord class? That way we could set the hotword ngram max length as well as whether it is character-based.
Otherwise it would be a little confusing that we set lm_path but the final decoding results don't use the ngram LM.
- If a 4-gram arpa is used as ext_scorer, only hotwords of at most 4 Chinese characters are supported.
- I plan to copy make_ngram into the HotWordsBoosting class and expose the maximum-length parameter for hotwords (see the sketch below).
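A rough sketch of what that could look like (Python pseudocode for the planned C++ change; all names here are hypothetical):

class HotwordsScorer:
    # Holds its own copy of make_ngram, so no external arpa / ext_scorer is
    # needed, and the window size is decoupled from the LM order.
    def __init__(self, hotwords, max_hotword_len=4, is_character_based=True):
        self.hotwords = hotwords              # {word: weight}
        self.max_len = max_hotword_len
        self.is_character_based = is_character_based

    def make_ngram(self, prefix_tokens):
        # Local copy: just the last max_len tokens of the decoded prefix.
        return prefix_tokens[-self.max_len:]

    def score(self, prefix_tokens):
        window = self.make_ngram(prefix_tokens)
        joiner = '' if self.is_character_based else ' '
        bonus = 0.0
        for i in range(len(window)):
            candidate = joiner.join(window[i:])
            if candidate in self.hotwords:
                bonus += self.hotwords[candidate]
        return bonus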
In the latest commit, we separated hotword scoring from language-model scoring. The latest hotword results on speech_asr_aishell1_hotwords_testsets and the AISHELL-1 test dataset can be found here.
In the latest commit, we renamed batch_hotwords_scorer to hotwords_scorer. If you have free time, please help review this PR.
@FieldsMedal Thanks!
Not sure if the above result is expected?
Thanks again!