
Add hotwords #11

Merged: 4 commits into Slyne:master, May 18, 2023

Conversation

FieldsMedal

No description provided.

scorer = Scorer(0.5, 0.5, lm_path, vocab_list)
batchhotwords = BatchHotwords()
# In the first bad case, there is a big difference in scoring between the optimal path and the other paths.
hot_words = {'极点': 5, '换一': -3.40282e+38, '首歌': -100, '换歌': 3.40282e+38}
Contributor

Could you explain more about why you set these words here? Why do you use a positive or negative score? What's the bad case? What are the results before and after setting it?

FieldsMedal (Author) May 8, 2023

In test_zh.py, batch_log_ctc_probs contains test data for two audio clips.

1) For the first audio, the decoding result without hotwords is ('换一首歌', '', '换一', '一', '换', '换首歌', '一首歌', '一歌', '首歌', '歌'). I want to use hotwords to change the decoding result so that '换歌' ranks first, so I give '换歌' a large positive weight of 3.40282e+38 and reduce the weights of other words such as '换一' and '首歌'. The decoding result with hotwords is ('换歌', '换一首歌', '', '换一', '一', '换', '一首歌', '一歌', '首歌', '歌').
2) For the second audio, the decoding result without hotwords is ('几点了', '极点了', '几点啊', '几点啦', '几点儿', '几点吧', '急点了', '几点晚', '极点啊', '极点啦'). After setting '极点': 5, the decoding result is ('极点了', '几点了', '极点啊', '极点啦', '极点儿', '极点吧', '极点晚', '极点呀', '几点啊', '几点啦').
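For context, here is a minimal sketch of how such a weight dictionary could be passed into the batch decoder. The module name swig_decoders, the HotWordsScorer constructor, and the keyword arguments of ctc_beam_search_decoder_batch below are assumptions based on this discussion, not the exact signatures used in test_zh.py:

```python
# Hypothetical sketch; lm_path, vocab_list and batch_log_ctc_probs are the
# placeholders from the test_zh.py setup referenced above.
from swig_decoders import Scorer, HotWordsScorer, ctc_beam_search_decoder_batch

scorer = Scorer(0.5, 0.5, lm_path, vocab_list)            # alpha, beta, KenLM path, vocabulary
hot_words = {'换歌': 3.40282e+38, '换一': -3.40282e+38,    # boost '换歌', suppress '换一'
             '首歌': -100, '极点': 5}
hotwords_scorer = HotWordsScorer(hot_words, vocab_list)    # assumed constructor

results = ctc_beam_search_decoder_batch(
    batch_log_ctc_probs,        # log CTC posteriors for the two test utterances
    vocab_list,
    beam_size=10,
    num_processes=2,
    ext_scoring_func=scorer,
    hotwords_scorer=hotwords_scorer,   # parameter name taken from a later comment in this PR
)
```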

FieldsMedal (Author) commented May 8, 2023

We tested hotwords on speech_asr_aishell1_hotwords_testsets.

  1. Acoustic model: a small Conformer model for AIShell

  2. Hotwords weight: hotwords.tar.gz

  3. Test method: please refer to the README of this repository (TODO)

Latency and CER

| model (FP16)                | Latency (s) | CER   |
|-----------------------------|-------------|-------|
| offline model               | 5.5921      | 13.85 |
| offline model with hotwords | 5.6401      | 12.16 |

offline model: https://github.com/wenet-e2e/wenet/tree/main/runtime/gpu/model_repo

offline model with hotwords: (TODO)

Decoding result

| Label | hotwords | pred w/o hotwords | pred w/ hotwords |
|---|---|---|---|
| 以及拥有陈露的女单项目 | 陈露 | 以及拥有陈鹭的女单项目 | 以及拥有陈露的女单项目 |
| 庞清和佟健终于可以放心地考虑退役的事情了 | 庞清、佟健 | 庞青董建终于可以放心地考虑退役的事情了 | 庞清佟健终于可以放心地考虑退役的事情了 |
| 赵继宏老板电器做厨电已经三十多年了 | 赵继宏 | 赵继红老板电器做厨店已经三十多年了 | 赵继宏老板电器做厨电已经三十多年了 |

yuekaizhang (Contributor) commented May 8, 2023

Thanks. Would you mind uploading the decoding results with and without hotwords somewhere? (Maybe a Hugging Face repo for the hotwords weights, ngram file, decoding results, and other essentials would be a good choice.)

Also, at https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary they use F1 score, recall, and precision to evaluate hotwords. Can we get these stats?

I am also interested in the general test set performance. Would you mind testing the WER on the normal AISHELL test set?
https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/client.py#L37-L43

FieldsMedal (Author) commented May 9, 2023

> Thanks. Would you mind uploading the decoding results with and without hotwords somewhere? (Maybe a Hugging Face repo for the hotwords weights, ngram file, decoding results, and other essentials would be a good choice.)

Hotwords weights and ngram file: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/models
Decoding results: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/results

  • The current ngram order is 4, so only hotwords of length <= 4 are supported. If you want to configure longer hotwords, you can use a higher-order ngram, but that will also increase the decoding time (see the sketch below).
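As an illustration of this length cap, here is a plain-Python sketch (not code from this PR) showing why a window of `order` tokens can never match a longer hotword, following the suffix-combination scheme described later in this thread; the 5-character hotword is made up:

```python
# The decoder only sees the last `order` decoded characters, so candidate
# strings built from that window can never cover a 5-character hotword.
order = 4
history = ['北', '京', '欢', '迎', '你']           # hypothetical 5-character hotword
window = history[-order:]                          # only the last 4 characters are visible
candidates = [''.join(window[i:]) for i in range(len(window) - 1)]
print(candidates)   # ['京欢迎你', '欢迎你', '迎你']: '北京欢迎你' is never formed
```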

> Also, at https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary they use F1 score, recall, and precision to evaluate hotwords. Can we get these stats?

Hotwords results on speech_asr_aishell1_hotwords_testsets:

| model (FP16)               | Latency (s) | CER   | Recall | Precision | F1-score |
|----------------------------|-------------|-------|--------|-----------|----------|
| offline model w/o hotwords | 5.5921      | 13.85 | 0.27   | 0.99      | 0.43     |
| offline model w/ hotwords  | 5.6401      | 12.16 | 0.45   | 0.97      | 0.62     |

> I am also interested in the general test set performance. Would you mind testing the WER on the normal AISHELL test set? https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/client.py#L37-L43

Hotwords results on the AISHELL-1 test dataset:

| model (FP16)                 | RTF     | CER    |
|------------------------------|---------|--------|
| offline model w/o hotwords   | 0.00437 | 4.6805 |
| offline model w/ hotwords    | 0.00435 | 4.5831 |
| streaming model w/o hotwords | 0.01231 | 5.2777 |
| streaming model w/ hotwords  | 0.01142 | 5.1926 |

Test environment

  • CPU: 40-core Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
  • GPU: NVIDIA GeForce RTX 2080 Ti

yuekaizhang (Contributor) commented May 9, 2023


Many thanks. The results look nice. I was wondering whether, for both the w/ and w/o hotwords cases, you used this as the default external LM: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/init_kenlm.arpa

Also, is the pretrained model from here: https://github.com/wenet-e2e/wenet/tree/main/examples/aishell/s0#u2-conformer-result? It looks like the WER with WFST decoding + attention rescoring for offline and chunk16 is 4.4 & 4.75, and pure attention rescoring without any ngram is 4.63 & 5.05. I am not sure what the results would look like if you used the AISHELL train set to build the ARPA. I thought they used this 3-gram ARPA here: https://huggingface.co/yuekai/aishell1_tlg_essentials/blob/main/3-gram.unpruned.arpa

FieldsMedal (Author) commented May 9, 2023

> Many thanks. The results look nice. I was wondering whether, for both the w/ and w/o hotwords cases, you used this as the default external LM: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/init_kenlm.arpa

> Also, is the pretrained model from here: https://github.com/wenet-e2e/wenet/tree/main/examples/aishell/s0#u2-conformer-result? It looks like the WER with WFST decoding + attention rescoring for offline and chunk16 is 4.4 & 4.75, and pure attention rescoring without any ngram is 4.63 & 5.05. I am not sure what the results would look like if you used the AISHELL train set to build the ARPA. I thought they used this 3-gram ARPA here: https://huggingface.co/yuekai/aishell1_tlg_essentials/blob/main/3-gram.unpruned.arpa

  1. init_kenlm.arpa is used to initialize the Scorer, because our hotwords boosting depends on Scorer::make_ngram; any language model trained with KenLM is fine. When decoding with hotwords, put the unique hotwords of each recording into batchhotwords; when decoding without hotwords, put None into batchhotwords. Whether the hotwords score is added is controlled by this. If the user also wants to add the ngram score, set use_ngram_score to true (see the sketch below).
  2. Our pretrained model is from https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.md, trained on the AISHELL datasets. Our results on the AISHELL-1 test dataset were obtained with the FP16 ONNX model, with ctc_weight 0.3 and reverse_weight 0.3. These settings may have some impact.
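A pure-Python sketch of the two switches described in point 1 (illustrative only, not the library's code): the hotwords bonus and the ngram LM score are added independently, so the ARPA file can be loaded purely to reuse make_ngram while use_ngram_score stays False.

```python
def extra_score(candidate_words, hotwords, use_ngram_score, ngram_log_prob):
    """Illustrative extra score for one prefix extension at frame t.

    candidate_words: strings built from the last `order` decoded tokens (via make_ngram)
    hotwords:        {hotword: weight} for this utterance, or None to disable boosting
    ngram_log_prob:  log p(w_t | history) from the KenLM model
    """
    score = 0.0
    if hotwords is not None:                  # None means: decode this utterance without hotwords
        for word in candidate_words:
            score += hotwords.get(word, 0.0)  # add the configured weight on a match
    if use_ngram_score:                       # the LM score is an independent switch
        score += ngram_log_prob
    return score

# Utterance 1 boosts '极点'; utterance 2 passes None, i.e. no boosting.
print(extra_score(['打听极点', '听极点', '极点'], {'极点': 5}, False, -2.3))  # 5.0
print(extra_score(['打听极点', '听极点', '极点'], None, False, -2.3))          # 0.0
```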

for (size_t index = 0; index < ngram.size(); index++) {
  std::string word = "";
  // character-based language model: combine Chinese characters into words
  if (ext_scorer->is_character_based()) {
yuekaizhang (Contributor) May 9, 2023

I don't fully understand why we need an external ARPA file to launch the hotword scorer when we set use_ngram_score=False. If ext_scorer->is_character_based() is the reason, how about setting an independent flag?

FieldsMedal (Author)

  1. First, consider how the ngram score is added when decoding at frame t. 1) The first step is to get the words or Chinese characters in a fixed window (the window size is the language model order). With a character-based language model, we take the Chinese characters at steps t, t-1, t-2, t-3 and get [a, b, c, d]; with a word-level language model, we take the words at steps t, t-1, t-2, t-3 and get [word1, word2, word3, word4]. This step is implemented by make_ngram. 2) The next step is to compute the conditional probability p(d | a, b, c) or p(word4 | word1, word2, word3). This step is implemented by get_log_cond_prob.
  2. Second, consider hotwords boosting at frame t. 1) The first step is to find the characters at time steps t, t-1, t-2, t-3, t-4, ...; this can be obtained directly with make_ngram, which is why hotwords boosting needs an external ARPA file to launch ext_scorer. When ext_scorer is a character-based language model, we combine the Chinese characters into words: [a, b, c, d] -> [abcd, bcd, cd]. When ext_scorer is a word-level language model, we get [word1, word2, word3, word4]. If one of these words is in the hotwords dictionary, we add the hotwords score (see the sketch below).
  3. When we set use_ngram_score=False, conditional probability scores are not calculated; the external ARPA file is only used to launch ext_scorer, and hotwords boosting relies on make_ngram. We could also write a function similar to make_ngram in hotwords.cpp, but our hotwords implementation directly reuses the already implemented make_ngram.
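A short Python sketch of the combination step in point 2 (the real logic lives in the C++ code around Scorer::make_ngram; this just mirrors the [a, b, c, d] -> [abcd, bcd, cd] example above):

```python
def candidate_hotwords(ngram, character_based):
    """Build the strings that are looked up in the hotwords dictionary."""
    if character_based:
        # character-based LM: join trailing characters into progressively shorter words
        return [''.join(ngram[i:]) for i in range(len(ngram) - 1)]
    # word-level LM: the window already holds whole words
    return list(ngram)

print(candidate_hotwords(['换', '一', '首', '歌'], True))               # ['换一首歌', '一首歌', '首歌']
print(candidate_hotwords(['word1', 'word2', 'word3', 'word4'], False))  # unchanged
```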

yuekaizhang (Contributor) May 10, 2023

I see. So, if we use a 4-gram ARPA as ext_scorer, we can't handle hotwords longer than 4 characters, like this one: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/hotwords.yaml#L4, right?

What do you think about copying that make_ngram into the HotWord class? That way we could set the hotword ngram max_length as well as is_character_or_not.

It would be a little confusing if we set lm_path but our final decoding results don't use the ngram LM.

FieldsMedal (Author)

  1. If a 4-gram ARPA is used as ext_scorer, only hotwords of at most 4 Chinese characters are supported.
  2. I plan to copy make_ngram into the HotWordsBoosting class and expose the hotwords length parameter.

FieldsMedal (Author)

In the latest commit, we separated hotwords scoring from language model scoring. The latest hotwords results on speech_asr_aishell1_hotwords_testsets and AISHELL-1 Test dataset can be found here.

FieldsMedal (Author)

In the latest commit, we changed batch_hotwords_scorer to hotwords_scorer. If you have free time, please help review this PR.

Slyne (Owner) commented May 18, 2023

@FieldsMedal Thanks!
One more question:
The output of test_zh.py for "Test hotwords boosting with word-level language models during ctc prefix beam search" is

INFO:root:Test hotwords boosting with word-level language models during ctc prefix beam search
INFO:root:('', '一', '换', '一首', '极点晚', '几点啦', '极点', '几点', '', '几', '晚', '极')

I'm not sure whether the above result is expected.

=================
Update:
Should be fine. It is the user's responsibility to ensure the vocabulary contains space_id.

Slyne merged commit 03259fd into Slyne:master on May 18, 2023
Slyne (Owner) commented May 18, 2023

Thanks again!
Really great feature! @FieldsMedal
