Teacher and student models for Estonian.
newstest2018, cased BLEU
system | et-en | en-et | comment |
---|---|---|---|
NICT WMT18 | 30.7 | 25.2 | SMT+NMT |
Tilde WMT18 | 29.46 | 23.57 | |
Edinburgh WMT18 | 29.4 | 22.7 | Ens. of RNNs and transformers |
transformer-big fine-tuned #1 | 34.4 | 27.2 | |
transformer-big fine-tuned #2 | 34.5 | 27.0 | |
+ ensemble x2 (teacher) | 34.7 | 27.5 | WMT18 constrained system |
Notes:
- BLEU scores for Estonian newstest2018 from sacreBLEU: BLEU+case.mixed+lang.en-et+numrefs.1+smooth.exp+test.wmt18+tok.13a+version.1.4.2
- A transformer-big model has been trained on 80M/100M back-translations (en-et/et-en respectively), then fine-tuned on 1M cleaned WMT18 parallel data. It's a constrained WMT18 system.
system | size (MB) | wmt18 (BLEU) | speed CPU (sec) | speed GPU |
---|---|---|---|---|
teacher ensemble x2, beam 4 | 2x 798MB | 34.7 | -- | 110s |
student tiny11, beam 1 | 65MB | 31.8 | 18s | 2.3s |
student tiny11, beam 1, packed8avx512 | 46MB | 31.4 | 13s | -- |
student tiny11, beam 1, intgemm8 | 17MB | 31.0 | 12s | -- |
student tiny11, beam 1, intgemm8alphas | 17MB | 30.8 | 11s | -- |
system | size (MB) | wmt18 (BLEU) | speed CPU (sec) | speed GPU |
---|---|---|---|---|
teacher ensemble x2, beam 4 | 2x 798MB | 27.5 | -- | 116s |
student tiny11, beam 1 | 65MB | 25.7 | 18s | 3.0s |
student tiny11, beam 1, packed8avx512 | 46MB | 25.5 | 14s | -- |
student tiny11, beam 1, intgemm8 | 17MB | 25.4 | 12s | -- |
student tiny11, beam 1, intgemm8alphas | 17MB | 25.1 | 11s | -- |
Notes:
- Students are tiny transformers: 256 emb., 1536 FFN, 6-layer encoder, 2-layer decoder (not tied) with SSRU units (tiny11).
- Trained on teacher-generated parallel data, back- and forward-translations, with guided alignments.
- Evaluated on newstest2018, which consists of 2,000 sentences (ca. 40k English tokens and 30k Estonian tokens).
- Tested with marian-dev v1.8.40 compiled with FBGEMM (on elli):
- GPU: GeForce GTX 1080 Ti, mini-batch 64, beam size 1
- Tested with marian-dev branch intgemm-reintegrated-computestats compiled with FBGEMM (on var):
- CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (avx512vnni), single thread, mini-batch 32, beam size 1, lexical shortlist