Source code for the ACL 2024 main conference paper "Self-Modifying State Modeling for Simultaneous Machine Translation".
Our model is implemented based on the open-source toolkit Fairseq and the open-source code of ITST.
- Python >= 3.7.10
- torch >= 1.13.0
- sacrebleu == 1.5.0
- Install Fairseq with the following commands:

```bash
git clone https://github.com/EurekaForNLP/SM2.git
cd SM2
pip install --editable ./
```
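If you also want to pin the dependency versions listed above explicitly, a minimal sketch with pip is shown below (this is an assumption for convenience; the repository may provide its own requirements file):

```bash
# Sketch only: pin the versions listed in the requirements above
pip install "torch>=1.13.0" "sacrebleu==1.5.0"
```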
- We tokenize the English, German, and Romanian corpora with mosesdecoder/scripts/tokenizer/tokenizer.perl and the Chinese corpus with fxsjy/jieba.
- We apply BPE with rsennrich/subword-nmt.
- We preprocess the data into fairseq format with preprocess.sh, adding `--joined-dictionary` for German-English (see the sketch after this list).
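For orientation, the sketch below walks through the preprocessing pipeline for German-English. The file names, paths, and the number of BPE merge operations are illustrative assumptions; preprocess.sh in this repository is the authoritative script.

```bash
# Illustrative preprocessing sketch (paths, file names, and 32K merges are assumptions)
MOSES=mosesdecoder/scripts/tokenizer
perl $MOSES/tokenizer.perl -l de < train.de > train.tok.de
perl $MOSES/tokenizer.perl -l en < train.en > train.tok.en

# Learn and apply BPE with subword-nmt
subword-nmt learn-joint-bpe-and-vocab --input train.tok.de train.tok.en -s 32000 \
    -o bpe.codes --write-vocabulary vocab.de vocab.en
subword-nmt apply-bpe -c bpe.codes < train.tok.de > train.bpe.de
subword-nmt apply-bpe -c bpe.codes < train.tok.en > train.bpe.en

# Binarize into fairseq format; --joined-dictionary for German-English
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref train.bpe --joined-dictionary \
    --destdir data-bin/deen
```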
Use train_sm2.sh to train SM$^2$ (a sketch of a typical invocation follows the notes below). Note that:
- Use `--arch transformer_with_sm2_unidirectional` for SM$^2$ with the unidirectional encoder setting.
- If your device supports bf16, `--bf16` is suggested.
- If the source and target languages share embeddings, use `--share-all-embeddings`.
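The sketch below shows a fairseq-train call with the flags mentioned above. The data path, optimizer settings, and other hyperparameters are assumptions, and any SM$^2$-specific options are omitted; train_sm2.sh is the authoritative script.

```bash
# Illustrative training sketch (data path and hyperparameters are assumptions;
# SM2-specific flags live in train_sm2.sh)
fairseq-train data-bin/deen \
    --arch transformer_with_sm2_unidirectional \
    --share-all-embeddings \
    --bf16 \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --max-tokens 8192 \
    --save-dir checkpoints/sm2_deen
```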
Use test_sm2.sh to run inference for simultaneous translation with `--batch-size=1` and `--beam=1` (a sketch of a possible call is shown below).
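For orientation, a minimal sketch in the style of fairseq-generate is given below. The data and checkpoint paths are assumptions, and any SM$^2$-specific decoding options are omitted; test_sm2.sh is the authoritative script.

```bash
# Illustrative inference sketch (paths are assumptions; SM2-specific decoding
# options are handled by test_sm2.sh)
fairseq-generate data-bin/deen \
    --path checkpoints/sm2_deen/checkpoint_best.pt \
    --batch-size 1 --beam 1 \
    --remove-bpe > output.txt
```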