## v1.1 (25/01/2018)

- New experimental `Multi30kDataset` and `ImageFolderDataset` classes
- `torchvision` dependency added for CNN support
- `nmtpy-coco-metrics` now computes one METEOR without `norm=True`
- Mainloop mechanism is completely refactored with backward-incompatible
  configuration option changes for the `[train]` section (see the sample
  snippet after this list):
  - The `patience_delta` option is removed
  - Added `eval_batch_size` to define the batch size for GPU beam search
    during training
  - `eval_freq` default is now `3000`, meaning evaluation every 3000
    minibatches
  - `eval_metrics` now defaults to `loss`. As before, you can provide a list
    of metrics like `bleu,meteor,loss` to compute all of them and early-stop
    based on the first one
  - Added `eval_zero` (default: `False`) to evaluate the model once on the
    dev set right before training starts. Useful for sanity checking when
    you fine-tune a model initialized with pre-trained weights
  - Removed `save_best_n`: we no longer save the best `N` models on the dev
    set w.r.t. the early-stopping metric
  - Added `save_best_metrics` (default: `True`) which will save the best
    models on the dev set w.r.t. each metric provided in `eval_metrics`.
    This partly remedies the removal of `save_best_n`
  - `checkpoint_freq` now defaults to `5000`, meaning a checkpoint every
    5000 minibatches
  - Added `n_checkpoints` (default: `5`) to define the number of last
    checkpoints to keep if `checkpoint_freq > 0`, i.e. checkpointing enabled
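
  For reference, a hedged sketch of how these options might look in a
  `[train]` section; the values below (the `eval_batch_size` of 32 in
  particular) are illustrative choices, not recommendations:

  ```ini
  [train]
  # Evaluate on the dev set every 3000 minibatches (the new default)
  eval_freq: 3000
  # Compute all three metrics; early-stop on the first one (bleu)
  eval_metrics: bleu,meteor,loss
  # Batch size for GPU beam search during evaluation (value is illustrative)
  eval_batch_size: 32
  # Run one evaluation before training starts (handy when fine-tuning)
  eval_zero: True
  # Save the best dev-set model separately for each metric above
  save_best_metrics: True
  # Checkpoint every 5000 minibatches, keeping only the last 5
  checkpoint_freq: 5000
  n_checkpoints: 5
  ```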
- Added `ExtendedInterpolation` support to configuration files:
  - You can now define intermediate variables in `.conf` files to avoid
    typing the same paths again and again. A variable can be referenced
    from within its own section using the `tensorboard_dir: ${save_path}/tb`
    notation. Cross-section references are also possible: `${data:root}`
    will be replaced by the value of the `root` variable defined in the
    `[data]` section.
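
  A hedged `.conf` sketch of the two reference styles;
  `tensorboard_dir: ${save_path}/tb` and `${data:root}` come from the notes
  above, while the remaining keys (`train_src`, `log_dir`) and all paths are
  made up for illustration:

  ```ini
  [data]
  # 'root' can be referenced within this section and from other sections
  root: /data/multi30k
  train_src: ${root}/train.lc.tok.en

  [train]
  save_path: /models/exp1
  # Same-section reference (taken verbatim from the notes above)
  tensorboard_dir: ${save_path}/tb
  # Cross-section reference: resolves to /data/multi30k
  log_dir: ${data:root}/logs
  ```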
- Added `-p/--pretrained` to `nmtpy train` to initialize the weights of
  the model using another checkpoint (`.ckpt`).
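
  For instance (the `-C` flag for passing the configuration file is an
  assumption here, and both file paths are placeholders):

  ```bash
  # Start training with weights initialized from a previous checkpoint.
  nmtpy train -C experiment.conf -p /path/to/previous_model.ckpt
  ```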
- Improved input/output handling for `nmtpy translate` (usage sketch after
  this list):
  - `-s` accepts comma-separated test set names defined in the configuration
    file of the experiment, to translate them all at once. Example:
    `-s val,newstest2016,newstest2017`
  - The mutually exclusive counterpart of `-s` is `-S`, which receives a
    single input file of source sentences.
  - For both cases, an output prefix should now be provided with `-o`.
    With multiple test sets, the name of each test set and the beam size are
    appended to the output prefix. If you just provide a single file with
    `-S`, the final output name will only reflect the beam size.
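
  A hedged usage sketch; `out` is a placeholder prefix, the trailing `...`
  stands for the remaining arguments (checkpoint, etc.), and the exact
  suffix format of the output names is not shown in these notes:

  ```bash
  # Translate all three test sets listed in the experiment configuration;
  # each output file gets the test set name and beam size appended to 'out'.
  nmtpy translate -s val,newstest2016,newstest2017 -o out ...

  # Translate a single plain-text source file instead; the output name
  # will only reflect the beam size. 'my_sources.en' is a placeholder.
  nmtpy translate -S my_sources.en -o out ...
  ```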
- Two new arguments for `nmtpy-build-vocab` (usage sketch after this list):
  - `-f`: stores frequency counts as well inside the final `json` vocabulary
  - `-x`: does not add the special markers `<eos>, <bos>, <unk>, <pad>` into
    the vocabulary
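
  A possible invocation combining both flags; `train.tok.en` is a
  placeholder corpus file:

  ```bash
  # Keep token frequencies in the output JSON (-f) and skip the special
  # markers (-x).
  nmtpy-build-vocab -f -x train.tok.en
  ```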

### Layers/Architectures

- Added `Fusion()` layer to `concat,sum,mul` an arbitrary number of inputs
  (see the sketch after this list)
- Added experimental `ImageEncoder()` layer to seamlessly plug a VGG or
  ResNet CNN using `torchvision` pretrained models
- `Attention` layer arguments improved. You can now select the bottleneck
  dimensionality for MLP attention with `att_bottleneck`. The `dot`
  attention is still not tested and probably broken.
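
The notes do not show `Fusion()`'s actual signature; below is a minimal
PyTorch sketch of the described behaviour (merging any number of same-shaped
inputs by `concat`, `sum` or `mul`), not nmtpytorch's implementation:

```python
import torch
from torch import nn
from functools import reduce

class FusionSketch(nn.Module):
    """Hypothetical stand-in for the Fusion() layer: merge an arbitrary
    number of same-shaped inputs by concatenation, summation or
    element-wise multiplication."""
    def __init__(self, op='concat'):
        super().__init__()
        assert op in ('concat', 'sum', 'mul')
        self.op = op

    def forward(self, *inputs):
        if self.op == 'concat':
            # Concatenate along the feature dimension
            return torch.cat(inputs, dim=-1)
        if self.op == 'sum':
            return sum(inputs)
        return reduce(torch.mul, inputs)

# Fuse three (batch, dim) tensors element-wise
x, y, z = (torch.randn(8, 256) for _ in range(3))
out = FusionSketch(op='sum')(x, y, z)  # shape: (8, 256)
```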

### New stuff

- Added `AttentiveMNMT`, which implements modality-specific multimodal
  attention from the paper "Multimodal Attention for Neural Machine
  Translation"
- Added `ShowAttendAndTell` model

### Changes in NMT

- `dec_init` now defaults to `mean_ctx`, i.e. the decoder will be
  initialized with the mean context computed from the source encoder
  (see the sketch after this list)
- `enc_lnorm`, which was just a placeholder, is now removed since we do not
  provide layer normalization for now
- Beam search is completely moved to the GPU
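
To illustrate the `mean_ctx` initialization (all names, shapes and the
projection layer here are assumptions, not the library's code):

```python
import torch
from torch import nn

# Shapes and names are assumptions for illustration only.
src_len, batch, enc_dim, dec_dim = 20, 8, 512, 256
enc_states = torch.randn(src_len, batch, enc_dim)   # encoder outputs

proj = nn.Linear(enc_dim, dec_dim)
# A real implementation would mask padded positions before averaging.
mean_ctx = enc_states.mean(dim=0)                   # (batch, enc_dim)
h0 = torch.tanh(proj(mean_ctx))                     # decoder initial state
```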