[translation] Incorrect Download Links #59
Comments
Experiencing a verification issue.
Here is the output of
|
I'm taking a look at this... will get back to you soon. |
Looks like the differences are the following files:
These are actually generated by preprocessing. Maybe the checksum should be changed to exclude these files. |
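One way to read the suggestion above is to build the checksum manifest per file and skip the files that preprocessing generates. The sketch below is a minimal illustration of that idea, not the project's verify_dataset.sh. The function names `md5_of_file` and `dataset_manifest` and the `exclude` parameter are hypothetical, and the list of generated files is not given in this thread, so the caller has to supply it.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_manifest(root, exclude=()):
    """Map relative path -> MD5 for every file under root.

    `exclude` is a hypothetical parameter: relative paths (POSIX style)
    of files that preprocessing generates and that should not be checked.
    """
    root = Path(root)
    excluded = set(exclude)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel in excluded:
            continue  # generated file: leave it out of the manifest
        manifest[rel] = md5_of_file(path)
    return manifest
```

Two manifests built this way can be compared as plain dictionaries, so a mismatch points at a specific file instead of failing the whole directory.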
After running ./download_data.sh into raw_data/, I ran verify_dataset.sh once the download completed. Like the users above, I see the MD5 check fail; the output includes:
OK: correct tensorflow/newstest2014.en
Here's the output of find raw_data/ -type f -exec md5sum {} \; | sort -k 2
c8685312058e23cf08f46abe444a332f raw_data/commoncrawl.cs-en.annotation
Not sure where I'm going wrong. |
I got exactly the same error. I am using TensorFlow 1.13.1. Have you solved this problem? |
Closing because GNMT is deprecated from the benchmark suite. |
The following link cannot be accessed, so I cannot use it to download the data. Does anyone know why? https://raw.githubusercontent.com/tensorflow/mlbenchmark/master/transformer/test_data/newstest2014.en?token=ABCvMNAuy84bgb5dC-YxSxNyE6qvuit6ks5bGaqtwA%3D%3D |
We have identified a problem with the download links for translation. The current download script will download the incorrect test files (the files are derived from the correct files and have additional data in them which our preprocessor does not expect). This will result in the model training correctly, but the computed BLEU scores will be incorrect.
I am currently working on new download links and a new verify_dataset.sh.
For the time being, people can download the correct files from:
https://raw.githubusercontent.com/tensorflow/mlbenchmark/master/transformer/test_data/newstest2014.en?token=ABCvMNAuy84bgb5dC-YxSxNyE6qvuit6ks5bGaqtwA%3D%3D
https://raw.githubusercontent.com/tensorflow/mlbenchmark/master/transformer/test_data/newstest2014.de?token=ABCvMIGPgU8sEiCTPR4D-yovJ-TooXv2ks5bGau_wA%3D%3D
md5sum:
06e8840abe90cbfbd45cf2729807605d newstest2014.de BAD
f6c3818b477e4a25cad68b61cc883c17 newstest2014.de GOOD
4e4663b8de25d19c5fc1c4dab8d61703 newstest2014.en BAD
dabf51a9c02b2235632f3cee75c72d49 newstest2014.en GOOD
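To check which version of a test file is on disk, the checksums listed above can be compared against a locally computed MD5. The following is a minimal sketch under that assumption. The names `KNOWN_MD5`, `file_md5`, and `classify_test_file` are hypothetical and are not part of the project's scripts; only the four digests come from this issue.

```python
import hashlib
from pathlib import Path

# Digests copied from the md5sum list in this issue.
KNOWN_MD5 = {
    "newstest2014.de": {
        "good": "f6c3818b477e4a25cad68b61cc883c17",
        "bad": "06e8840abe90cbfbd45cf2729807605d",
    },
    "newstest2014.en": {
        "good": "dabf51a9c02b2235632f3cee75c72d49",
        "bad": "4e4663b8de25d19c5fc1c4dab8d61703",
    },
}


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_test_file(path):
    """Return 'good', 'bad', or 'unknown' for a file, matched by file name."""
    path = Path(path)
    known = KNOWN_MD5.get(path.name)
    if known is None:
        return "unknown"  # no digest is recorded for this file name
    actual = file_md5(path)
    for label, expected in known.items():
        if actual == expected:
            return label
    return "unknown"  # digest matches neither the good nor the bad file
```

For example, `classify_test_file("raw_data/newstest2014.en")` would report whether that copy matches the GOOD or the BAD digest; the `raw_data/` location here is only an assumption about where the file was downloaded.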