I am using the German portion of the PAWS-X training set. While analysing translated_train.tsv for German, I found 3,209 sentence pairs in which the two sentences are identical. Of these, 84 are labelled as non-paraphrases; the rest are labelled as paraphrases, as one would expect (the sentence pair indices are attached in the text file GER_duplicates.txt).
I had assumed these were produced accidentally by translation errors, so I was surprised to find at least one identical sentence pair in the English train set as well (sentence pair ID 1288). There may be more, as I have not checked them all.
Is this perhaps because of some bug?
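For anyone who wants to reproduce the check, here is a minimal sketch of one way to do it. It assumes the TSV has a header row with the columns id, sentence1, sentence2 and label, and that label 0 means non-paraphrase; the function name find_identical_pairs and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def find_identical_pairs(tsv_file):
    """Return (ids of identical pairs, ids of identical pairs labelled 0).

    Assumes a header row with the columns: id, sentence1, sentence2, label.
    """
    # QUOTE_NONE keeps quotation marks inside sentences from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    identical, labelled_non_paraphrase = [], []
    for row in reader:
        s1 = (row.get("sentence1") or "").strip()
        s2 = (row.get("sentence2") or "").strip()
        if s1 and s1 == s2:
            identical.append(row["id"])
            # Assumption: label "0" marks a non-paraphrase pair.
            if (row.get("label") or "").strip() == "0":
                labelled_non_paraphrase.append(row["id"])
    return identical, labelled_non_paraphrase


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "id\tsentence1\tsentence2\tlabel\n"
    "1\tDas ist ein Test.\tDas ist ein Test.\t1\n"
    "2\tDas ist ein Test.\tDies ist ein Test.\t1\n"
    "3\tEr kam 1990 an.\tEr kam 1990 an.\t0\n"
)
identical_ids, suspicious_ids = find_identical_pairs(sample)
print(len(identical_ids), "identical pairs;", len(suspicious_ids), "labelled 0")
```

To run it on the real data, open the file instead of the sample, for example with open("translated_train.tsv", encoding="utf-8", newline="") and pass the file object to the function. The comparison here is an exact string match after stripping surrounding whitespace, so pairs that differ only in casing or punctuation are not counted.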