Identical Sentence Pairs in German Train Set #10

JonathanSchaber · 2020-10-01T14:05:23Z

Hello,

I am using the German PAWS-X train dataset. While analysing the German translated_train.tsv, I found 3,209 rows in which the two sentences of the pair are identical. Of these, 84 are tagged as non-paraphrases; the remaining 3,125 are, naturally, tagged as paraphrases (the sentence pair indices are attached in the text file GER_duplicates.txt).

I assumed these were generated by accident through translation errors, so I was surprised to find at least one identical sentence pair in the English train set as well (sentence pair ID 1288); there may be more, as I have not checked them all.
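For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It assumes the TSV has a header row with the columns `id`, `sentence1`, `sentence2` and `label`, and that "identical" means an exact string match after stripping surrounding whitespace; the function name `find_identical_pairs` and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def find_identical_pairs(tsv_file):
    """Return (id, label) for every row whose two sentences are identical.

    Assumes a tab-separated file with a header row containing the columns
    id, sentence1, sentence2 and label (an assumption about the layout).
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    identical = []
    for row in reader:
        s1 = (row.get("sentence1") or "").strip()
        s2 = (row.get("sentence2") or "").strip()
        # Skip rows where both fields are empty; they are not real pairs.
        if s1 and s1 == s2:
            identical.append((row["id"], row["label"]))
    return identical


# Made-up sample rows, only to show the expected layout.
sample = io.StringIO(
    "id\tsentence1\tsentence2\tlabel\n"
    "1\tDas ist ein Satz .\tDas ist ein Satz .\t1\n"
    "2\tDas ist ein Satz .\tDies ist ein anderer Satz .\t0\n"
    "3\tNoch ein Satz .\tNoch ein Satz .\t0\n"
)

pairs = find_identical_pairs(sample)
non_paraphrases = [pair_id for pair_id, label in pairs if label == "0"]
print(len(pairs), "identical pairs,", len(non_paraphrases), "tagged as non-paraphrases")
```

To run it on a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `find_identical_pairs`.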

Is this perhaps because of some bug?


PhilipMay · 2021-09-13T17:45:23Z

Nice one. Thanks for reporting @JonathanSchaber
