Positioning-Wenzhounese-in-Chinese-Southern-dialects

Where does Wenzhounese stand lexically?
What makes a dialect a dialect, and lexically different from Mandarin?
What makes each dialect lexically different from the others?

Data

  • The data file is named "cmn-yue-wuu-wen-parallel-simplified_final" and is uploaded here.
  • The data consists of parallel sentences in Mandarin (cmn), Shanghainese (wuu) and Cantonese (yue), 140 sentences each, downloaded from Tatoeba.
  • The 140 Wenzhounese (wen) sentences were translated by myself from the Mandarin (cmn) sentences from Tatoeba.
  • The data is run through chinese-converter to convert traditional Chinese characters into their simplified counterparts (except that some traditional Cantonese characters are not converted).
  • All punctuation is removed.
    • punc = ["。", ",", "!", "?", "'", '"', ",", ".", "、", "?", "「", "」"]
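
The punctuation-removal step can be sketched roughly as below. This is a minimal illustration rather than the exact script used for the data: the function name `strip_punctuation` and the sample sentence are made up for the example, and the mark list is copied as printed above (if the original list also held full-width variants of some marks, they would need to be added).

```python
# Punctuation marks to strip, copied from the list in this README.
PUNC = ["。", ",", "!", "?", "'", '"', ",", ".", "、", "?", "「", "」"]

# Map every mark to None so that str.translate deletes it.
# Repeated entries in PUNC are harmless: they collapse to a single dict key.
_DELETE_TABLE = str.maketrans({mark: None for mark in PUNC})


def strip_punctuation(sentence: str) -> str:
    """Return the sentence with every mark in PUNC removed (hypothetical helper)."""
    return sentence.translate(_DELETE_TABLE)


# Made-up sample sentence, not taken from the data set.
cleaned = strip_punctuation("「你好」,世界。")
print(cleaned)
```

Using a translation table keeps the removal to one pass per sentence, and characters outside the list are left untouched.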

Methods

Five different methods are chosen and applied.

Results

  • Random sampling with replacement is repeated 1,000 times on the result of every method to obtain a general distribution.
  • The distributions are all visualized with kdeplot from seaborn.
  • Four of the five methods give consistent results (see each method for details).
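
The resampling step described above might look roughly like the sketch below. It assumes each method yields one numeric score per sentence and that the mean is the statistic of interest; neither is stated in this README, and `bootstrap_means` and `example_scores` are hypothetical names with made-up values.

```python
import random
from statistics import mean


def bootstrap_means(scores, n_resamples=1000, seed=0):
    """Resample `scores` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # random.choices draws with replacement; each resample has the original size.
    return [mean(rng.choices(scores, k=n)) for _ in range(n_resamples)]


# Made-up per-sentence scores; real values would come from one of the methods.
example_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
means = bootstrap_means(example_scores)
print(len(means), min(means), max(means))
```

The resulting list of 1,000 means is the kind of one-dimensional sample that can be passed to seaborn's kdeplot to draw the distribution.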