diff --git a/README.md b/README.md
index a8ec950..62600d3 100644
--- a/README.md
+++ b/README.md
@@ -26,10 +26,11 @@
 * `kXHC1983.txt`: [Unihan Database][unihan] 中 [kXHC1983](http://www.unicode.org/reports/tr38/#kXHC1983) 部分的拼音数据(来源于《现代汉语词典》的拼音数据)
 * `kHanyuPinlu.txt`: [Unihan Database][unihan] 中 [kHanyuPinlu](http://www.unicode.org/reports/tr38/#kHanyuPinlu) 部分的拼音数据(来源于《現代漢語頻率詞典》的拼音数据)
 * `kMandarin.txt`: [Unihan Database][unihan] 中 [kMandarin](http://www.unicode.org/reports/tr38/#kMandarin) 部分的拼音数据(普通话中最常用的一个读音。zh-CN 为主,如果 zh-CN 中没有则使用 zh-TW 中的拼音)
-* `GBK_PUA.txt`: [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) 中有拼音的汉字,参考 [GB 18030 - 维基百科,自由的百科全书](https://zh.wikipedia.org/wiki/GB_18030#PUA)
-* `nonCJKUI.txt`: 不属于 [CJK Unified Ideograph](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) 但是却有拼音的字符
-* `overwrite.txt`: 手工纠正的拼音数据(**上面的拼音数据都是通过程序生成的,修改的话只修改这个就可以了**)
+* `kMandarin_overwrite.txt`: 手工纠正 `kMandarin.txt` 中有误的拼音数据(**可以修改**)
+* `GBK_PUA.txt`: [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) 中有拼音的汉字,参考 [GB 18030 - 维基百科,自由的百科全书](https://zh.wikipedia.org/wiki/GB_18030#PUA) (**可以修改**)
+* `nonCJKUI.txt`: 不属于 [CJK Unified Ideograph](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) 但是却有拼音的字符(**可以修改**)
 * `kMandarin_8105.txt`: [《通用规范汉字表》](https://zh.wikipedia.org/wiki/通用规范汉字表)(2013 年版)里 8105 个汉字最常用的一个读音 (**可以修改**)
+* `overwrite.txt`: 手工纠正的拼音数据(**可以修改**)
 * `pinyin.txt`: 合并上述文件后的拼音数据
 * `zdic.txt`: [汉典网](http://zdic.net) 的拼音数据
diff --git a/kMandarin_overwrite.txt b/kMandarin_overwrite.txt
new file mode 100644
index 0000000..4c0d9bc
--- /dev/null
+++ b/kMandarin_overwrite.txt
@@ -0,0 +1,64 @@
+U+389C: kāng # 㢜
+U+60B7: lì # 悷
+U+417F: huá # 䅿
+U+46BE: rén # 䚾
+U+4B78: fù # 䭸
+U+4B7B: fēn # 䭻
+U+4CC9: dōng # 䳉
+U+4D7B: huì # 䵻
+U+57D4: pǔ # 埔
+U+5A47: cǎi # 婇
+U+5F6F: piāo # 彯
+U+5F77: páng # 彷
+U+60B7: lì # 悷
+U+65FD: tūn # 旽
+U+6A0B: tōng # 樋
+U+6ADA: lǘ # 櫚
+U+6E5E: zhēn # 湞
+U+73D6: guāng # 珖
+U+77A1: guī # 瞡
+U+7BC9: zhù # 築
+U+815C: méi # 腜
+U+816C: róu # 腬
+U+8192: ōu # 膒
+U+8491: yīn # 蒑
+U+8A09: fàn # 訉
+U+90D8: lǚ # 郘
+U+9D24: zhōng # 鴤
+U+2031A: nòng # 𠌚
+U+2141D: fú # 𡐝
+U+21594: nuó # 𡖔
+U+2199D: xiāo # 𡦝
+U+21B0D: mí # 𡬍
+U+21B10: yí # 𡬐
+U+21B15: lóng # 𡬕
+U+2243F: rǎng # 𢐿
+U+2273D: kuí # 𢜽
+U+22741: hōng # 𢝁
+U+22892: sū # 𢢒
+U+22A10: jí # 𢨐
+U+245ED: xià # 𤗭
+U+24704: huái # 𤜄
+U+247AE: zhài # 𤞮
+U+24856: yán # 𤡖
+U+248B5: lài # 𤢵
+U+249EB: jīn # 𤧫
+U+2546B: kān # 𥑫
+U+2588D: hù # 𥢍
+U+2588F: diàn # 𥢏
+U+25C1F: yuán # 𥰟
+U+272D5: kùn # 𧋕
+U+2757A: shuāng # 𧕺
+U+275C8: nú # 𧗈
+U+27956: lí # 𧥖
+U+280A2: jí # 𨂢
+U+2824B: tuō # 𨉋
+U+284A8: hài # 𨒨
+U+28ABF: liú # 𨪿
+U+28DED: chán # 𨷭
+U+28E30: jú # 𨸰
+U+293CF: wéi # 𩏏
+U+295F5: zhēng # 𩗵
+U+29B5D: wǒ # 𩭝
+U+2A048: zhuāng # 𪁈
+U+2A2A2: shí # 𪊢
diff --git a/merge_unihan.py b/merge_unihan.py
index ab011fc..0f75a58 100644
--- a/merge_unihan.py
+++ b/merge_unihan.py
@@ -77,6 +77,10 @@ def extend_pinyins(old_map, new_map, only_no_exists=False):
     with open('kMandarin_8105.txt') as fp:
         adjust_pinyin_map = parse_pinyins(fp)
         extend_pinyins(raw_pinyin_map, adjust_pinyin_map)
+    with open('kMandarin_overwrite.txt') as fp:
+        _map = parse_pinyins(fp)
+        extend_pinyins(adjust_pinyin_map, _map)
+        extend_pinyins(raw_pinyin_map, adjust_pinyin_map)
     with open('kMandarin.txt') as fp:
         _map = parse_pinyins(fp)
         extend_pinyins(adjust_pinyin_map, _map)
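
Review note: the new `kMandarin_overwrite.txt` lists `U+60B7` twice with the same reading. A small check along the following lines might catch repeated code points, and comments that do not match their code point, before the merge runs. This is only a sketch: `parse_overwrite_lines` and `check_entries` are hypothetical helpers, not functions from `merge_unihan.py`, and the line shape (`U+XXXX: reading # char`, with comma-separated readings allowed) is an assumption inferred from the entries in this diff.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the entries added in this diff:
#   U+60B7: lì # 悷
LINE_RE = re.compile(r'^U\+([0-9A-F]{4,6}):\s*([^#]+?)\s*(?:#\s*(\S+))?\s*$')


def parse_overwrite_lines(lines):
    """Return a list of (code_point, [readings], comment_char) tuples."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and whole-line comments
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError('unrecognized line: %r' % raw)
        code, readings, char = match.groups()
        pinyins = [p.strip() for p in readings.split(',') if p.strip()]
        entries.append((int(code, 16), pinyins, char))
    return entries


def check_entries(entries):
    """Collect duplicate code points and comment/code-point mismatches."""
    counts = Counter(code for code, _, _ in entries)
    duplicates = sorted(code for code, n in counts.items() if n > 1)
    mismatches = [code for code, _, char in entries
                  if char is not None and char != chr(code)]
    return duplicates, mismatches


# A few lines copied from the new file, including the repeated entry.
sample = [
    'U+389C: kāng # 㢜',
    'U+60B7: lì # 悷',
    'U+57D4: pǔ # 埔',
    'U+60B7: lì # 悷',
    'U+2031A: nòng # 𠌚',
]
duplicates, mismatches = check_entries(parse_overwrite_lines(sample))
print('duplicates:', ['U+%04X' % code for code in duplicates])
print('mismatches:', ['U+%04X' % code for code in mismatches])
```

Whether a repeated entry is harmless depends on how `extend_pinyins` combines readings, which this diff does not show, so dropping the second `U+60B7` line seems the safer option.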