Merge pull request #21 from mozillazg/kMandarin_overwrite.txt
Add kMandarin_overwrite.txt
mozillazg authored Mar 18, 2018
2 parents 41c3d03 + a77f572 commit 99dcb07
Showing 3 changed files with 72 additions and 3 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -26,10 +26,11 @@
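* `kXHC1983.txt`: pinyin data from the [kXHC1983](http://www.unicode.org/reports/tr38/#kXHC1983) field of the [Unihan Database][unihan] (sourced from the dictionary 《现代汉语词典》, Xiandai Hanyu Cidian)
* `kHanyuPinlu.txt`: pinyin data from the [kHanyuPinlu](http://www.unicode.org/reports/tr38/#kHanyuPinlu) field of the [Unihan Database][unihan] (sourced from the frequency dictionary 《現代漢語頻率詞典》, Xiandai Hanyu Pinlu Cidian)
* `kMandarin.txt`: pinyin data from the [kMandarin](http://www.unicode.org/reports/tr38/#kMandarin) field of the [Unihan Database][unihan] (the single most common Mandarin reading; zh-CN takes priority, and the zh-TW reading is used when zh-CN has none)
* `GBK_PUA.txt`: Han characters in the [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) that have pinyin; see [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA)
* `nonCJKUI.txt`: characters that are not [CJK Unified Ideographs](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) but still have pinyin
* `overwrite.txt`: manually corrected pinyin data (**all of the pinyin data above is generated by a program; to change anything, edit only this file**)
* `kMandarin_overwrite.txt`: manual corrections for erroneous pinyin data in `kMandarin.txt` (**may be edited**)
* `GBK_PUA.txt`: Han characters in the [Private Use Area](https://en.wikipedia.org/wiki/Private_Use_Areas) that have pinyin; see [GB 18030 - Wikipedia, the free encyclopedia](https://zh.wikipedia.org/wiki/GB_18030#PUA) (**may be edited**)
* `nonCJKUI.txt`: characters that are not [CJK Unified Ideographs](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs) but still have pinyin (**may be edited**)
* `kMandarin_8105.txt`: the single most common reading of each of the 8105 characters in the [Table of General Standard Chinese Characters (《通用规范汉字表》)](https://zh.wikipedia.org/wiki/通用规范汉字表) (2013 edition) (**may be edited**)
* `overwrite.txt`: manually corrected pinyin data (**may be edited**)
* `pinyin.txt`: the pinyin data produced by merging the files above
* `zdic.txt`: pinyin data from [zdic.net (汉典网)](http://zdic.net)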

64 changes: 64 additions & 0 deletions kMandarin_overwrite.txt
@@ -0,0 +1,64 @@
U+389C: kāng # 㢜
U+60B7: lì # 悷
U+417F: huá # 䅿
U+46BE: rén # 䚾
U+4B78: fù # 䭸
U+4B7B: fēn # 䭻
U+4CC9: dōng # 䳉
U+4D7B: huì # 䵻
U+57D4: pǔ # 埔
U+5A47: cǎi # 婇
U+5F6F: piāo # 彯
U+5F77: páng # 彷
U+60B7: lì # 悷
U+65FD: tūn # 旽
U+6A0B: tōng # 樋
U+6ADA: lǘ # 櫚
U+6E5E: zhēn # 湞
U+73D6: guāng # 珖
U+77A1: guī # 瞡
U+7BC9: zhù # 築
U+815C: méi # 腜
U+816C: róu # 腬
U+8192: ōu # 膒
U+8491: yīn # 蒑
U+8A09: fàn # 訉
U+90D8: lǚ # 郘
U+9D24: zhōng # 鴤
U+2031A: nòng # 𠌚
U+2141D: fú # 𡐝
U+21594: nuó # 𡖔
U+2199D: xiāo # 𡦝
U+21B0D: mí # 𡬍
U+21B10: yí # 𡬐
U+21B15: lóng # 𡬕
U+2243F: rǎng # 𢐿
U+2273D: kuí # 𢜽
U+22741: hōng # 𢝁
U+22892: sū # 𢢒
U+22A10: jí # 𢨐
U+245ED: xià # 𤗭
U+24704: huái # 𤜄
U+247AE: zhài # 𤞮
U+24856: yán # 𤡖
U+248B5: lài # 𤢵
U+249EB: jīn # 𤧫
U+2546B: kān # 𥑫
U+2588D: hù # 𥢍
U+2588F: diàn # 𥢏
U+25C1F: yuán # 𥰟
U+272D5: kùn # 𧋕
U+2757A: shuāng # 𧕺
U+275C8: nú # 𧗈
U+27956: lí # 𧥖
U+280A2: jí # 𨂢
U+2824B: tuō # 𨉋
U+284A8: hài # 𨒨
U+28ABF: liú # 𨪿
U+28DED: chán # 𨷭
U+28E30: jú # 𨸰
U+293CF: wéi # 𩏏
U+295F5: zhēng # 𩗵
U+29B5D: wǒ # 𩭝
U+2A048: zhuāng # 𪁈
U+2A2A2: shí # 𪊢
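
Each entry in the file above has the shape `U+<hex code point>: <pinyin> # <character>`. A minimal sketch of reading one such line is shown here. `parse_line` is a hypothetical helper written for illustration, not the repository's `parse_pinyins` (whose body does not appear in this diff), and support for comma-separated readings is an assumption.

```python
def parse_line(line):
    """Parse one 'U+XXXX: pinyin # comment' line.

    Returns (code_point, [readings]) or None for blank/comment-only lines.
    Hypothetical helper; comma-separated readings are an assumption.
    """
    # Drop the trailing '# <character>' comment, if present.
    content = line.split('#', 1)[0].strip()
    if not content:
        return None

    code, _, readings = content.partition(':')
    code = code.strip()
    if not code.startswith('U+') or not readings.strip():
        raise ValueError('unrecognized line: {!r}'.format(line))

    # 'U+57D4' -> 0x57D4
    code_point = int(code[2:], 16)
    pinyins = [p.strip() for p in readings.split(',') if p.strip()]
    return code_point, pinyins


if __name__ == '__main__':
    print(parse_line('U+57D4: pǔ # 埔'))
```

Returning the code point as an integer keeps the keys independent of how the hex digits were written in the file.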
4 changes: 4 additions & 0 deletions merge_unihan.py
@@ -77,6 +77,10 @@ def extend_pinyins(old_map, new_map, only_no_exists=False):
with open('kMandarin_8105.txt') as fp:
    adjust_pinyin_map = parse_pinyins(fp)
    extend_pinyins(raw_pinyin_map, adjust_pinyin_map)
with open('kMandarin_overwrite.txt') as fp:
    _map = parse_pinyins(fp)
    extend_pinyins(adjust_pinyin_map, _map)
    extend_pinyins(raw_pinyin_map, adjust_pinyin_map)
with open('kMandarin.txt') as fp:
    _map = parse_pinyins(fp)
    extend_pinyins(adjust_pinyin_map, _map)
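
The hunk above reads `kMandarin_overwrite.txt` before `kMandarin.txt` and feeds both into `adjust_pinyin_map`. The body of `extend_pinyins` is not part of this diff, so the following is only a sketch under an assumption: that the function appends readings from `new_map` to `old_map` in place, and that `only_no_exists=True` adds only code points missing from `old_map`. The name `extend_pinyins_sketch` and the `kMandarin.txt` reading used in the demo are hypothetical.

```python
def extend_pinyins_sketch(old_map, new_map, only_no_exists=False):
    """Assumed behavior of extend_pinyins; the real body is not in the diff."""
    for code, pinyins in new_map.items():
        if only_no_exists:
            # Only fill in code points that have no entry yet.
            if code not in old_map:
                old_map[code] = list(pinyins)
        else:
            # Append readings that are not already listed, keeping order.
            existing = old_map.setdefault(code, [])
            for pinyin in pinyins:
                if pinyin not in existing:
                    existing.append(pinyin)


if __name__ == '__main__':
    adjust_pinyin_map = {}                   # pretend kMandarin_8105.txt has no entry
    overwrite_map = {0x57D4: ['pǔ']}         # from kMandarin_overwrite.txt
    kmandarin_map = {0x57D4: ['bù']}         # hypothetical kMandarin.txt reading

    # Same order as in the hunk: overwrite first, then kMandarin.
    extend_pinyins_sketch(adjust_pinyin_map, overwrite_map)
    extend_pinyins_sketch(adjust_pinyin_map, kmandarin_map)
    print(adjust_pinyin_map[0x57D4][0])
```

Under this assumption, the file read earlier contributes the first reading for a code point, which would explain why the overwrite file is loaded ahead of `kMandarin.txt`.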