Utf8 support #468

Merged
merged 23 commits into from
Jul 2, 2019

@OpportunityLiu
Member

OpportunityLiu commented Jun 24, 2019

#467

  • Rewrite the whole io API on top of stdio/winapi (for std files) and tbox (for disk files)
    • file:read("n") is not supported for disk files
    • On Windows, io.read(num) is not guaranteed to return num chars, because it reads num UTF-16 chars and converts them to UTF-8
    • Add an opt param to most APIs
      opt = { continuation = "\\", encoding = "utf16le" }
      • encoding is detected automatically if not specified; use "binary" to disable detection and read the file as binary
      • continuation is not supported for binary files; for other files, the char sequences continuation .. "\r\n" and continuation .. "\n" are replaced with an empty char sequence by the read APIs; this option does not affect writing
    • Add file:path() and file:size() APIs
  • Add a .clang-format file for ./core
  • Add some codepage-related APIs to winos
    • winos.ansi_cp() gets the ANSI codepage
    • winos.oem_cp() gets the OEM codepage
    • winos.cp_info(cp) gets info about a codepage
    • winos.console_cp(cp) sets the console codepage; if cp is not provided, it returns the current console codepage
    • winos.console_output_cp(cp) sets the console output codepage; if cp is not provided, it returns the current console output codepage
    • winos.mbstoutf8(cp, data) converts data encoded in codepage cp to a UTF-8 string
  • Add table.to_array method
    table.to_array(ipairs({"a", "b"})) -- {{1,"a",n=2},{2,"b",n=2}},2
    table.to_array(io.lines("file")) -- {"line 1","line 2", ... , "line n"},n
  • Add raw* methods and a global method to debug
  • Fix a bug where a circular reference caused a stack overflow in utils.dump
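The continuation rule described in the list above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the PR's C/Lua implementation; the function name `strip_continuation` is hypothetical. It assumes the text has already been decoded and removes every `continuation .. "\r\n"` or `continuation .. "\n"` sequence in a single pass.

```python
import re


def strip_continuation(text: str, continuation: str = "\\") -> str:
    """Remove line continuations from already-decoded text.

    Hypothetical helper: every occurrence of `continuation` immediately
    followed by CRLF or LF is replaced with an empty string.
    """
    if not continuation:
        # No continuation marker configured: leave the text untouched.
        return text
    # Escape the marker so characters like "\" are matched literally.
    pattern = re.escape(continuation) + r"\r?\n"
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    print(repr(strip_continuation("123\\\n456\n789\n")))
```

A single regular-expression pass is used here so that removing one continuation cannot create a new match out of the surrounding characters.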

For xmake developers:

  • Temp files named "null" and "nullin" are used as os.nuldev() on Windows
  • print is not available in the ./xmake/core domain; use io.write there instead

@OpportunityLiu OpportunityLiu changed the base branch from master to dev June 24, 2019 06:17
core/src/demo/xmake.c (outdated, resolved)
core/src/xmake/io/write.c (outdated, resolved)
core/src/xmake/io/write.c (outdated, resolved)
core/src/xmake/io/write.c (outdated, resolved)
程序/源文件/你好.c (outdated, resolved)
@waruqi
Member

waruqi commented Jun 24, 2019

Do we need a macro in tboox similar to PDC_FORCE_UTF8 as well,

to force wcstombs/mbstowcs to switch to UTF-8?

@waruqi
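Member

waruqi commented Jun 24, 2019

I feel TBOOX itself needs to be changed. It uses ANSI everywhere: although it converts to wchar_t and calls the W versions when calling the Win API, what it accepts is still an ANSI-encoded char*. Now that I have changed everything to UTF-8 char*, all of these API calls have to be converted again...

It is all char* processing: pass UTF-8 and it handles UTF-8, pass ANSI and it handles ANSI. When it finally turns into a Win API call, just convert there. If that conversion is wrong, adding the PDC_FORCE_UTF8 you just mentioned to change the behavior of wcstombs/mbstowcs should be enough..

What else still needs converting?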

@OpportunityLiu
Member Author

The rest is mostly done. The command-line part is rather ugly, but it works.

@waruqi waruqi changed the base branch from dev to utf8 June 24, 2019 12:33
core/src/xmake/machine.c Outdated Show resolved Hide resolved
core/src/xmake/machine.c Outdated Show resolved Hide resolved
core/src/xmake/machine.c Outdated Show resolved Hide resolved
core/src/xmake/machine.c Outdated Show resolved Hide resolved
@OpportunityLiu OpportunityLiu force-pushed the utf8 branch 3 times, most recently from 11ee818 to eaf0d01 Compare June 27, 2019 03:00
@OpportunityLiu OpportunityLiu changed the base branch from utf8 to dev June 27, 2019 03:02
@OpportunityLiu
Member Author

Rebased. I have switched the base to dev for now.

@waruqi waruqi changed the base branch from dev to utf8 June 27, 2019 03:07
CHANGELOG.md Outdated Show resolved Hide resolved
@OpportunityLiu
Member Author

Once it is about ready, switch to the dev branch. I have already merged the changes from there.

@waruqi
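Member

waruqi commented Jul 2, 2019

Once it is about ready, switch to the dev branch. I have already merged the changes from there.

You don't need to merge dev for now. I will first merge this into the utf8 branch, test it locally, polish it a bit, and only merge into dev once I have confirmed there are no problems.

I will do the merge into dev myself later. The changes are too large this time, so I still want to do more tweaking and testing on the utf8 branch.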

@@ -0,0 +1,80 @@
function test_read(t)
t:are_equal(io.readfile("files/utf8bom-lf-eleof"), "123\\\n456\n789\n")
t:are_equal(io.readfile("files/utf8-crlf-neleof"), "123\\\n456\n789")
Member

So the default behavior of readfile now automatically converts \r\n to \n, consistent with Unix?

Member Author

It did that before as well. The C standard requires all string-related functions in <stdio.h> to treat '\n' as the platform-specific newline, and that is the only difference between fopen(.., "r") and fopen(.., "rb").

Member Author

Now that we read the byte stream ourselves, of course we have to clean this up ourselves too; otherwise everything would be wrong.

Member Author

fopen(.., "r") and fopen(.., "rb") behave identically on platforms that use '\n' as the line ending.
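The point made in this thread, that code reading raw bytes must do the newline translation a text-mode stream would otherwise perform, can be sketched as follows. This is a Python illustration under stated assumptions, not the PR's code: the name `normalize_newlines` is hypothetical, and only the CRLF-to-LF mapping is modeled.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Map CRLF line endings to LF in raw bytes.

    Hypothetical helper: mimics the newline translation a text-mode
    stream applies on platforms whose native line ending is CRLF.
    Lone CR bytes are left untouched.
    """
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # A CRLF file and an LF file end up with the same content.
    print(normalize_newlines(b"123\r\n456\r\n789"))
    print(normalize_newlines(b"123\n456\n789"))
```

On input that already uses '\n' the function is a no-op, which matches the remark that the two fopen modes are identical on such platforms.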

@OpportunityLiu
Member Author

I only merged some of the changes from dev, so it will be easier when your utf8 branch gets merged in.

@waruqi
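Member

waruqi commented Jul 2, 2019

I only merged some of the changes from dev, so it will be easier when your utf8 branch gets merged in.

OK, let's leave it like this for now. From a first look there are no big problems. I will pull it and test it tonight when I have time, and I will tidy up some code-style details afterwards..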

@waruqi
Member

waruqi commented Jul 2, 2019

ruki:xmake ruki$ xmake l
> {"1"}
< { &1    <-- what is this?
    "1"
  }

image

@OpportunityLiu
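Member Author

OpportunityLiu commented Jul 2, 2019

ruki:xmake ruki$ xmake l
> {"1"}
< { &1    <-- what is this?
    "1"
  }

image

Take a look at debug.global("io").file. I didn't want to traverse everything twice to decide which tables get an anchor and which don't, so I simply add it unconditionally.
The notation follows YAML.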

@waruqi
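Member

waruqi commented Jul 2, 2019

debug.global("io").file

? What was the purpose of adding dump._print_anchor in the first place? Is it only for the indentation position in dump, or does the debug module need it?

In what scenario do we need to dump this? Is there a relevant reference?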

@OpportunityLiu
Member Author

debug.global("io").file

That is exactly the scenario. Without this it goes straight to a stack overflow.

@waruqi
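Member

waruqi commented Jul 2, 2019

debug.global("io").file

That is exactly the scenario. Without this it goes straight to a stack overflow.

What does that have to do with _print_anchor and &1, &2? I didn't see any related explanation on stackoverflow. Are there any reference links?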

@OpportunityLiu
Member Author

OpportunityLiu commented Jul 2, 2019

..I mean a stack overflow exception. An object with a circular reference cannot be printed out in full; it can only be represented with an anchor and a reference.

image

@waruqi
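Member

waruqi commented Jul 2, 2019

..I mean a stack overflow exception. An object with a circular reference cannot be printed out in full; it can only be represented with an anchor and a reference.

But when I print an object with a circular reference, isn't it shown as *1? As for the &1 output, I only see a check on first_value; I don't see any check for object references anywhere..

> a
< { &1   ---> then what is this? I don't see any object-reference check in the code; it only prints the indentation position at the first value
    1,
    2,
    a = *1   -> shows the circularly referenced object
  }

Shouldn't it be displayed like this instead:

> a
< { 
    1,
    2,
    a = *1   -> shows the circularly referenced object
  }

image

Is it there to mark the position of the reference id that *1 points to?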

@OpportunityLiu
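Member Author

Go read the YAML documentation; this uses that syntax.

Is it there to mark the position of the reference id that *1 points to?

Otherwise, who would know which one *1 refers to?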

@OpportunityLiu
Member Author

OpportunityLiu commented Jul 2, 2019

image image

@waruqi
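Member

waruqi commented Jul 2, 2019

However, &x is currently shown everywhere even when nothing references the object, for example for a plain scalar array. That output is puzzling and not concise..

It would be better to show the corresponding &x only when the current object is actually referenced by another object, although that would require a traversal up front to build a table of reference relationships.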
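The two-pass idea suggested here can be sketched as follows. This is an illustrative Python sketch, not xmake's Lua `utils.dump`: the function name `dump` and the single-line output format are assumptions. The first pass counts how often each container is reached; the second pass emits a YAML-style `&n` anchor only for containers reached more than once, and a `*n` alias on every later visit.

```python
def dump(value) -> str:
    """Render nested lists/dicts with YAML-style anchors and aliases.

    Hypothetical sketch: an anchor (&n) is printed only for containers
    that are referenced more than once, so plain data stays uncluttered.
    """
    # Pass 1: count how many times each container is reached.
    seen = {}

    def count(obj):
        if not isinstance(obj, (list, dict)):
            return
        key = id(obj)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > 1:
            return  # already traversed; stops infinite recursion on cycles
        children = obj.values() if isinstance(obj, dict) else obj
        for child in children:
            count(child)

    count(value)

    # Pass 2: render, assigning anchor ids in order of first appearance.
    anchors = {}

    def render(obj):
        if not isinstance(obj, (list, dict)):
            return repr(obj)
        key = id(obj)
        if key in anchors:
            return f"*{anchors[key]}"  # alias to an already printed container
        prefix = ""
        if seen[key] > 1:
            anchors[key] = len(anchors) + 1
            prefix = f"&{anchors[key]} "
        if isinstance(obj, dict):
            items = [f"{k} = {render(v)}" for k, v in obj.items()]
        else:
            items = [render(v) for v in obj]
        return "{ " + prefix + ", ".join(items) + " }"

    return render(value)


if __name__ == "__main__":
    a = {"x": 1}
    a["self"] = a          # circular reference
    print(dump([1, 2]))    # no anchor is printed for plain data
    print(dump(a))
```

The trade-off is the one raised in the discussion: the extra traversal costs a second walk over the data, in exchange for output that carries anchors only where an alias actually needs them.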

@waruqi
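Member

waruqi commented Jul 2, 2019

That part is fine though; I can improve the display later.. I just tested on Windows XP and dump seems to have a problem: every scalar is displayed as nil, and displaying an object simply hangs.
Snip20190702_1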

@OpportunityLiu
Member Author

This part is implemented in pure Lua. How can it have a platform-specific problem?

@waruqi waruqi merged commit 117c7cc into xmake-io:utf8 Jul 2, 2019
@waruqi
Member

waruqi commented Jul 2, 2019

Not sure. I have merged it into utf8 for now, and I will debug this part tonight..

@OpportunityLiu OpportunityLiu deleted the utf8 branch July 2, 2019 09:02
@OpportunityLiu
Member Author

Remember to open a dev<-utf8 PR; I will review it.

@waruqi waruqi mentioned this pull request Jul 2, 2019