-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix binary nokogiri gems on musl based Linux #1990
Conversation
The binary nokogiri gem built with rake-compiler-dock-1.0.0 segfaults on x86_64-linux-musl. This happens in: lib/nokogiri/xml/reader.rb:92:in `namespaces' It turned out that any calls to sprintf() with %-arguments fail. I did not dig deeper on assembly level, since this is the only use of sprintf in nokogiri. Moreover it can be replaced by calls to ruby string functions easily. Related to sparklemotion#1983
Musl doesn't recognize cp932 encoding. It also fails when used on the command line with iconv. For this test case cp932 can be replaced by Shift_JIS, which is identical for the characters in question and that is handled by musl. Error was: 1) Error: Nokogiri::HTML::TestDocumentEncoding#test_encoding_without_charset: ArgumentError: invalid byte sequence in UTF-8 /home/lars/comcard/nokogiri/test/html/test_document_encoding.rb:26:in `test_encoding_without_charset' Related to sparklemotion#1983
The test failure is also present, when nokogiri is native compiled on Alpine Linux. It is not specific to the binary gems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a drive-by review... I don't have a stake in this.
} | ||
|
||
key = rb_str_conv_enc(key, rb_utf8_encoding(), rb_default_internal_encoding()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how much your code deletes! But this seems like a wide departure from the original, with extra string concatenation for something with known size. EIther the string should be allocated ahead of time (which doesn't look that easy to do with cruby's API), or the concatenation should be avoided (easy). Here's the (rough + untested) cruby equivalent to the original code + your encoding differences:
if (ns->prefix) {
key = rb_enc_sprintf(rb_utf8_encoding(), "%s:%s", XMLNS_PREFIX, ns->prefix);
} else {
key = rb_enc_sprintf(rb_utf8_encoding(), "%s", XMLNS_PREFIX);
}
key = rb_str_conv_enc(key, rb_utf8_encoding(), rb_default_internal_encoding());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zenspider for reviewing! rb_enc_str_new_cstr()
allocates a struct RString with an embedded string length of 23 bytes at maximum. So each string has a capacity of 23 ASCII characters from the start. Since namespace keys tend to be shorter, usually no reallocation is required. Given that sprintf
is usually slower due to format string interpretation, I think the proposed implementation is best how it is.
THIS looks awesome. I would love to have nokogiri binary gems on alpine. |
Just a note - I want to add alpine to the CI pipeline (per note in #1845) before merging this; I'd like to go red (see the segfault reliably on alpine) first, then merge and go green. |
I've added two new things to the CI pipelines (master and PRs):
These should go red, and I'll manually merge these two commits to make each red build go green. |
First red build: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/ruby-musl-system/builds/1
|
First commit merged: ff99fa5 And the red build is now green: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/ruby-musl-system/builds/3 |
Second red build: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/native-gem-test/builds/1
|
Second commit merged: a3d9438 And the red build is now green: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri/jobs/native-gem-test/builds/2 |
OK - additional test coverage is all green, after manual merges. Closing. |
What problem is this PR intended to solve?
As described in #1983 there are two issues when using the nokogiri binary gem on musl based Linux.
The first commit fixes a segfault, the second commit fixes a failing test case. The the commit description for more details.
Have you included adequate test coverage?
Not yet. Tests on Alpine were run per
docker run --rm -it ruby:alpine /bin/sh
.Does this change affect the C or the Java implementations?
The C version only.