node_set.rb:24: [BUG] Segmentation fault #881
Comments
This is a most likely GC bug. Disabling GC makes it work. I've tried different versions of ruby, libxml and nokogiri. No luck there. |
I'm experiencing the same problem. The issue does not seem to be related to Sidekiq or Celluloid: I tried migrating my whole app to Resque, and the problem remained. I tried different versions of ruby, libxml and nokogiri. Nothing helps. |
If it's random, it might be related to this: http://bugs.ruby-lang.org/issues/8100 Try Ruby 2.0 patched with changeset 39919 (Also reported to affect 1.9.3 I believe) |
Latest ruby versions do not solve this issue, I'm still having segfaults. |
I just got this instead of a segfault. |
@tenderlove @flavorjones @sparklemotion I finally figured out what the problem was. I have a reproducible test for the segfault: https://github.com/sthetz/nokogiri-segfault |
Thank you for isolating the issue! I was able to pare it down to this snippet:

require 'nokogiri'
require 'libxml'
loop do
  threads = []
  20.times do
    threads << Thread.new do
      d = Nokogiri::XML '<foo><bar></bar></foo>'
      (d/'bar').each{}
    end
  end
  threads.each { |thread| thread.join }
end

The issue is that libxml-ruby's initializer hooks into libxml2 in a way that is incompatible with Nokogiri. In libxml-ruby's ext/libxml/ruby_xml_node.c: The problem is that this hook is global, and even libxml2 nodes that are managed by Nokogiri end up passing through the rxml_node_deregisterNode() function. This function is only meant to handle libxml-ruby nodes, and it results in memory corruption.

I'm not sure yet what the solution is, and I really have to move the pigs onto new pasture right now. I'll check back on this tonight. Thanks again for taking the time to isolate the issue. |
CC @cfis |
Interesting. libxml-ruby depends on that callback for memory management, so it can't go away. Perhaps, though, the callback could do a type check on the ruby object in _private and, if it's not a libxml-ruby object, just ignore it? |
Nokogiri stores a VALUE in the _private field for nodes just like libxml-ruby does. My comment about memory corruption was because I mistakenly thought that Nokogiri was putting a custom struct in there. So I don't see anything obviously wrong, even when rxml_node_deregisterNode() gets called on a node wrapped by Nokogiri. A type check is likely to make this issue go away, but I have a feeling there is something else going on. I'm having trouble narrowing it down though -- when I make changes to the test snippet that should be unrelated I can no longer reproduce it. Seems to be timing or GC related (but GC.stress doesn't produce it either). Still fiddling with this. |
@ender672 We were able to get rid of the segfault by disabling GC completely, so yes, it is GC related. |
@sthetz Thanks for pointing that out. Usually GC.stress is good at triggering GC issues, but it doesn't help here. I usually narrow the crashing/leaking snippet down until it only calls one Nokogiri method and debug it from there, but this one goes away when I try that. |
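For anyone trying to confirm the GC connection on their own crash, the two experiments mentioned above (turning GC off, and running under GC.stress) can be wrapped in small helpers. This is only a sketch using Ruby's built-in GC module; the helper names without_gc and with_gc_stress are made up for illustration and are not part of Nokogiri or libxml-ruby.

```ruby
# Diagnostic helpers (hypothetical names). They only toggle Ruby's own GC
# settings around a block and restore the previous state afterwards.

# Run the block with garbage collection turned off.
def without_gc
  was_disabled = GC.disable # returns true if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Run the block with GC.stress on, which makes Ruby collect at every
# opportunity. As noted above, this did not trigger the crash here.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end
```

Wrapping the body of the reproduction loop in without_gc { ... } should make the crash disappear if it is GC related, at the cost of unbounded memory growth, so it is a diagnostic and not a fix.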
Here is what I have so far:
The method that I've seen this happen in so far is new() in xml_xpath.c (see nokogiri/ext/nokogiri/xml_xpath_context.c, lines 275 to 296 in f897a2e).
The parameter nodeobj is the already-freed node. I gotta go pick up restaurant food scraps. Will be back later. |
I'm narrowing down on the issue. Quick update: A very similar issue came up four years ago. The solution at the time was to avoid the libxml-ruby callback by temporarily disabling the node-deregister-callback. However, that solution was never really complete -- there are many other places in Nokogiri where libxml2 nodes are deregistered. In 2011, libxml-ruby enabled the callback for native OS threads (for ruby 1.9.x compatibility). The Nokogiri workaround doesn't work in this case, and @sthetz's snippet uses multithreading to trigger the four year old bug again. I have a hunch that the libxml-ruby callback is just triggering a deeper Nokogiri issue. Still digging. |
I thought the libxml-ruby callback would be benign, but turns out that VALUE pointers are unreliable when used in the free() function of a ruby-wrapped C struct. By the time the free() function is invoked the VALUE pointer may have been recycled. Here is what triggers the error:
In order for the libxml-ruby callback to be safe, Nokogiri will have to make sure that every libxml2 node has its _private field unset before we call xmlFree(). |
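To make the stale-VALUE problem concrete, here is a toy model in plain Ruby. It is not how either extension is implemented: integers stand in for VALUE pointers, a Hash stands in for the Ruby heap, and private_ref stands in for the _private field of a libxml2 node. All names are invented for the illustration.

```ruby
# Toy model only. Node#private_ref plays the role of libxml2's _private.
Node = Struct.new(:name, :private_ref)

class ToyHeap
  def initialize
    @slots = {}
  end

  # Put an object in a slot and return the slot number (the "VALUE").
  def store(slot, obj)
    @slots[slot] = obj
    slot
  end

  def fetch(slot)
    @slots[slot]
  end
end

# Stand-in for a global deregister hook: it trusts whatever _private holds.
def deregister(heap, node)
  return nil if node.private_ref.nil?

  heap.fetch(node.private_ref)
end

heap = ToyHeap.new
node = Node.new('bar', nil)
node.private_ref = heap.store(1, 'wrapper for <bar>')

# The wrapper is collected and its slot is reused before the node is freed.
heap.store(1, 'unrelated object')

# The hook now dereferences a recycled slot and sees the wrong object.
stale = deregister(heap, node)

# Mitigation described above: unset _private before freeing the node,
# so the hook has nothing to dereference.
node.private_ref = nil
safe = deregister(heap, node)
```

In the real extensions the recycled slot would be a freed or reused Ruby object, which is why the failure shows up as a segfault or as an unrelated error instead of a tidy wrong value.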
Does this happen with libxml < 2.9.0? If not, then I think we should close this, as Nokogiri doesn't support 2.9.0 yet (see #829 for one example reason why). I'm also not really comfortable hacking Nokogiri to work around libxml-ruby. It's an old and apparently-unsupported gem. |
It does happen in 2.8 as well
|
@flavorjones - It's a problem with GC timing. I won't be adding any hacks to work around this. All attention has been on identifying & understanding the problem. libxml-ruby is still active and it's included in enough code that I absolutely want to fix this issue. This kind of thing spawns lots of segfault bug reports. |
libxml-ruby - not old and not unsupported (i'm the maintainer). |
@fenelon - are you loading libxml-ruby? A good place to check is your Gemfile.lock. |
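If reading Gemfile.lock by eye is tedious, a few lines of Ruby can do the check. This is a rough sketch that relies on the usual lockfile layout, where resolved gems are indented by four spaces and their dependencies by six; the method names and the sample lockfile text are made up for the example.

```ruby
# Return the names of resolved gems found in Gemfile.lock text.
# Assumes the usual layout: "    name (version)" with four leading spaces.
def locked_gem_names(lockfile_text)
  lockfile_text.scan(/^ {4}(\S+) \(/).flatten.uniq
end

# True when libxml-ruby appears among the resolved gems.
def libxml_ruby_locked?(lockfile_text)
  locked_gem_names(lockfile_text).include?('libxml-ruby')
end

# Illustrative lockfile content, not taken from a real project.
sample = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      libxml-ruby (2.7.0)
      nokogiri (1.6.6.2)
        mini_portile (~> 0.6.0)
LOCK

names = locked_gem_names(sample)
conflict = libxml_ruby_locked?(sample)
```

In a real project, pass File.read('Gemfile.lock') instead of the sample. At runtime, Gem.loaded_specs.key?('libxml-ruby') answers the related question of whether the gem has actually been activated in the current process.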
So @flavorjones was right and the only way I can think of fixing this was to add yet another workaround in Nokogiri. How bout a haiku?
|
There is an alternative -- we could raise an exception if libxml-ruby is detected, spit out a warning, allow an environment variable to override, etc. I lean more towards the commit in the pull request. Nice thing about it is that the cleanup only happens when the libxml-ruby callback is detected. |
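For comparison, the alternative could look roughly like the sketch below. It is not what the pull request does. The environment variable name and the method names are hypothetical, and detection is reduced to checking whether the LibXML constant is defined.

```ruby
# Hypothetical override variable; not an actual Nokogiri setting.
OVERRIDE_VAR = 'ALLOW_LIBXML_RUBY_WITH_NOKOGIRI'

# Decide what to do when libxml-ruby is (or is not) loaded.
# Returns :ok, :warn or :raise.
def libxml_ruby_policy(libxml_loaded:, env: ENV, strict: false)
  return :ok unless libxml_loaded
  return :ok if env[OVERRIDE_VAR] == '1'

  strict ? :raise : :warn
end

# Apply the policy: raise, print a warning, or do nothing.
def enforce_libxml_ruby_policy(env: ENV, strict: false)
  loaded = Object.const_defined?(:LibXML)
  message = "libxml-ruby is loaded; set #{OVERRIDE_VAR}=1 to silence this"

  case libxml_ruby_policy(libxml_loaded: loaded, env: env, strict: strict)
  when :raise then raise message
  when :warn then warn message
  end
end
```

The drawback compared with the pull request is that this only tells users about the conflict; it does nothing to make the two gems safe to load together.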
Any chance of this issue getting fixed any time soon? |
issue still present w/ ruby 2.1 & 2.0 |
sparklemotion/nokogiri#881 doesn't look promising
#895 was merged. Should fix this issue. |
Will be in 1.6.3.rc1 to be released later today. |
still an issue?
Do we need to set up Nokogiri to use different versions of libxml2 by changing the paths when we install?
|
We are still seeing what looks very much like this same issue with nokogiri. Removing libxml-ruby, even though the repro isn't using it, prevents the crash. We've decided to just remove libxml-ruby from our app, since we are only using it in a couple of small areas and removing it will be very low effort, but I still wanted to report it. |
Doing so causes segfault. sparklemotion/nokogiri#881 (comment)
This fixes segfault with the following snippet: * https://gist.github.com/codekitchen/2715ddc89e782b3e6c6f * sparklemotion/nokogiri#881 (comment)
Hi, I'm having this kind of crash, too. It really seems to be an interaction between your 2 libs. Here is the trace:
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:75:in `find'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `context'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `new'
-- Machine register context ------------------------------------------------
RIP: 0x00000000105003fd RBP: 0x00000000193d39d0 RSP: 0x00000000193d39b0
RAX: 0x636e6174736e692d RBX: 0x00000000063bdbb0 RCX: 0x00000000055282e0
RDX: 0x000000001c565a30 RDI: 0x00000000144438b0 RSI: 0x0000000000000001
R8: 0x0000000000000000 R9: 0x0000000010646c22 R10: 0x0000000000000000
R11: 0x0000000011ba4fa0 R12: 0x00000000063bc000 R13: 0x00000000063bcf30
R14: 0x000000000639bce0 R15: 0x00000000193d3be8 EFL: 0x0000000000000004
-- C level backtrace information -------------------------------------------
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_bugreport+0x4ea) [0x501d30a] vm_dump.c:693
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_bug_context+0xcb) [0x4eb30ab] error.c:425
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(sigsegv+0x3e) [0x4f9145e] signal.c:879
/lib/x86_64-linux-gnu/libpthread.so.0 [0x531f8d0]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x87) [0x105003fd]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeProp+0xb8) [0x104fe019]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreePropList+0x2f) [0x104fdf50]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x148) [0x105004be]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeDoc+0x161) [0x104fc75f]
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(finalize_list+0x51) [0x4ed1021] gc.c:2463
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(gc_finalize_deferred+0x50) [0x4ed2550] gc.c:2500
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_postponed_job_flush+0x133) [0x5024563] vm_trace.c:1572
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_threadptr_execute_interrupts.part.41+0x139) [0x502a7f9] thread.c:1971
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x52e) [0x501046e] vm_eval.c:252
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_yield+0x492) [0x50190f2] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_ary_each+0x52) [0x4e63e22] array.c:1803
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_iterate+0xea) [0x5005c0a] vm_eval.c:1129
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_block_call+0x2b) [0x5005dcb] vm_eval.c:1198
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(enum_to_a+0x38) [0x4ea7968] enum.c:503
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(proc_call+0x52) [0x4ec2452] proc.c:731
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(send_internal+0xd2) [0x5016fc2] vm_eval.c:928
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_fiber_start+0x110) [0x5031c70] cont.c:1263
|
@Coren - Please open a new issue. This issue has been closed for almost a year and a half. Your problem is unlikely to be the same root cause, so let's track it as a new problem. Thanks! |
sure, see #1364 |
This mostly happens in concurrent environments. I'm using nokogiri with sidekiq, and roadie (which also uses nokogiri).
nokogiri -v
Backtrace: