Crash in fiber switching with select #3900

Closed
kostya opened this issue Jan 15, 2017 · 7 comments
Labels
community:to-research, kind:bug (A bug in the code. Does not apply to documentation, specs, etc.), topic:stdlib:concurrency

Comments

@kostya
Contributor

kostya commented Jan 15, 2017

ch1 = Channel::Buffered(Int32).new(1)
ch2 = Channel::Buffered(Int32).new(1)
res = [] of Int32

spawn do
  loop do
    select
    when x = ch1.receive
      res << x
    when y = ch2.receive
      res << y
    end
  end
end

spawn do
  3.times do |i|
    select
    when ch1.send(i)
    when ch2.send(i)
    end
  end
end

Fiber.yield
p res

Invalid memory access (signal 11) at address 0x0
[4466037] *CallStack::print_backtrace:Int32 +117
[4444920] __crystal_sigfault_handler +56
[4758788] sigfault_handler +40
[140503573855408] ???
[0] ???
@spalladino spalladino added the community:help-wanted, kind:bug (A bug in the code. Does not apply to documentation, specs, etc.), and topic:stdlib:concurrency labels Jan 16, 2017
@bmmcginty
Contributor

Did some playing around with this today. It doesn't seem to happen if the case statement is changed to an if, or to an if/elsif (with or without a plain else).
It fails right after a stack switch, according to gdb. I logged every Crystal function and couldn't get any closer to the exact error location.
I'm trying to get a copy of the generated code to see what the end product of the case statements looks like (it appears they're turned into if/elsif), but beyond that, I'm out of ideas.
I'm not sure how helpful any of this is, but the issue was marked help-wanted, so I figured I'd comment.

@kostya
Contributor Author

kostya commented Jan 17, 2017

I tried to debug this, but it's quite hard; something breaks in the coroutine switches.

@firejox
Contributor

firejox commented Jan 20, 2017

Because the same fiber is waiting in both ch1 and ch2, it can be pushed into the scheduler twice. If you add a logger like this one, you will see that.

ch1 = Channel(Int32).new(1)
ch2 = Channel(Int32).new(1)
res = [] of Int32

class Scheduler
  def self.log
    LibC.printf "#{@@runnables}\n"
  end
end

a = spawn do
  loop do
    select
    when x = ch1.receive
      res << x
    when y = ch2.receive
      res << y
    end
    LibC.printf "receive: "
    Scheduler.log
  end
end

b = spawn do
  3.times do |i|
    select
    when ch1.send(i)
    when ch2.send(i)
    end
    LibC.printf "send: "
    Scheduler.log
  end
end

p "receive fiber : #{a}"
p "send fiber : #{b}"

Fiber.yield
p res

"receive fiber : #<Fiber:0x1d37e00>"
"send fiber : #<Fiber:0x1d37d90>"
send: Deque{#<Fiber:0x1d37e00>}
send: Deque{#<Fiber:0x1d37e00>, #<Fiber:0x1d37e00>}
receive: Deque{#<Fiber:0x1d37e00>, #<Fiber:0x1d37d90>}
receive: Deque{#<Fiber:0x1d37e00>, #<Fiber:0x1d37d90>, #<Fiber:0x1d37d90>}
send: Deque{#<Fiber:0x1d37d90>, #<Fiber:0x1d37e00>}
Invalid memory access (signal 11) at address 0x0
[4413221] *CallStack::print_backtrace:Int32 +117
[4391592] __crystal_sigfault_handler +56
[4608565] sigfault_handler +40
[140189768220800] ???
[0] ???

After the third send, the fiber runs to its end, and the Scheduler then resumes that finished fiber again. That's why it crashes.
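
To make the double enqueue easier to see in isolation, here is a minimal toy model. It is only a sketch: `ToyFiber`, `ToyChannel`, and `enqueue_count` are hypothetical names invented for illustration, and none of this touches Crystal's real Fiber, Channel, or Scheduler. It registers one fiber on two wait lists (as a select over ch1 and ch2 would) and counts how often that fiber lands in the run queue. The `guarded` flag shows one possible way to avoid the duplicate, which is not necessarily how any actual fix works.

```crystal
# Toy model only: these classes are hypothetical and do not use
# Crystal's real Fiber, Channel or Scheduler.
class ToyFiber
  getter name : String
  property? woken = false

  def initialize(@name)
  end
end

class ToyChannel
  def initialize(@run_queue : Array(ToyFiber), @guarded : Bool)
    @waiters = [] of ToyFiber
  end

  # A select registers the same fiber on every channel it waits on.
  def wait(fiber : ToyFiber)
    @waiters << fiber
  end

  # A send wakes the first waiter by pushing it onto the run queue.
  def send
    fiber = @waiters.shift?
    return unless fiber
    # Guarded mode: skip a fiber that another channel already woke.
    return if @guarded && fiber.woken?
    fiber.woken = true
    @run_queue << fiber
  end
end

# Returns how many times the receiver ends up in the run queue.
def enqueue_count(guarded : Bool) : Int32
  run_queue = [] of ToyFiber
  ch1 = ToyChannel.new(run_queue, guarded)
  ch2 = ToyChannel.new(run_queue, guarded)
  receiver = ToyFiber.new("receiver")

  # select over ch1 and ch2: one fiber, two wait lists
  ch1.wait(receiver)
  ch2.wait(receiver)

  # Both channels get a value before the receiver runs again.
  ch1.send
  ch2.send

  run_queue.size
end

puts "unguarded: #{enqueue_count(false)}" # prints "unguarded: 2"
puts "guarded: #{enqueue_count(true)}"    # prints "guarded: 1"
```

The unguarded count of 2 corresponds to the `Deque{#<Fiber:0x1d37e00>, #<Fiber:0x1d37e00>}` line in the log above.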

firejox added a commit to firejox/crystal that referenced this issue Jan 22, 2017
- Fixes crystal-lang#3900 crystal-lang#3862
- Fully thread safe except coroutine switch
firejox added a commit to firejox/crystal that referenced this issue Jan 26, 2017
- Fixes crystal-lang#3900 crystal-lang#3862
- Fully thread safe except coroutine switch
@miketheman
Contributor

#triage

This issue appears to persist with 0.23.1 https://carc.in/#/r/2imm

An effort to rewrite the Channel implementation has been undertaken in #3912, but it does not appear to have received much feedback.

@jhass
Member

jhass commented Aug 29, 2019

It doesn't segfault anymore, but still crashes at runtime: https://carc.in/#/r/7gtb

Does somebody want to try it against #8112? :)

@jhass jhass changed the title from "sigfault with select and Channel::Buffered" to "Crash in fiber switching with select" Aug 29, 2019
@asterite
Member

I tried it with #8112 and it works fine 🎉
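
For anyone re-checking this on a release that includes #8112, a variant of the original reproduction along these lines may be handy. It is a sketch, not the exact script that was run here. The `collected` channel is an addition (not in the original report) so that the main fiber waits for all three values instead of relying on a single `Fiber.yield`, and it uses `Channel(Int32).new(1)` as in firejox's logger script, on the assumption that `Channel::Buffered` may not be available after the rewrite.

```crystal
ch1 = Channel(Int32).new(1)
ch2 = Channel(Int32).new(1)
# Added for this sketch: lets the main fiber wait for every value.
collected = Channel(Int32).new

# Receiver: same select over both channels as in the original report,
# but each value is forwarded instead of appended to a shared array.
spawn do
  loop do
    select
    when x = ch1.receive
      collected.send(x)
    when y = ch2.receive
      collected.send(y)
    end
  end
end

# Sender: unchanged from the original report.
spawn do
  3.times do |i|
    select
    when ch1.send(i)
    when ch2.send(i)
    end
  end
end

# Block until all three values have arrived, in whatever order.
res = Array(Int32).new(3) { collected.receive }
p res.sort # expected on a fixed runtime: [0, 1, 2]
```

Sorting before printing keeps the check independent of which channel each value happened to travel through.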

@asterite
Member

asterite commented Sep 2, 2019

Closed by #8112

Please reopen if the issue remains.

@asterite asterite closed this as completed Sep 2, 2019