-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Channel refactor and optimize #8322
Channel refactor and optimize #8322
Conversation
Hi firejox, how does it affect performance, do you have any benchmarks you can show? |
I don't know if anyone told you, but the PRs you are sending related to concurrency are great! |
@lribeiro Here is the benchmarks channel close benchmarkbefore
after
channel select unwait benchmarkbefore
after
channel ping pong benchmarkbefore
after
|
d5a39ab
to
5747de8
Compare
Thank you @firejox for all the work on concurrency!! |
src/crystal/static_list.cr
Outdated
@@ -0,0 +1,98 @@ | |||
# :nodoc: | |||
macro container_of(ptr, type, member) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is :nodoc:
it leaks to the top-level namespace. So in my code I can call container_of
and it will call this macro. This should probably be moved inside Crystal::StaticList
and called like Crystal::StaticList.container_of
. Same for the other macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will move these two macros inside the struct. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asterite Sorry for disturbing. I have a problem when I put them into Crystal::StaticList
. The macros seem to be unable to know the Sender
and Receiver
because of visibility. And now I find three ways to resolve it.
-
Remove the
private
ofSender
andReceiver
so that it can callCrystal::StaticList.container_of
. But the problem is it will leakSender
andReceiver
to public. -
Pack macros into a module like
Crystal::StaticList::Reflect
, and include the module into channel. It will call thecontainer_of
instead ofCrystal::StaticList.container_of
. And this way will be ugly and hard to understand. -
Use annotation and macro to generate corresponding method. Then it will invoke
something likeReceiver(T).from_{{ member_name }}(ptr)
. This could increase the compile time because of extra computation.
Here is the code pieces about these changes
https://gist.github.com/firejox/a6f2193a5e0820eef7e7e36c937c2330
Maybe there are other clever ways that I didn't come out. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of private visibility you can just put :nodoc: in a doc comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will use the :nodoc: . Thank you.
c6ae669
to
b5fffa6
Compare
@@ -352,19 +405,18 @@ class Channel(T) | |||
|
|||
ops_locks.each &.lock | |||
|
|||
# Check that no channel is closed | |||
unless ops.all?(&.available?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is removed, then the available?
is no longer used.
But I don't follow where the check that non of the channel is originally closed is done. Because if there is a channel ready and other closed I think the current code in this PR will yield and not raise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because op.execute
will return proper state, I think there is no need to check channel twice. The case you mentioned is a new case, and current spec doesn't cover it. The following code:
# ch1 is ready and ch2 is closed
# a.cr
select
when m1 = ch1.receive?
when m2 = ch2.receive
end
# b.cr
select
when m1 = ch1.receive
when m2 = ch2.receive?
end
What are the expect behaviors of the two code pieces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. We can avoid the double check and do not have a guarantee regarding which channel, the closed or the ready one, will fulfill the select.
a.cr
will either execute ch1'swhen
withm1 = nil
, or ch2'swhen
with data.b.cr
will either raise due to ch1 been closed, or ch2'swhen
with data.
There are no specs mixing receive
and receive?
though.
If ch2
would not be ready I would expect that b.cr
will always raise. Both on a blocking and non-blocking select
. That is a spec that could be added. But I don't expect that receive
and receive?
are mixed.
So, this change is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Although your precondition is different from me, I will add the spec about mixing receive
and receive?
.
src/crystal/static_list.cr
Outdated
@@ -0,0 +1,97 @@ | |||
# :nodoc: | |||
struct Crystal::StaticList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to find a more crystal way to build this static list.
I think it should be possible to use a generic struct that will enforce typing and maybe allow some macros to hide the boilerplate as done here. But all the pointer, casting and final interface make me doubt.
Also, although it :nodoc:
I think there could be some specs on the implementation.
The main problem is to build a linked list with data that should act as nodes and let them be stored in the stack, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a difference between internal storage and external storage. The external storage will enforce type explicit, and the internal storage requires some way to know which type stores the list node. That's why need the Crystal::StaticList.container_of
. Because we store data in stack, it cannot prevent to handle without pointer.
And I will add some spec about the StaticList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about something like this https://gist.github.com/bcardiff/4327823bb43cbb93734fbb6240315dac ?
- there are less casts
- the pointer tricker is hidden in the module
- the API is a bit more clean: sometimes as methods, sometimes as macro calls.
there might be some subtle differences regarding the representation of the list. and this solution can only be used in structs
. I didn't care to apply them to classes since I think it is only used in structs.
In case we need to extend it, in the worst case macros can be injected in the type that is including the module to specialize if the next/prev need to be pointer(T) or T.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late on reply because I'm busy today. In my opinion, this change is a bit worse than mine. Listify.prepend(list, node)
will make code like c, not crystal. Although I currently use node.append_to pointerof(list)
in code, the implementation can be easily replaced by list.append node.self_pointer
. list.append
and list.shift
will be better than Listify.append
and Listify.shift
.
Besides, the each
operation should not hide the pointer. If you hide the pointer, the object you get is the copy object of the pointer point to. Any changes will not apply on the original object. This will be a problem.
Including a module is good, but I don't want to limit the name of member variable in struct/class which use the list. (If possible, they may need the list with different name for future refactor/optimize)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I acknowledge that the loop in my gist should return pointers.
But I would argue that for me, the API you are proposing
- seems more similar to what I would find/write in C.
- have more features that needed considering it will be hidden from the std-lib
a. it is only used for structs but classes are supported
b. it allows elements to exists in multiple lists at the same time (by passing the member). But at the cost of adding repetition in callers.
I still think the API like the one proposed in my gist is more simple and can do the job.
I would let others from the core team weigh in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the way that Thread::LinkedList
does it, and i'm pretty sure you can generalise that to work on the raw pointers. Then you don't need the init_empty_list
either, just store the head and tail and use null pointers to signify the start/end of the list.
b5fffa6
to
288a200
Compare
Updates:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/crystal/static_list.cr
Outdated
@@ -0,0 +1,97 @@ | |||
# :nodoc: | |||
struct Crystal::StaticList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I acknowledge that the loop in my gist should return pointers.
But I would argue that for me, the API you are proposing
- seems more similar to what I would find/write in C.
- have more features that needed considering it will be hidden from the std-lib
a. it is only used for structs but classes are supported
b. it allows elements to exists in multiple lists at the same time (by passing the member). But at the cost of adding repetition in callers.
I still think the API like the one proposed in my gist is more simple and can do the job.
I would let others from the core team weigh in here.
288a200
to
031c4ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to review this but I currently don't have much time, plus concurrency internals is not my strong point (at the moment).
@bcardiff We can wait for a review from at least one other core-team member, but if you think this is good (code is readable, does the same thing, doesn't have hacks that later on we won't understand unless we add more comments or docs, etc.) then I'd merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lists are going to need a better API. I can barely follow the specs, and it seems quite easy to slip up using them.
spec/std/crystal/static_list_spec.cr
Outdated
end | ||
end | ||
|
||
describe "Crysta::StaticList" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crystal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This typo can be avoided by removing the quotes.
src/channel.cr
Outdated
# Because channel#close may clean up a long list, `select_context.try_trigger` may | ||
# be called after the select return. In order to prevent invalid address access, | ||
# the state is allocated in the heap. | ||
state_ptr = Pointer(Atomic(SelectState)).malloc(1, Atomic(SelectState).new(SelectState::Active)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably just do Pointer.malloc
here - the compiler will guess the type from the Atomic(SelectState).new(SelectState::Active)
.
src/crystal/static_list.cr
Outdated
@@ -0,0 +1,97 @@ | |||
# :nodoc: | |||
struct Crystal::StaticList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the way that Thread::LinkedList
does it, and i'm pretty sure you can generalise that to work on the raw pointers. Then you don't need the init_empty_list
either, just store the head and tail and use null pointers to signify the start/end of the list.
432003f
to
8916ea6
Compare
I still don't understand the reason why you need |
Like, why not just make your list generic and assume that
|
8916ea6
to
b540ce5
Compare
@RX14 I want to keep some flexibility, so I use You cannot do that as Now I have made the code in generic way. It seems that my implementation is not widely accepted by core team. |
I personally like the performance improvements but I think the code becomes much harder to follow, with many low-level tricks. My fear is that in a couple of months we won't be able to understand how it all works. I wish for the performance improvements to happen without introducing that much of a code change. |
This flexibility - when the type will never be used outside of the stdlib - is not worth the drop in readability and the complication of the list API.
I think really the only complicated tricks are in the list implementation. If that gets fixed, then i'm perfectly happy to merge this, pending @bcardiff's input. |
@firejox I made another round on this PR. Applied a couple of refactors to use linked list specifically for this use and move the shared state to a class instead of a manual The underlying behavior is the same. Then I perform the benchmarks you sent at #8322 (comment), with
Which platform/hardware are you using? Between the refactor and the fix of the potential invalid memory access, I don't mind the slowness in Do you want to add those commits here? Or we can open another PR based on this one and pick it up from there. As you prefer. Again, thanks for all the effort. |
@bcardiff Interesting. I don't know why it does not yield the improvement. It actually shows the difference on my computer.
My platform is Linux 5.1.7-arch1-1, x86_64, AMD Athlon X2 II 270 3.4 GHz
The changes are great. It is more readable. I will merge them later. Thank you. EDIT: Sorry. I didn't notice that there are two commits. I only looked at the last one before. Another commit can be better. Because the implementation is change, there is no need to use I will pick the last one and make some change on the other one. |
Sure thing @firejox thanks for keep iterating. |
b540ce5
to
ba92974
Compare
ba92974
to
fdb3167
Compare
Several changes are applied: 1. use internal storage list to reduce heap allocation 2. move cleaning operations out of the lock section during close 3. remove redundant lock by making internal send/receive return state 4. because of 3, non blocking select can be determined by state 5. the wait/unwait of select action can be done in O(1)
fdb3167
to
ba33bee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 I have no more feedback. LGTM!
def compare_and_set(cmp : SelectState, new : SelectState) : {SelectState, Bool} | ||
@state.compare_and_set(SelectState::Active, SelectState::Done) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, seems like these arguments (cmp & new) aren't used anywhere... 🤔
Shouldn't this method body be @state.compare_and_set(cmp, new)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Fortunately, it is only used on those values 🙈.
Several changes are applied: