
Channel refactor and optimize #8322

Merged: 2 commits into crystal-lang:master on Nov 20, 2019

Conversation

@firejox (Contributor) commented Oct 13, 2019

Several changes are applied:

  1. use an internal storage list to reduce heap allocations
  2. move cleanup operations out of the lock section during close
  3. remove a redundant lock by making the internal send/receive return a state
  4. because of 3, a non-blocking select can be decided from that state (see the sketch after this list)
  5. the wait/unwait of a select action can be done in O(1)
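For readers not familiar with the channel internals, here is a loose sketch of what points 3 and 4 mean (simplified, hypothetical names; not the PR's actual code): the internal send/receive report a delivery state, so a non-blocking select can decide what to do from that state alone instead of taking another lock.

```crystal
# Hypothetical, simplified sketch of "internal send/receive return a state".
enum DeliveryState
  None      # nothing could be delivered; a blocking caller would have to wait
  Delivered # a value was handed over
  Closed    # the channel is closed
end

class MiniChannel(T)
  def initialize(@capacity : Int32)
    @queue = Deque(T).new
    @closed = false
  end

  def close : Nil
    @closed = true
  end

  # Internal receive: reports the state together with the value (if any),
  # so the caller decides whether to wait, raise, or return nil.
  def try_receive : {DeliveryState, T?}
    return {DeliveryState::Closed, nil} if @closed && @queue.empty?
    return {DeliveryState::None, nil} if @queue.empty?
    {DeliveryState::Delivered, @queue.shift}
  end

  def try_send(value : T) : DeliveryState
    return DeliveryState::Closed if @closed
    return DeliveryState::None if @queue.size >= @capacity
    @queue.push(value)
    DeliveryState::Delivered
  end
end

ch = MiniChannel(Int32).new(1)
ch.try_send(42)   # => Delivered
ch.try_receive    # => {Delivered, 42}
ch.try_receive    # => {None, nil}: a non-blocking select would simply move on
```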

@lribeiro

Hi firejox, how does this affect performance? Do you have any benchmarks you can show?

@asterite (Member)

I don't know if anyone told you, but the PRs you are sending related to concurrency are great!

@firejox (Contributor Author) commented Oct 14, 2019

@lribeiro Here are the benchmarks:

channel close benchmark

before

[developer@crystal-dev tmp]$ ../bin/crystal channel-close-bench.cr -Dpreview_mt --release -- 10000
Using compiled compiler at `.build/crystal'
00:54:15.125022741

after

[developer@crystal-dev tmp]$ ../bin/crystal channel-close-bench.cr -Dpreview_mt --release -- 10000
Using compiled compiler at `.build/crystal'
00:40:58.541846658

channel select unwait benchmark

before

[developer@crystal-dev tmp]$ ../bin/crystal channel-select-unwait-bench.cr --release -- 100000
Using compiled compiler at `.build/crystal'
00:00:00.450562818

after

[developer@crystal-dev tmp]$ ../bin/crystal channel-select-unwait-bench.cr --release -- 100000
Using compiled compiler at `.build/crystal'
00:00:00.210562496

channel ping pong benchmark

before

[developer@crystal-dev tmp]$ ../bin/crystal channel-ping-pong-bench.cr -Dpreview_mt --release -- 500000
Using compiled compiler at `.build/crystal'
00:00:20.045008184

after

[developer@crystal-dev tmp]$ ../bin/crystal channel-ping-pong-bench.cr -Dpreview_mt --release -- 500000
Using compiled compiler at `.build/crystal'
00:00:16.457026952

@firejox force-pushed the channel-refactor-and-optimize branch from d5a39ab to 5747de8 on October 14, 2019 07:56
@jkthorne (Contributor)

Thank you @firejox for all the work on concurrency!!

@@ -0,0 +1,98 @@
# :nodoc:
macro container_of(ptr, type, member)
Member

Even though this is :nodoc:, it leaks into the top-level namespace, so in my code I can call container_of and it will invoke this macro. It should probably be moved inside Crystal::StaticList and called as Crystal::StaticList.container_of. Same for the other macro.
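For illustration, a hedged sketch of the suggested namespacing (the macro body here is a plausible reconstruction, not necessarily the PR's exact code): nesting the macro inside the struct keeps it out of the top-level namespace, so callers have to qualify it.

```crystal
# :nodoc:
struct Crystal::StaticList
  # :nodoc:
  macro container_of(ptr, type, member)
    # Step back from the pointer to the embedded list node to the start of the
    # enclosing object, then reinterpret it as a pointer to that object.
    (({{ptr}}).as(UInt8*) - offsetof({{type}}, {{member}})).as(Pointer({{type}}))
  end
end

# Hypothetical call site, qualified with the owning type:
# receiver_ptr = Crystal::StaticList.container_of(node_ptr, Receiver(Int32), @list_node)
```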

Contributor Author

OK, I will move these two macros inside the struct. Thank you.

Contributor Author

@asterite Sorry to disturb you. I ran into a problem when I moved them into Crystal::StaticList: the macros cannot see Sender and Receiver because of their visibility. I see three ways to resolve it:

  1. Remove the private visibility of Sender and Receiver so that Crystal::StaticList.container_of can be called. The problem is that this leaks Sender and Receiver into the public API.

  2. Pack the macros into a module such as Crystal::StaticList::Reflect and include that module in Channel. Callers would then invoke container_of instead of Crystal::StaticList.container_of. This way is ugly and hard to understand.

  3. Use an annotation and a macro to generate a corresponding method, so the call site becomes something like Receiver(T).from_{{ member_name }}(ptr). This could increase compile time because of the extra computation.

Here are the code pieces for these changes:
https://gist.github.com/firejox/a6f2193a5e0820eef7e7e36c937c2330

Maybe there are other clever ways that I didn't come up with. What do you think?

Member

Instead of private visibility you can just put :nodoc: in a doc comment.
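In other words, something like this minimal sketch (reusing the Sender/Receiver names from this thread; in the PR they live inside Channel and carry more state):

```crystal
# :nodoc:
struct Sender(T)
  # Still reachable by Crystal::StaticList.container_of, but hidden from the
  # generated API docs instead of being marked `private`.
end

# :nodoc:
struct Receiver(T)
end
```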

Contributor Author

OK, I will use :nodoc:. Thank you.

@@ -352,19 +405,18 @@ class Channel(T)

ops_locks.each &.lock

# Check that no channel is closed
unless ops.all?(&.available?)
Member

If this is removed, then available? is no longer used.

But I don't follow where we now check that none of the channels is closed up front. If one channel is ready and another is closed, I think the current code in this PR will yield and not raise.

Contributor Author

Because op.execute returns the proper state, I think there is no need to check the channel twice. The case you mentioned is a new one, and the current specs don't cover it. Consider the following code:

# ch1 is ready and ch2 is closed

# a.cr
select
when m1 = ch1.receive?
when m2 = ch2.receive
end

# b.cr
select
when m1 = ch1.receive
when m2 = ch2.receive?
end

What are the expected behaviors of these two code pieces?

Member

OK. We can avoid the double check and accept that there is no guarantee about which channel, the closed one or the ready one, will fulfill the select.

  • a.cr will either execute ch1's when with m1 = nil, or ch2's when with data.
  • b.cr will either raise because ch1 is closed, or execute ch2's when with data.

There are no specs mixing receive and receive? though.

If ch2 were not ready, I would expect b.cr to always raise, on both a blocking and a non-blocking select. That is a spec that could be added. But I don't expect receive and receive? to be mixed.

So, this change is good.

Contributor Author

I see. Although your precondition is different from mine, I will add specs for mixing receive and receive?.

@@ -0,0 +1,97 @@
# :nodoc:
struct Crystal::StaticList
Member

I would like to find a more Crystal-like way to build this static list.
I think it should be possible to use a generic struct that enforces typing, and maybe some macros to hide the boilerplate as done here. But all the pointers, the casting, and the final interface make me doubtful.

Also, although it is :nodoc:, I think there could be some specs for the implementation.

The main problem is building a linked list whose data acts as the nodes, while letting them be stored on the stack, right?

Contributor Author

I think this is the difference between internal storage and external storage. External storage enforces the type explicitly, while internal storage requires some way to know which type contains the list node. That's why Crystal::StaticList.container_of is needed. Because we store the data on the stack, we cannot avoid handling pointers.

And I will add some specs for the StaticList.

Member

What do you think about something like this https://gist.github.com/bcardiff/4327823bb43cbb93734fbb6240315dac ?

  • there are fewer casts
  • the pointer trickery is hidden in the module
  • the API is a bit cleaner: sometimes methods, sometimes macro calls

There might be some subtle differences regarding the representation of the list, and this solution can only be used with structs. I didn't bother to make it work with classes since I think it is only used with structs.

In case we need to extend it, in the worst case macros can be injected into the type that includes the module, to specialize whether next/prev need to be Pointer(T) or T.

Contributor Author

Sorry for the late reply; I was busy today. In my opinion, this change is a bit worse than mine. Listify.prepend(list, node) makes the code read like C, not Crystal. Although I currently use node.append_to pointerof(list) in the code, that implementation can easily be replaced by list.append node.self_pointer. list.append and list.shift would be better than Listify.append and Listify.shift.

Besides, the each operation should not hide the pointer. If you hide the pointer, the object you get is a copy of the object the pointer points to, so any changes will not apply to the original object. This would be a problem.

Including a module is good, but I don't want to restrict the name of the member variable in the structs/classes that use the list. (If possible, they may need lists under different names for future refactoring/optimization.)
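A small demonstration of the copy concern above, using a hypothetical Counter struct: when iteration yields struct values instead of pointers, the block mutates copies and the stored elements are left untouched.

```crystal
struct Counter
  property hits = 0
end

counters = StaticArray(Counter, 2).new(Counter.new)

# Yields copies: the stored structs stay at 0.
counters.each { |c| c.hits += 1 }
p counters.map(&.hits) # => StaticArray[0, 0]

# Working through pointers reaches the stored structs.
counters.size.times do |i|
  ptr = counters.to_unsafe + i # pointer into the array's storage
  updated = ptr.value
  updated.hits += 1
  ptr.value = updated          # write the modified copy back
end
p counters.map(&.hits) # => StaticArray[1, 1]
```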

Member

I acknowledge that the loop in my gist should return pointers.

But I would argue that, for me, the API you are proposing:

  1. seems more similar to what I would find/write in C.
  2. has more features than needed, considering it will be hidden inside the std-lib:
    a. it is only used for structs, but classes are supported
    b. it allows elements to exist in multiple lists at the same time (by passing the member), but at the cost of adding repetition in callers.

I still think an API like the one proposed in my gist is simpler and can do the job.

I would let others from the core team weigh in here.

Member

Look at the way Thread::LinkedList does it; I'm pretty sure you can generalise that to work on raw pointers. Then you don't need init_empty_list either: just store the head and tail and use null pointers to signify the start/end of the list.

@firejox force-pushed the channel-refactor-and-optimize branch from b5fffa6 to 288a200 on October 20, 2019 09:50
@firejox (Contributor Author) commented Oct 20, 2019

Updates:

  1. add specs for Crystal::StaticList
  2. add specs for select mixing receive and receive?

@bcardiff (Member) left a comment

I left a couple more comments.

FYI, I am waiting on this PR before asking for a rebase of #8006.

Thanks for all the effort @firejox


@firejox force-pushed the channel-refactor-and-optimize branch from 288a200 to 031c4ca on October 22, 2019 11:53
@asterite (Member) left a comment

I'd like to review this but I currently don't have much time, plus concurrency internals is not my strong point (at the moment).

@bcardiff We can wait for a review from at least one other core-team member, but if you think this is good (code is readable, does the same thing, doesn't have hacks that later on we won't understand unless we add more comments or docs, etc.) then I'd merge this.

@RX14 (Member) left a comment

The lists are going to need a better API. I can barely follow the specs, and it seems quite easy to slip up using them.

end
end

describe "Crysta::StaticList" do
Member

Crystal.

Member

This typo can be avoided by removing the quotes.
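That is, passing the constant itself so the compiler resolves it (a one-line sketch):

```crystal
require "spec"

describe Crystal::StaticList do
  # A misspelled constant such as Crysta::StaticList would now fail to compile
  # instead of silently becoming the spec's description string.
end
```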

src/channel.cr Outdated
# Because channel#close may clean up a long list, `select_context.try_trigger` may
# be called after the select return. In order to prevent invalid address access,
# the state is allocated in the heap.
state_ptr = Pointer(Atomic(SelectState)).malloc(1, Atomic(SelectState).new(SelectState::Active))
Member

You can probably just do Pointer.malloc here; the compiler will infer the type from the Atomic(SelectState).new(SelectState::Active).
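A small illustration of the suggestion (the SelectState enum here is a stand-in so the snippet is self-contained): Pointer.malloc can infer its type parameter from the initial value.

```crystal
# Stand-in for the PR's enum, just to make the example compile on its own.
enum SelectState
  Active
  Done
end

# Explicit form, as in the diff above:
explicit = Pointer(Atomic(SelectState)).malloc(1, Atomic(SelectState).new(SelectState::Active))

# Inferred form; the result is still a Pointer(Atomic(SelectState)).
inferred = Pointer.malloc(1, Atomic(SelectState).new(SelectState::Active))
```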


@firejox force-pushed the channel-refactor-and-optimize branch 2 times, most recently from 432003f to 8916ea6 on November 1, 2019 07:29
@firejox requested a review from RX14 on November 1, 2019 10:12
@RX14 (Member) commented Nov 1, 2019

I still don't understand the reason why you need container_of. Could you explain it for me?

@RX14 (Member) commented Nov 1, 2019

Like, why not just make your list generic and assume that T already has the next and prev pointers, just like Thread::LinkedList does? You don't have to do all this awful macro offsetof casting; the types are just... correct.

Thread::LinkedList already works with Reference types; you could rename Thread::LinkedList and specialize that type using macros to also work with Value. Just look at the way Atomic(T) changes its implementation based on T.
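For illustration only, a rough sketch of that direction (IntrusiveList and Waiter are made-up names; this is not Thread::LinkedList's actual code): a generic list that simply assumes T exposes next/previous accessors, so no offsetof or casting is needed.

```crystal
# Hypothetical node type: it carries its own links, Thread::LinkedList-style.
class Waiter
  property next : Waiter?
  property previous : Waiter?
  getter value : Int32

  def initialize(@value)
  end
end

# Hypothetical generic list: it assumes T has `next`/`previous` accessors.
class IntrusiveList(T)
  @head : T?
  @tail : T?

  # O(1) append; the links live inside the node itself, so nothing is allocated here.
  def push(node : T) : Nil
    node.previous = @tail
    node.next = nil
    if tail = @tail
      tail.next = node
    else
      @head = node
    end
    @tail = node
  end

  def each
    node = @head
    while node
      yield node
      node = node.next
    end
  end
end

list = IntrusiveList(Waiter).new
list.push(Waiter.new(1))
list.push(Waiter.new(2))
list.each { |w| puts w.value } # prints 1 then 2
```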

@firejox force-pushed the channel-refactor-and-optimize branch from 8916ea6 to b540ce5 on November 1, 2019 14:55
@firejox (Contributor Author) commented Nov 1, 2019

@RX14 I want to keep some flexibility, so I use container_of and do not assume that T has next and prev pointers. It is all for flexibility.

You cannot do it the way Atomic(T) does; I tried before I made this PR. The main problem shows up in the design of the push operation: it needs to accept a Pointer type, not a Value type.

Now I have made the code generic. It seems that my implementation is not widely accepted by the core team.

@asterite (Member) commented Nov 1, 2019

It seems that my implementation is not widely accepted by the core team

I personally like the performance improvements but I think the code becomes much harder to follow, with many low-level tricks. My fear is that in a couple of months we won't be able to understand how it all works.

I wish for the performance improvements to happen without introducing that much of a code change.

@RX14 (Member) commented Nov 8, 2019

I want to keep some flexibility, so I use container_of and do not assume that T has next and prev pointers. It is all for flexibility.

This flexibility - when the type will never be used outside of the stdlib - is not worth the drop in readability and the complication of the list API.

I personally like the performance improvements but I think the code becomes much harder to follow, with many low-level tricks. My fear is that in a couple of months we won't be able to understand how it all works.

I think the only really complicated tricks are in the list implementation. If that gets fixed, then I'm perfectly happy to merge this, pending @bcardiff's input.

@bcardiff (Member)

@firejox I made another pass over this PR.

I applied a couple of refactors: a linked list written specifically for this use case, and moving the shared state to a class instead of a manual Pointer.malloc. These changes are in their own commits at https://github.com/bcardiff/crystal/tree/refactor/linked-list

The underlying behavior is the same.

Then I ran the benchmarks you posted in #8322 (comment), with -Dpreview_mt and without (note that not all of them in your original posting use -Dpreview_mt), and I am not able to reproduce the improvements locally for channel-close-bench.cr.

| bench | base | firejox | bcardiff |
| --- | --- | --- | --- |
| MT channel-close-bench.cr | 00:10:42.358743895 | 00:13:56.305350595 | 00:14:03.869554707 |
| MT channel-select-unwait-bench.cr | 00:00:00.686825065 | 00:00:00.635335868 | 00:00:00.684841458 |
| MT channel-ping-pong-bench.cr | 00:00:07.174520691 | 00:00:05.834123856 | 00:00:05.903892141 |
| ST channel-close-bench.cr | 00:03:55.545640555 | 00:03:26.212166600 | 00:03:20.613478228 |
| ST channel-select-unwait-bench.cr | 00:00:00.216716945 | 00:00:00.048762337 | 00:00:00.051219858 |
| ST channel-ping-pong-bench.cr | 00:00:02.652404693 | 00:00:02.439848184 | 00:00:02.458077994 |

Which platform/hardware are you using?
Mine is Darwin Kernel Version 18.6.0 MacBookPro14,3 Intel Core i7 2.9 GHz

Given the refactor and the fix for the potential invalid memory access, I don't mind the slowdown in channel-close-bench.cr.

Do you want to add those commits here? Or we can open another PR based on this one and pick it up from there. As you prefer. Again, thanks for all the effort.

@firejox (Contributor Author) commented Nov 12, 2019

@bcardiff Interesting. I don't know why it does not show the improvement there; it does show a difference on my computer.

Which platform/hardware are you using?
Mine is Darwin Kernel Version 18.6.0 MacBookPro14,3 Intel Core i7 2.9 GHz

My platform is Linux 5.1.7-arch1-1, x86_64, AMD Athlon X2 II 270 3.4 GHz

Do you want to add those commits here? Or we can open another PR based on this one and pick it up from there. As you prefer. Again, thanks for all the effort.

The changes are great. It is more readable. I will merge them later. Thank you.

EDIT: Sorry, I didn't notice that there are two commits; I only looked at the last one before. The other commit can be improved: because the implementation has changed, there is no need to use append_to. A simple swap achieves the same result.

I will pick the last one and make some changes to the other one.

@bcardiff (Member)

Sure thing @firejox, thanks for continuing to iterate.

@firejox force-pushed the channel-refactor-and-optimize branch from b540ce5 to ba92974 on November 13, 2019 04:44
@firejox force-pushed the channel-refactor-and-optimize branch from ba92974 to fdb3167 on November 13, 2019 16:42
firejox and others added 2 commits November 14, 2019 12:43
Several changes are applied:
1. use internal storage list to reduce heap allocation
2. move cleaning operations out of the lock section during close
3. remove redundant lock by making internal send/receive return state
4. because of 3, non blocking select can be determined by state
5. the wait/unwait of select action can be done in O(1)
@firejox force-pushed the channel-refactor-and-optimize branch from fdb3167 to ba33bee on November 14, 2019 04:43
@bcardiff (Member) left a comment

🎉 I have no more feedback. LGTM!

@bcardiff added this to the 0.32.0 milestone on Nov 14, 2019
@bcardiff merged commit 7b46a78 into crystal-lang:master on Nov 20, 2019
Comment on lines +79 to +81
def compare_and_set(cmp : SelectState, new : SelectState) : {SelectState, Bool}
@state.compare_and_set(SelectState::Active, SelectState::Done)
end
@Sija (Contributor) commented Nov 21, 2019

Hmm, it seems these arguments (cmp and new) aren't used anywhere... 🤔
Shouldn't the method body be @state.compare_and_set(cmp, new)?
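For reference, the suggested fix is simply to forward the arguments; a sketch, assuming the surrounding wrapper type from the PR:

```crystal
def compare_and_set(cmp : SelectState, new : SelectState) : {SelectState, Bool}
  # Forward the caller's values instead of hard-coding Active -> Done.
  @state.compare_and_set(cmp, new)
end
```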

Member

You're right. Fortunately, it is only ever called with those values 🙈.
