add experimental `@spawn` macro to Base.Threads #32600

JeffBezanson · 2019-07-16T20:09:05Z

The code for this is already starting to get around so we might as well add it before things get further out of hand :) Open for bikeshedding.

bramtayl · 2019-07-16T21:01:52Z

What does par stand for?

JeffBezanson · 2019-07-16T21:03:20Z

parallel

bramtayl · 2019-07-16T21:05:35Z

Maybe @parallel then?

chethega · 2019-07-16T21:24:16Z

The current docs and naming conventions use "parallel" more in the "distributed" sense, e.g. pmap. In that sense, @threaded would fit with current names. Alternatively, @schedule would also make sense.

JeffBezanson · 2019-07-16T21:33:38Z

We have a schedule function but it was and remains single-threaded, so that word is less of a perfect match. We have been phasing out use of "parallel" for distributed stuff, though it does remain e.g. as the p in pmap.

mbauman · 2019-07-16T22:35:15Z

What's the thought on the future of creating a threaded for loop? I want to also have a macro that smartly divides the iteration space up into a "smart" number of partr tasks. Will we be able to silently move the @threads macro to that new implementation? If so, I think it'd make sense to name this @thread (there's just one task created/it's a verb).

Even if we deem changing @threads to use partr tasks to be breaking, this is such a nice pair of names that it could make sense to simply use a new module name (perhaps Threading or Multithreading) to hold the new behaviors.

vchuravy · 2019-07-16T22:42:04Z

> Even if we deem changing @threads to use partr tasks to be breaking,

@threads already uses tasks.

The name @thread doesn't quite fit, since this creates a Task that migrates between worker threads. I still like @async par=true, especially since that still makes the @sync @async construct work.
Other candidate words are @fiber and @detach (although I was kinda planning to use that for Tapir).

JeffBezanson · 2019-07-16T22:46:15Z

> @async par=true

Ok, or @par for short 😂

Raises a good question though. @async par=true should clearly participate in @sync, but should we have a version (e.g. @par) that doesn't?

vchuravy · 2019-07-16T22:51:56Z

> Raises a good question though. @async par=true should clearly participate in @sync, but should we have a version (e.g. @par) that doesn't?

You mean a version that the runtime is not required to ever execute until the end of a program? :P I like those.
I think it is important to establish the notion of when work in @par tasks is making forward progress, and encouraging the usage of @sync with @par would make sense.
Cilk has the notion that all spawned functions synchronize at the function boundary to make reasoning about forward progress easier.
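
A minimal sketch of the `@sync` pairing discussed above, assuming the macro is available as `Threads.@spawn` (the name this PR eventually settled on). The helper `chunked_sum` and its chunking scheme are illustrative only and not part of the PR.

```julia
using Base.Threads: @spawn

# Hypothetical helper: sum each chunk in its own task. `@sync` returns only
# after every task spawned lexically inside the block has finished.
function chunked_sum(xs, nchunks)
    chunks = collect(Iterators.partition(xs, cld(length(xs), nchunks)))
    partials = zeros(eltype(xs), length(chunks))
    @sync for (i, chunk) in enumerate(chunks)
        @spawn partials[i] = sum(chunk)   # each task writes to its own slot
    end
    return sum(partials)
end

total = chunked_sum(collect(1:1000), 4)
```

Because every task writes to a distinct element of `partials`, no locking is needed here; the `@sync` block is what guarantees the partial sums are complete before they are combined.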

JeffBezanson · 2019-07-16T23:13:08Z

Obviously, we should call it @go. Or I guess, by analogy, @julia.

Keno · 2019-07-16T23:33:05Z

> Obviously, we should call it @go

Frankly, I don't think this would be horrible.

ViralBShah · 2019-07-16T23:48:37Z

I kind of like @go.

tkf · 2019-07-17T00:48:50Z

Does it make sense to use shell analogy and use @& or @bg?

vtjnash · 2019-07-17T01:35:25Z

Actually, yes, @go seems really good.

StefanKarpinski · 2019-07-17T17:02:06Z

Really? Are we going to call them "goroutines" also?

StefanKarpinski · 2019-07-17T17:13:34Z

It's feeling to me like we're getting into some fairly ad hoc terminology choices and thinking systematically about the parallels (get it?) between different levels of parallelism here might be helpful:

concurrency (I/O)
multithreading
distributed

We might want to think about what the final set of terminology we ideally want would be, even if it involves changing or reclaiming some terms in 2.0. We use @sync and @async for concurrency, we're proposing using @par/@go, @threads and fetch for multithreading and we use @spawn and fetch for distributed. Could we rationalize this situation a bit more?

JeffBezanson · 2019-07-17T17:40:56Z

wait and fetch are common to all of them.
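
As a small illustration of that point, the same `wait` and `fetch` calls work on a task created with `@async` and on one created with the threaded macro (written here as `Threads.@spawn`, the name chosen later in this thread). The distributed case is left out so the sketch runs without worker processes.

```julia
using Base.Threads: @spawn

t_async = @async begin        # sticky task: stays on the thread that created it
    sleep(0.01)
    :from_async
end
t_spawn = @spawn sum(1:100)   # may run on any available thread

wait(t_async)                 # blocks until the task has finished
a = fetch(t_async)            # waits if needed, then returns the task's value
b = fetch(t_spawn)
```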

StefanKarpinski · 2019-07-17T18:12:28Z

Ok, I know how much you love tables @JeffBezanson, but this feels like it calls for one:

| | concurrency | multithreading | distributed |
| --- | --- | --- | --- |
| unit of work | sticky `Task` | non-sticky `Task` | `Future` |
| create work | `@async` | `@par`, `@go`, ... | `@spawn`, `@spawnat` |
| synchronize units | `@sync` | ? | `@sync` |
| communication | `Channel` | `Channel` | `RemoteChannel` |
| wait for result | `wait` | `wait` | `wait` |
| get result | `fetch` | `fetch` | `fetch` |

Any other rows I should add? Even if this doesn't inform our decision, it makes me feel better and seems like a good way to communicate to the user what things are called and what they do.

JeffBezanson · 2019-07-17T18:19:12Z

@spawn and @spawnat also hook in to @sync, so that box can be filled in.

JeffBezanson · 2019-07-17T18:25:22Z

Also I like @vchuravy 's view that this could be @async threads=true (just a variant of async that allows running on any thread). I'd simply want a shorthand for it.

StefanKarpinski · 2019-07-17T18:38:09Z

I updated the upper right corner from "remote Task" to Future because the thing you get when you call @spawn is a Future. Which makes me wonder if we'd want to consider renaming Future to RemoteTask (keeping Future as a compatibility alias until 2.0).

The thing that feels a bit out of sync (see?) is that @sync and @async naturally pair, but not so much with @spawn, @spawnat, @par or @go. If @par/@go was shorthand for @async parallel=true that would help a bit, but I'm afraid that having more than one way to write it doesn't really help reduce confusion much. It's also a bit unclear to me why @async is the one called "async"—they're all asynchronous. I'm mulling over these variations:

concurrent, not parallel, not remote: @async
concurrent, parallel, not remote: @async thread=true or @async parallel=true
concurrent, parallel, remote: @async remote=true
concurrent, parallel, remote on specific worker: @async remote=3 or @async worker=3

The problem with keywords is that they imply orthogonality, and it doesn't seem like this is orthogonal at all: it's a scale from sequential (nothing) to concurrent, to parallel/multithreaded, to distributed, which makes me wonder if it wouldn't be ok to just call these:

@async
@async_thread or @async_parallel
@async_remote

Of course, that violates our underscore policy, but it seems a bit clearer that these are similar and that they all pair with @sync.

StefanKarpinski · 2019-07-17T18:52:29Z

In particular, what's concerning about the terms "parallel" (and its abbreviation "par") and "spawn" is that they are too generic: all of these things are parallel in some sense, and "spawn" is a term you can use to describe starting any of these units of work. So using parallel or "par" for one and spawn for another just seems arbitrary. Cilk, for example, uses cilk_spawn for what we're proposing to call @par or @go and cilk_for for what we're calling @threads for.

StefanKarpinski · 2019-07-17T19:02:33Z

I kind of like @async_thread and @async_remote. Going with that renaming Future to RemoteTask we would have the following table:

| | concurrency | multithreading | distributed |
| --- | --- | --- | --- |
| unit of work | sticky `Task` | non-sticky `Task` | `RemoteTask` |
| create work | `@async` | `@async_thread` | `@async_remote` |
| synchronize units | `@sync` | `@sync` | `@sync` |
| communication | `Channel` | `Channel` | `RemoteChannel` |
| wait for result | `wait` | `wait` | `wait` |
| get result | `fetch` | `fetch` | `fetch` |

Which looks pretty clean and consistent to me.

JeffBezanson · 2019-07-17T19:10:19Z

On a technical level I totally agree; it's just that @async_thread has a gnawing ugliness.

StefanKarpinski · 2019-07-17T19:12:31Z

Eh, it's not beautiful but it doesn't seem all that bad to me.

quinnj · 2019-07-17T19:23:01Z

@asyncthread and @asyncremote seem better to me and still legible.

mbauman · 2019-07-17T19:24:39Z

I still very much like the verbs @thread (nee @par) and @distribute (nee @spawn; edit: hrm, this isn't quite right as the word implies splitting things up) to describe creating work that can be moved onto worker threads or processes. The adjectives @threaded (nee @threads) and @distributed can be how we modify for loops — a name change would allow us to unify their syncing behaviors, too (currently @threads for syncs but @distributed for does not).

I don't find it all that odd that @sync can apply to things without async in their name and that's one of the least challenging things to teach with respect to @distributed/@spawn currently.

JeffBezanson · 2019-07-17T19:40:32Z

I think the current distinction between @async and the new threaded thing is not essential; we've been thinking of it as mostly for backwards compatibility. We don't want to inject thread-safety concerns into all code currently using @async. But it would be nice to just have one of them in the future. @asyncthread also comes across to me as kind of jargony, and a bit of a mouthful.

vchuravy · 2019-07-17T20:01:34Z

> I updated the upper right corner from "remote Task" to Future because the thing you get when you call @spawn is a Future. Which makes me wonder if we'd want to consider renaming Future to RemoteTask (keeping Future as a compatibility alias until 2.0).

The difference is that a Future is the promise of a result (and that's what you get from @spawn), whereas a RemoteTask implies that you get a proper handle to the remote task and could, for example, throw an exception to it.

I think the big issue we have is that we want to keep backwards compatibility. Otherwise I would make @async the default (non-sticky) and add an API for creating sticky tasks.
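
The sticky versus non-sticky distinction can be observed directly. This sketch assumes the threaded macro is `Threads.@spawn` and reads the `sticky` field of `Task`, which is best treated as an implementation detail rather than a stable API.

```julia
using Base.Threads: @spawn

t1 = @async 1 + 1    # scheduled on the thread that created it
t2 = @spawn 1 + 1    # free to be scheduled on any thread

wait(t1)
wait(t2)

# `sticky` is a field of `Task`; inspecting it is for illustration only.
sticky_async = t1.sticky
sticky_spawn = t2.sticky
```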

JeffBezanson · 2019-07-18T20:36:57Z

We discussed this on the triage call and brainstormed many possibilities. I'll summarize the front-runners:

@spawn. This seems to be the closest to a standard term for this, used by Cilk, Erlang, and TBB. I think the current Distributed.@spawn is not very useful and could eventually be deprecated (possible replacement: @spawnat any ...).
@start. A very good suggestion by @NHDaly . Has the connotation that @start f(x) just starts running f(x), not necessarily finishing it yet, which is quite right. A small downside is that Threads.@start sounds like it starts a thread, but this doesn't necessarily need to be in the Threads module (though I really think it belongs there).
@go. A nice short word, and our model is really very similar to Go's. However, this is clearly very strongly associated with the Go language.

EDIT: added @do by request

Might as well make this a poll (non-binding!):
🎉 @spawn
🚀 @start
😄 @go
😕 @do

jebej · 2019-07-18T21:23:36Z

@thread & @process? It would be nice for both terms to be clear as to what type of parallel execution they use. From what I understand, @spawn is used both for threads (Cilk) and processes (Erlang), and so would be ambiguous when unqualified, as noted by @chethega .

JeffBezanson · 2019-07-18T21:26:11Z

I think @thread is ok for this, but some dislike that it does not actually mean starting a thread (it starts a task, which might run on the same thread). But it pairs nicely with the existing @threads (which uses multiple threads for an operation).

jebej · 2019-07-18T21:43:51Z

Personally I don't understand @thread to mean starting a thread, just running on one. As in @thread fun() meaning "thread this function call"; although I see how the meaning of the verb "to thread" might be up for debate in this context.

c42f · 2019-07-19T03:34:07Z

@start is a good option because it invites the user to ask "but when does my task end?"

This seems to be the vital question which can be hard to understand and is answered by structured concurrency libraries in other languages. A lot of design discussion on the Trio forum seems to be about the difficulties of robust cancellation.

Also, it would be nice not to use up verbs which we could want later for a more structured approach. Having something like @par to experiment with is great but we're early in the game here :-)

tkf · 2019-07-19T04:22:17Z

Then maybe @go is very appropriate, if Julia is going to take the stance of "go statement considered harmful" later?

JeffBezanson · 2019-07-19T04:25:58Z

I don't consider it harmful. I mean, many people consider dynamic typing harmful...

IanButterworth · 2019-07-19T05:04:04Z

How about @spread. In the viral/sandwich filling sense.

c42f · 2019-07-19T05:29:58Z

I'm not sure about @go being harmful, but being able to reason about child task lifetimes by just inspecting the code locally seems like a great thing to aim for.

raminammour · 2019-07-19T20:33:40Z

This is probably me being silly on a Friday afternoon, but just for fun, how about @braid?

Braid (the noun) is a synonym of thread, and braid (the verb) conveys the interlacing of threads (parallel depth first, of course 😂). Which I am assuming this PR supports with the PARTR scheduler?

Cheers!

felipenoris · 2019-07-19T21:54:21Z

> I kind of like @async_thread and @async_remote. Going with that renaming Future to RemoteTask we would have the following table:
>
> | | concurrency | multithreading | distributed |
> | --- | --- | --- | --- |
> | unit of work | sticky `Task` | non-sticky `Task` | `RemoteTask` |
> | create work | `@async` | `@async_thread` | `@async_remote` |
> | synchronize units | `@sync` | `@sync` | `@sync` |
> | communication | `Channel` | `Channel` | `RemoteChannel` |
> | wait for result | `wait` | `wait` | `wait` |
> | get result | `fetch` | `fetch` | `fetch` |
>
> Which looks pretty clean and consistent to me.

I like the verbosity in @async, @async_thread, @async_remote for concurrency, threads and distributed. The naming is self-explanatory and makes the user think about what they are using.

"@async_thread" starts a thread. Do you know what you're doing?

and not

Type "@go something" and don't worry about the details.

Hope this makes sense to someone.

JeffBezanson · 2019-07-20T02:19:46Z

After sleeping on it, I think we are going to go with @spawn. That seems to be the closest thing to a standard name for this, which carries a lot of weight for me. In particular, it has been used in multiple systems for features like this, but not necessarily with the exact same semantics in each case. So it's a kind of usefully-vague term like "object" or "function" --- you know what it is but you know you'll need to read the manual for details.

samuelpowell · 2019-07-20T07:08:29Z

What about @fork?

I appreciate this has OS and fork/join connotations but it is a nice description of what happens to an otherwise linear path of execution.

StefanKarpinski · 2019-07-21T02:07:03Z

@samuelpowell, it's a good suggestion and was discussed on Thursday's triage call. Although the parallelism model that's implemented is sometimes called fork-join, you'll note that every system that implements this model calls the operation of forking a new unit of work "spawn".

samuelpowell · 2019-07-21T07:32:49Z

I understand, thanks @StefanKarpinski

smldis · 2019-07-21T08:20:10Z

> Which looks pretty clean and consistent to me.

Would we be able to do non-sticky RemoteTasks?
Is the concept of a Task portable to other kinds of workers, like having a CUDAnativeTask?

chethega · 2019-07-22T10:10:11Z

As far as I understood, @spawn foo(bar) without wait or lexically enclosing @sync does not need to ever run. This is a big footgun, because @sync f(x) with @inline f(x)=begin @spawn foo(x); nothing end creates such a zombie task.

It would be nice to explicitly warn users about this, both in the docs and with a debug-option (if a Task could be garbage collected after it has been scheduled and before it has run, then it is dead code, most likely due to user error. Afaiu the gc scans the task heap, so this never happens; but theoretically we could use weakrefs in debug builds and warn users when we finalize a non-finished scheduled task; more brutally, we could warn whenever we finalize a finished task that has not been waited on).
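
A minimal sketch of the pattern described above, assuming the macro is `Threads.@spawn`. The function names are hypothetical. It only demonstrates that `@sync` does not wait for a task spawned inside a called function; it makes no claim about whether or when that task eventually runs.

```julia
using Base.Threads: @spawn

done = Threads.Atomic{Bool}(false)

# The spawn happens inside `fire_and_forget`, outside the lexical scope of
# any `@sync`, and returning `nothing` hides the task from the caller.
function fire_and_forget(flag)
    @spawn begin
        sleep(0.5)
        flag[] = true
    end
    return nothing
end

@sync fire_and_forget(done)   # nothing was registered, so this does not wait
seen_after_sync = done[]      # read before the spawned task's sleep has elapsed

# Returning the task instead lets the caller wait on it explicitly.
function spawn_work(flag)
    return @spawn begin
        sleep(0.1)
        flag[] = true
    end
end

done2 = Threads.Atomic{Bool}(false)
wait(spawn_work(done2))
seen_after_wait = done2[]
```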

JeffBezanson · 2019-07-22T14:37:15Z

> when we finalize a non-finished scheduled task

How can that happen? If it's scheduled then there's a reference to it and it won't be finalized.

> more brutally, we could warn whenever we finalize a finished task that has not been waited on

A bit too brutal for me :) Of course, that has been requested before for tasks that terminated with an error, which makes some sense at least.

All this worry about tasks possibly never running is a bit overblown. It's not like we will just randomly decide not to run a task for no reason. As long as you hit some yield points or have an available thread before the process exits, it will run.

chethega · 2019-07-22T15:27:01Z

> All this worry about tasks possibly never running is a bit overblown.

I don't worry about tasks that never run. I worry that users forget to wait on a task, don't see this during testing and encounter bad races in prod. Running a task "whenever, next year is soon enough" is almost never desired by users, and therefore almost surely indicates a bug/race in user code. This will be a really common type of mistake.

I am thinking about how we can help users detect such mistakes.

One could detect such a condition by having a has_waited bit in the task struct that is initially 0, and is set to 1 when someone waits on it. Users that really mean "next year is fine by me" can set the has_waited-bit on task creation, before scheduling. Then one could, when enqueuing an unwaited-task, add a finalizer that checks whether the task has been waited on, and otherwise warns/errors. This would indicate a situation where running a task next year would have been valid, which is almost surely a user error (nobody holds a ref to the task, hence nobody can wait on it in the future). This leaves a possibility of false positives: If someone still holds a weakref to an unwaited but finished task, then we would issue a bogus warning (someone could have used the weakref to wait on the task in the future). Possible solution: Creating a weakref to a task sets the has_waited-bit.

> If it's scheduled then there's a reference to it and it won't be finalized.

It would require a compile option that uses weakrefs in the heaps. But upon reconsideration, the other option (as outlined above) is better, and this was a stupid idea.
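
A rough user-level sketch of the `has_waited` idea, under stated assumptions: `TrackedTask`, `track`, and `is_suspicious` are hypothetical names, nothing here exists in Base, and the check is an explicit function call rather than the finalizer hook the proposal describes.

```julia
using Base.Threads: @spawn

# Hypothetical wrapper, not part of Base: pairs a task with a flag that
# records whether anyone has waited on it.
struct TrackedTask
    task::Task
    has_waited::Threads.Atomic{Bool}
end

# Hypothetical helper: spawn `f` and start tracking the resulting task.
track(f) = TrackedTask(@spawn(f()), Threads.Atomic{Bool}(false))

function Base.wait(t::TrackedTask)
    t.has_waited[] = true
    return wait(t.task)
end

function Base.fetch(t::TrackedTask)
    t.has_waited[] = true
    return fetch(t.task)
end

# The proposal would run this check when the task is finalized; here the
# caller asks explicitly.
is_suspicious(t::TrackedTask) = !t.has_waited[]

a = track(() -> 21 * 2)
b = track(() -> nothing)
result = fetch(a)             # marks `a` as waited on
```

With this wrapper, `b` is the kind of task the proposal would flag: it was scheduled, but nothing ever waited on it.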

StefanKarpinski · 2019-07-22T20:19:31Z

I'll repeat my pitch for tree-structured I/O a la trio. Not for this PR but for the future.

JeffBezanson added the multithreading Base.Threads and related functionality label Jul 16, 2019

StefanKarpinski mentioned this pull request Jul 18, 2019

macro threadcall does not accept Ref arguments #30864

Closed

JeffBezanson force-pushed the jb/par branch from b63ba85 to 53f717f Compare July 20, 2019 20:19

JeffBezanson changed the title ~~add experimental @par macro to Base.Threads~~ add experimental @spawn macro to Base.Threads Jul 20, 2019

StefanKarpinski mentioned this pull request Jul 20, 2019

compiler warning in cgmemmgr.cpp #32640

Closed

JeffBezanson force-pushed the jb/par branch from 53f717f to c56e25f Compare July 22, 2019 18:41

add experimental @spawn macro to Base.Threads

84357b6

JeffBezanson force-pushed the jb/par branch from c56e25f to 84357b6 Compare July 22, 2019 21:02

JeffBezanson merged commit c86700d into master Jul 22, 2019

delete-merged-branch bot deleted the jb/par branch July 22, 2019 21:02

jyjemily mentioned this pull request May 29, 2023

NEWS 수정 juliakorea/translate-doc#32

Open
