Boxed variables in closures passed to a Task can cause race condition #30896

NHDaly · 2019-01-29T22:21:13Z

The way we spawn Tasks via @async can lead to a surprising race condition b/c variables inside the closure are sometimes boxed.
This can make it difficult to spawn a Task that needs to capture a local variable.

Consider this simple example:

function countup()
    i = 0
    while i < 5
        @async println(i)
        i += 1
    end
end

julia> countup()
5
5
5
5
5

And the problem is nothing to do with the @async macro itself; you'd see the same problem using the Task constructor, Task(()->println(i)), since in both cases the behavior is related to boxing inside the closure.

And since we require you to pass a zero-argument function, capturing variables with closures seems like the solution you're supposed to use.

I just checked, and Go's closures behave the same way. However, in Go they skirt this problem because the syntax to launch a goroutine is required to be a function call, so you have the opportunity to pass arguments to your constructed coroutine. In go, the above example works correctly:

func main() {	
	i := 0
	for ; i < 5; {
	   go fmt.Println(i)   // Since `go` statements require _you_ to call the function, you don't need to rely on capture. This is a _call_ to the existing Println function.
	   i += 1
	}
	time.Sleep(time.Second)
}
// prints:
0
1
2
3
4

And even if you do need to construct a closure in go, you can still pass the pass-by-copy parameters directly:

	   go func(i int) { fmt.Println(i); }(i)

But currently we don't have a way to start a Task with a parameter. The current Task constructor requires its argument to be "callable with no arguments".

Currently the only way I can think of to mitigate this in my code is to spawn the Task with @eval, since it gives you the chance to splice in the value of x, thus forcing the pass-by-copy syntax. But that's obviously terrible for a bunch of other reasons.

So I think we need some kind of alternative solution here. What do you think?

Here are a few proposals I can think of:

For Task(f), allow passing a function that takes arguments.
- Then, when you schedule it, calling schedule(t, x,y,z) would allow you to pass the arguments to f, spawning f(x,y,z) in a coroutine.
- The problem with this is that schedule(t, val) already has a definition:
  https://docs.julialang.org/en/v1.1.0/base/parallel/#Base.schedule
  (I don't really understand how to use that, though, and the docs say use of yieldto "is discouraged.")
We could consider allowing @async to use the $ interpolation syntax within its expression body to indicate pass-by-value vs pass-by-reference semantics. This would be kind of like what @btime does.
In my example, that would become:
```
    @async println($i)
```
~~Sadly this might be breaking, though, for any expressions that already contained quoting and interpolation inside the @async body. 😦 (e.g. @async println(:($x + 5)))~~
EDIT: I no longer think this would be breaking in any cases... In the provided example, the $ is clearly interpolating into the quote, not into @async, following the same rules as in @eval.
We could introduce some kind of explicit capture syntax for closures that includes whether to pass-by-copy or -by-reference, maybe like that proposed here:
Explicit capture of variables in a closure. #14959 (comment)
But i tend to agree that it complicates things in an undesirable way.

Anyone have any other ideas?

EDIT: Also, I opened this discourse thread to get more background info on why the boxing is needed 🙂:
https://discourse.julialang.org/t/surprising-capture-boxing-behavior-in-closure/20254

If you're interested, I came across this behavior when translating this cool prime number sieve written in Go into Julia:
http://tinyurl.com/gosieve

Here's the julia code I wrote:

"""
    ConcurrentPrimeSieve

Based on http://tinyurl.com/gosieve
"""
module ConcurrentPrimeSieve

# Send the sequence 2, 3, 4, ... to channel 'ch'.
function Generate(ch::Channel{Int})
    for i in Iterators.countfrom(2)
        put!(ch, i) # Send 'i' to channel 'ch'.
    end
end

# Copy the values from channel 'in' to channel 'out',
# removing those divisible by 'prime'.
function Filter(in::Channel{Int}, out::Channel{Int}, prime::Int)
    while true
        i = take!(in) # Receive value from 'in'.
        if i%prime != 0
            put!(out, i) # Send 'i' to 'out'.
        end
    end
end

# The prime sieve: Daisy-chain Filter processes.
function main()
    ch = Channel{Int}(0) # Create a new channel.
    @async Generate(ch)  # Launch Generate goroutine.
    for i in 1:10
        prime = take!(ch)
        println(prime)
        # THIS IS BROKEN (race condition):
        # The `ch` binding inside this async expression _changes_ before it executes!
        ch1 = Channel{Int}(0)
        @async Filter(ch, ch1, prime)
        ch = ch1
    end
end

end

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2019-01-29T22:28:46Z

The solution to this is to use let to create a new variable binding for each closure:

function countup()
    i = 0
    while i < 5
        let i=i
            @async println(i)
        end
        i += 1
    end
end

NHDaly · 2019-01-29T22:50:14Z

Ah, thanks Jeff, that makes sense. That's a working solution! :)

That said, it's a bit of a bummer. It's both more complicated than the equivalent go code, and also I would think pretty non-obvious to the casual Julia user. :/

It would be nice to make spawning tasks more friendly for this kind of use case.

NHDaly · 2019-01-29T22:54:36Z

Maybe another solution would be a different macro that requires its argument expression to be a :call, and does the above let end automatically.

Maybe like @async_call or something. This would more closes match the go statement in Go, in that it would allow the simple-looking behavior above:

function countup()
    i = 0
    while i < 5
        @go println(i)
        i += 1
    end
end
countup()

Something like this, I guess:

macro go(expr)
    @assert expr.head == :call
    f = expr.args[1]
    args = expr.args[2:end]

    copyargs = [gensym(string(a)) for a in args]
    letargs = [:($a=$(esc(b))) for (a,b) in zip(copyargs,args)]
    quote
        let $(letargs...)
            @async $f($(copyargs...))
        end
    end
end

(I wouldn't actually want to call it @go 😋.)
That wouldn't help with directly constructing a Task, but for most cases you don't need to.

JeffBezanson · 2019-01-29T23:00:11Z

That's clever; I do kind of like it. The interesting thing is that then the argument expressions are evaluated in the current task instead of in the new created task. I assume that's what go does too?

Another option is to add arguments to @async, e.g. @async i println(i) to tell it what bindings to copy... or not copy?

NHDaly · 2019-01-30T15:11:28Z

The interesting thing is that then the argument expressions are evaluated in the current task instead of in the new created task.

Yeah, exactly. Since this is how regular function calls work (you evaluate the args before entering the function) this seems pretty natural. It's a bit counter to the way most macros behave though, so i can also see this behavior being unexpected.. For that reason it seems like a macro like this should explicitly have call in its name.

I assume that's what go does too?

I'm not a go expert (i've been learning Go only because there's a wealth of good concurrency information in Go), but my understanding is yes, that's what go does too.

Another option is to add arguments to @async, e.g. @async i println(i) to tell it what bindings to copy... or not copy?

Yeah, that feels reasonable, too. It's a bit awkward, but not terrible. Although that's almost starting to look like a lambda syntax again.. i -> println(i) But actually maybe that's a good thing?

NHDaly · 2019-01-31T17:44:50Z

Just to add more context to this discussion, here's the exact same ideas discussed for Go:

https://github.com/golang/go/wiki/CommonMistakes#using-goroutines-on-loop-iterator-variables

Their recommended solution is to pass the value as a parameter to the anonymous function, similar to what I did in my example @go macro above.

NHDaly · 2019-08-30T04:14:15Z

I'm finally following up with this again, now that we have a multithreading @spawn primitive. I find having an explicit name like @spawncall to be much less cryptic than @spawn i println(i), so I've gone ahead with that.

I've opened #33119 to implement @spawncall.

NHDaly mentioned this issue Aug 30, 2019

Adds interpolation syntax to Threads.@spawn, to evaluate arguments immediately #33119

Merged

JeffBezanson closed this as completed in #33119 Dec 18, 2019

NHDaly mentioned this issue Dec 19, 2019

Race condition caused by variable scope getting lifted from a multithreaded context #14948

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Boxed variables in closures passed to a Task can cause race condition #30896

Boxed variables in closures passed to a Task can cause race condition #30896

NHDaly commented Jan 29, 2019 •

edited

Loading

JeffBezanson commented Jan 29, 2019

NHDaly commented Jan 29, 2019

NHDaly commented Jan 29, 2019

JeffBezanson commented Jan 29, 2019

NHDaly commented Jan 30, 2019 •

edited

Loading

NHDaly commented Jan 31, 2019

NHDaly commented Aug 30, 2019

Boxed variables in closures passed to a Task can cause race condition #30896

Boxed variables in closures passed to a Task can cause race condition #30896

Comments

NHDaly commented Jan 29, 2019 • edited Loading

JeffBezanson commented Jan 29, 2019

NHDaly commented Jan 29, 2019

NHDaly commented Jan 29, 2019

JeffBezanson commented Jan 29, 2019

NHDaly commented Jan 30, 2019 • edited Loading

NHDaly commented Jan 31, 2019

NHDaly commented Aug 30, 2019

NHDaly commented Jan 29, 2019 •

edited

Loading

NHDaly commented Jan 30, 2019 •

edited

Loading