
improve performance for binarytrees. immutable structs and parallel reduce #1

Open · wants to merge 2 commits into master

Conversation

@Orbots commented Nov 20, 2018

This is about 3x faster on my machine now.

@KristofferC (Collaborator) commented Nov 20, 2018

Does any other language use distributed computing?

@Orbots (Author) commented Nov 20, 2018

The faster runs take advantage of the multi-core architecture of a single machine; `@distributed` seemed the quickest way to get there. In the tables, the last four columns are CPU load for four cores: the Julia runs sit at 100% on one CPU but 0% on the other three. So you'd want to run these with `julia -p 4` (does that use 4 cores, or 1 + 4?)
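A minimal sketch (my own, not the PR's exact code) of the `@distributed (+)` parallel reduce over immutable structs that the PR title describes; `Node`, `make`, and `check` here are illustrative stand-ins. On the `-p 4` question: `julia -p 4` starts 4 worker processes *in addition to* the master, which is what `addprocs(4)` does below.

```julia
using Distributed
addprocs(4)   # equivalent to `julia -p 4`: 4 workers plus the master process

@everywhere begin
    # Immutable struct; `nothing` marks a leaf.
    struct Node
        left::Union{Node,Nothing}
        right::Union{Node,Nothing}
    end
    make(d) = d == 0 ? nothing : Node(make(d - 1), make(d - 1))
    check(::Nothing) = 1
    check(n::Node) = 1 + check(n.left) + check(n.right)
end

# Each worker builds and checks its own trees; (+) folds the partial sums.
niter, depth = 256, 10
total = @distributed (+) for _ in 1:niter
    check(make(depth))
end
println(total)
```

Because each loop body allocates its own tree, no state is shared between workers, which is what makes the `(+)` reduction safe.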

@Orbots (Author) commented Nov 20, 2018

I rolled in the Union optimization from the contribution in `binarytree-fast`.

@SimonDanisch (Collaborator)

I'd rather use multithreading, btw ;)
I started a multi-threaded & pooled solution, borrowing from the C++ solution, and managed to get roughly the same performance... I'm just a bit confused by this:

> Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.

Since the very simple implementation of the "memory pool" in Julia is simpler than what C++ with a pool library is doing :D hope they'll still accept it

@Orbots (Author) commented Nov 20, 2018

The comment about no custom pools is a bit at odds with the fast C++ versions. I'm guessing they just mean no custom pool implemented specifically for this problem. You'd probably have better luck getting your pooled solution accepted if you publish it as a new general-purpose memory-pool package. Which I highly encourage, because I'd like to use such a package :)

I reached for `@distributed` because multithreading is still flagged as "experimental" in the docs.

@SimonDanisch (Collaborator)

It's literally just putting the nodes into a vector :P I likely won't make a package for `push!(pool, node)` ^^

@Orbots (Author) commented Nov 20, 2018

I reckon that would be considered a custom pool, though. A more general memory pool would manage different-sized pools, dynamically resizing as needed, probably with some macro conveniences. A Julia implementation would be pretty simple and mostly just wrap `push!(pool, element)`. Maybe I'll get to it at some point :)
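A hypothetical sketch of what such a general pool package might look like — `Pool`, `alloc!`, and `release!` are invented names, not an existing API. Indices into a backing vector stand in for pointers, and a free list recycles released slots:

```julia
# Generic pool: one backing vector per element type, freed slots reused.
struct Pool{T}
    items::Vector{T}
    free::Vector{Int}
end
Pool{T}() where {T} = Pool{T}(T[], Int[])

# Allocate a slot for `x`, reusing a freed index when one is available.
function alloc!(p::Pool{T}, x::T) where {T}
    if isempty(p.free)
        push!(p.items, x)
        return length(p.items)
    else
        i = pop!(p.free)
        p.items[i] = x
        return i
    end
end

# Mark index `i` as reusable; its slot will be handed out again.
release!(p::Pool, i::Int) = push!(p.free, i)

p = Pool{Int}()
a = alloc!(p, 10)   # fresh slot: index 1
release!(p, a)
b = alloc!(p, 20)   # reuses index 1
```

The dynamic-resizing part comes for free from `push!` growing the backing vector; the macro conveniences mentioned above would sit on top of this.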

@SimonDanisch (Collaborator)

> I reached for `@distributed` as multithreading is still flagged as "experimental" in the docs.

That's just because the interface may or may not change ;) But it works pretty nicely for such problems already

@SimonDanisch (Collaborator)

Btw, this is the implementation:

```julia
# The Computer Language Benchmarks Game
# https://salsa.debian.org/benchmarksgame-team/benchmarksgame/

# contributed by Jarret Revels and Alex Arslan
# based on an OCaml program
# *reset*
using Printf

struct Node
    left::Int
    right::Int
end

function alloc!(pool, left, right)
    push!(pool, Node(left, right))
    return length(pool)
end

function make(pool, d)
    d == 0 && return 0
    alloc!(pool, make(pool, d - 1), make(pool, d - 1))
end

check(pool, t::Node) = 1 + check(pool, t.left) + check(pool, t.right)

function check(pool, node::Int)
    node == 0 && return 1
    @inbounds return check(pool, pool[node])
end

function threads_inner(pool, d, min_depth, max_depth)
    niter = 1 << (max_depth - d + min_depth)
    c = 0
    for j = 1:niter
        c += check(pool, make(pool, d))
        empty!(pool)
    end
    @sprintf("%i\t trees of depth %i\t check: %i\n", niter, d, c)
end

function loop_depths(io, min_depth, max_depth)
    # One output bucket per thread so results can be printed in order afterwards.
    output = ntuple(x -> String[], Threads.nthreads())
    Threads.@threads for d in min_depth:2:max_depth
        pool = Node[]
        push!(output[Threads.threadid()], threads_inner(pool, d, min_depth, max_depth))
    end
    foreach(s -> foreach(x -> print(io, x), s), output)
end

function perf_binary_trees(io, N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1
    pool = Node[]
    # create and check stretch tree
    let c = check(pool, make(pool, stretch_depth))
        @printf(io, "stretch tree of depth %i\t check: %i\n", stretch_depth, c)
    end

    long_lived_tree = make(pool, max_depth)

    loop_depths(io, min_depth, max_depth)
    @printf(io, "long lived tree of depth %i\t check: %i\n", max_depth, check(pool, long_lived_tree))
end

perf_binary_trees(stdout, 21)
```

@Orbots (Author) commented Nov 20, 2018

Nice. I'm seeing about 9x better perf than the original.

@ChristianKurz

@Orbots please don't use Julia macros directly in a GitHub discussion. This will ping the GitHub user with that name. Just use some code fences to prevent this, e.g. `@distributed`.

@distributed commented Dec 19, 2018 via email

@Orbots (Author) commented Dec 19, 2018

Right, of course. That's funny. GitHub user 'time' must be regretting his choice of username now :)
