
improve performance for binarytrees. immutable structs and parallel reduce #1

Open · wants to merge 2 commits into master

Conversation

@Orbots commented Nov 20, 2018

This is about 3x faster on my machine now.

@KristofferC (Collaborator) commented Nov 20, 2018

Does any other language use distributed computing?

@Orbots (Author) commented Nov 20, 2018

The faster runs take advantage of the multi-core architecture of a single machine; `@distributed` seemed the quickest way to get there. In the tables, the last four columns are CPU load for four cores: the Julia runs sit at 100% on one CPU but 0% on the other three. So you'd want to run these with `julia -p 4` (does that use 4 cores, or 1 + 4?)
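A minimal sketch (my own, not the PR's exact code) of the `@distributed (+)` parallel reduce over immutable structs that the PR title describes; `Node`, `make`, and `check` here are illustrative stand-ins. On the `-p 4` question: `julia -p 4` starts 4 worker processes *in addition to* the master, which is what `addprocs(4)` does below.

```julia
using Distributed
addprocs(4)   # equivalent to `julia -p 4`: 4 workers plus the master process

@everywhere begin
    # Immutable struct; `nothing` marks a leaf.
    struct Node
        left::Union{Node,Nothing}
        right::Union{Node,Nothing}
    end
    make(d) = d == 0 ? nothing : Node(make(d - 1), make(d - 1))
    check(::Nothing) = 1
    check(n::Node) = 1 + check(n.left) + check(n.right)
end

# Each worker builds and checks its own trees; (+) folds the partial sums.
niter, depth = 256, 10
total = @distributed (+) for _ in 1:niter
    check(make(depth))
end
println(total)
```

Because each loop body allocates its own tree, no state is shared between workers, which is what makes the `(+)` reduction safe.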

@Orbots (Author) commented Nov 20, 2018

I rolled in the Union optimization from the contribution in `binarytree-fast`.

@SimonDanisch (Collaborator)

I'd rather use multithreading, btw ;)
I started a multi-threaded & pooled solution, borrowing from the C++ solution, and managed to get roughly the same performance... I'm just a bit confused by this:

> Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.

Since the very simple implementation of the "memory pool" in Julia is simpler than what C++ with a pool library is doing :D hope they'll still accept it

@Orbots (Author) commented Nov 20, 2018

The comment about no custom pools is a bit at odds with the fast C++ versions. I'm guessing they just mean no custom pool implemented specifically for this problem. You'd probably have better luck getting your pooled solution accepted if you publish it as a new general-purpose memory-pool package. Which I highly encourage, because I'd like to use such a package :)

I reached for `@distributed` because multithreading is still flagged as "experimental" in the docs.

@SimonDanisch (Collaborator)

It's literally just putting the nodes into a vector :P I likely won't make a package for `push!(pool, node)` ^^

@Orbots (Author) commented Nov 20, 2018

I reckon that would be considered a custom pool, though. A more general memory pool would manage different-sized pools, dynamically resizing as needed, probably with some macro conveniences. A Julia implementation would be pretty simple and mostly just wrap `push!(pool, element)`. Maybe I'll get to it at some point :)
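A hypothetical sketch of what such a general pool package might look like — `Pool`, `alloc!`, and `release!` are invented names, not an existing API. Indices into a backing vector stand in for pointers, and a free list recycles released slots:

```julia
# Generic pool: one backing vector per element type, freed slots reused.
struct Pool{T}
    items::Vector{T}
    free::Vector{Int}
end
Pool{T}() where {T} = Pool{T}(T[], Int[])

# Allocate a slot for `x`, reusing a freed index when one is available.
function alloc!(p::Pool{T}, x::T) where {T}
    if isempty(p.free)
        push!(p.items, x)
        return length(p.items)
    else
        i = pop!(p.free)
        p.items[i] = x
        return i
    end
end

# Mark index `i` as reusable; its slot will be handed out again.
release!(p::Pool, i::Int) = push!(p.free, i)

p = Pool{Int}()
a = alloc!(p, 10)   # fresh slot: index 1
release!(p, a)
b = alloc!(p, 20)   # reuses index 1
```

The dynamic-resizing part comes for free from `push!` growing the backing vector; the macro conveniences mentioned above would sit on top of this.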

@SimonDanisch (Collaborator)

> I reached for `@distributed` as multithreading is still flagged as "experimental" in the docs.

That's just because the interface may or may not change ;) But it works pretty nicely for such problems already

@SimonDanisch (Collaborator)

Btw, this is the implementation:

```julia
# The Computer Language Benchmarks Game
# https://salsa.debian.org/benchmarksgame-team/benchmarksgame/

# contributed by Jarret Revels and Alex Arslan
# based on an OCaml program
# *reset*
using Printf

struct Node
    left::Int
    right::Int
end

function alloc!(pool, left, right)
    push!(pool, Node(left, right))
    return length(pool)
end

function make(pool, d)
    d == 0 && return 0
    alloc!(pool, make(pool, d - 1), make(pool, d - 1))
end

check(pool, t::Node) = 1 + check(pool, t.left) + check(pool, t.right)

function check(pool, node::Int)
    node == 0 && return 1
    @inbounds return check(pool, pool[node])
end

function threads_inner(pool, d, min_depth, max_depth)
    niter = 1 << (max_depth - d + min_depth)
    c = 0
    for j = 1:niter
        c += check(pool, make(pool, d))
        empty!(pool)
    end
    @sprintf("%i\t trees of depth %i\t check: %i\n", niter, d, c)
end

function loop_depths(io, min_depth, max_depth)
    # One output bucket per thread so results can be printed in order afterwards.
    output = ntuple(x -> String[], Threads.nthreads())
    Threads.@threads for d in min_depth:2:max_depth
        pool = Node[]
        push!(output[Threads.threadid()], threads_inner(pool, d, min_depth, max_depth))
    end
    foreach(s -> foreach(x -> print(io, x), s), output)
end

function perf_binary_trees(io, N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1
    pool = Node[]
    # create and check stretch tree
    let c = check(pool, make(pool, stretch_depth))
        @printf(io, "stretch tree of depth %i\t check: %i\n", stretch_depth, c)
    end

    long_lived_tree = make(pool, max_depth)

    loop_depths(io, min_depth, max_depth)
    @printf(io, "long lived tree of depth %i\t check: %i\n", max_depth, check(pool, long_lived_tree))
end

perf_binary_trees(stdout, 21)
```

@Orbots (Author) commented Nov 20, 2018

Nice. I'm seeing about 9x better perf than the original.

@ChristianKurz

@Orbots please don't use Julia macros directly in a GitHub discussion. This will ping the GitHub user with that name. Just use some code fences to prevent this, e.g. `@distributed`.

@distributed commented Dec 19, 2018 via email

@Orbots (Author) commented Dec 19, 2018

Right, of course. That's funny. GitHub user 'time' must be regretting his choice of username now :)
