Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimerOutputs + some speedups #27

Merged
merged 1 commit into from
May 2, 2021
Merged

Conversation

cscherrer
Copy link
Contributor

It looks like there are some places things could be a little faster. This PR adds some @timeit_debug calls from TimerOutputs. I've used the results from this to start on speeding things up. Here are the results after my updates:

 ───────────────────────────────────────────────────────────────────────────────
                                        Time                   Allocations      
                                ──────────────────────   ───────────────────────
        Tot / % measured:            3.71s / 17.0%            473MiB / 100%     

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 Main while loop             1    633ms   100%   633ms    472MiB  100%    472MiB
   Run chains                3    444ms  70.2%   148ms    163MiB  34.4%  54.2MiB
   Compute covariance        3    161ms  25.4%  53.5ms    281MiB  59.6%  93.8MiB
     Update Σ_j              3    161ms  25.4%  53.5ms    281MiB  59.6%  93.8MiB
     Symmetrize Σ_j          3   64.4μs  0.01%  21.5μs      720B  0.00%     240B
     Inititialize Σ_j        3   6.63μs  0.00%  2.21μs     0.00B  0.00%    0.00B
   Initialize ins            3   2.79ms  0.44%   929μs    913KiB  0.19%   304KiB
   Update ins                3   2.24ms  0.35%   747μs   0.96MiB  0.20%   329KiB
   Inner while loop          3   1.34ms  0.21%   446μs   1.10MiB  0.23%   375KiB
   Compute randIndex         3    222μs  0.04%  74.1μs    236KiB  0.05%  78.8KiB
   Nominal weights           3   82.6μs  0.01%  27.5μs   47.9KiB  0.01%  16.0KiB
   Weighted mean             3   41.8μs  0.01%  13.9μs   47.4KiB  0.01%  15.8KiB
   Normalised weights        3   36.4μs  0.01%  12.1μs   47.5KiB  0.01%  15.8KiB
   Log evidence              3   17.7μs  0.00%  5.88μs      240B  0.00%    80.0B
 ───────────────────────────────────────────────────────────────────────────────

By far the most expensive operations are the "Run chains" and "Compute covariance" sections. I made the following changes:

  1. In metropolis_hastings_simple, the original implementation was traversing across rows. In Julia, memory is organized columnwise, so it's much faster to iterate column-first. Rather than reorganize the code around this, I just swapped the indices in the function and returned the transpose.
  2. To avoid allocations, I initialized Σ_j outside the loop. Then the loop can just update in-place. There's still a lot of overhead in this section:
 @timeit_debug to "Update Σ_j" for l = 1:Nsamples
     Σ_j .+= beta2 .* wn_j[l] .* (θ_j[l,:]' .- Th_wm)' * (θ_j[l,:]' .- Th_wm)
end

I'm sure this can still be reduced a lot. But rather than change too much at once, I wanted to see if you had any feedback on updatges so far.

@cscherrer
Copy link
Contributor Author

Oh, I should say... The timers above are for the tmcmc call in this section of the tests:

    @testset "1D" begin
        Random.seed!(123456)
        lb  = -15
        ub  = 15

        fT(x) = logpdf(Uniform(lb, ub), x[1])
        sample_fT(Nsamples) = rand(Uniform(lb, ub), Nsamples, 1)

        log_fD_T(x) = log(pdf(Normal(0, 1), x[1]) + pdf(Normal(5, 0.2), x[1]))

        Nsamples = 2000
        samps, acc = tmcmc(log_fD_T, fT, sample_fT, Nsamples)
        @test mean(samps)  2.5 atol = 0.2
        @test std(samps)  2.6 atol = 0.2
    end

@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2021

Codecov Report

Merging #27 (41b1587) into main (f6f181c) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main       #27   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines           89        89           
=========================================
  Hits            89        89           
Impacted Files Coverage Δ
src/mcmc.jl 100.00% <100.00%> (ø)
src/tmcmc.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6f181c...41b1587. Read the comment docs.

@AnderGray
Copy link
Owner

Great stuff Chad! Thanks for this PR. I had no idea Julia prefers columns.

I think they should be merged as is, what do you think @FriesischScott

Some of the changes from PR26 should also improve allocation times.

@FriesischScott
Copy link
Collaborator

I agree. Good stuff.

@AnderGray AnderGray merged commit b8a669c into AnderGray:main May 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants