Using GAP.jl makes julia slow (unusable?) #266

Closed
thofma opened this issue Jun 19, 2019 · 20 comments

@thofma
Contributor

thofma commented Jun 19, 2019

Pretty cool that GAP.jl is working! Unfortunately we were seeing severe performance regressions after doing using GAP (although the code did not use any GAP functionality). We boiled it down to the following example. This is with a fresh install of GAP.jl and GAPTypes.jl:

julia> using BenchmarkTools

julia> function g(n)
         z = [BigInt(1)]
         for i in 1:n
           z = [z[] * i]
         end
         return z
       end

julia> @benchmark g(200000)
BenchmarkTools.Trial: 
  memory estimate:  35.99 GiB
  allocs estimate:  800003
  --------------
  minimum time:     4.931 s (8.00% GC)
  median time:      5.022 s (7.10% GC)
  mean time:        5.022 s (7.10% GC)
  maximum time:     5.113 s (6.24% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> using GAP
Adding path /home/adsad/.julia/packages/GAP/aMcJN/gap/.libs to DL_LOAD_PATH
 ┌───────┐   GAP 4.10dev-2113-gf7f36d7 of today
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-pc-linux-gnu-julia64-kv6
 Configuration:  gmp 6.1.2, Julia GC, Julia 1.1.0
 Loading the library and packages ...
 Packages:   GAPDoc 1.6.2, PrimGrp 3.3.2, SmallGrp 1.3, TransGrp 2.0.4
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'

julia> @benchmark g(200000)
BenchmarkTools.Trial: 
  memory estimate:  35.99 GiB
  allocs estimate:  800003
  --------------
  minimum time:     14.477 s (73.01% GC)
  median time:      14.477 s (73.01% GC)
  mean time:        14.477 s (73.01% GC)
  maximum time:     14.477 s (73.01% GC)
  --------------
  samples:          1
  evals/sample:     1

So, a roughly 3x slowdown just from doing using GAP. I have other examples where, after using GAP, a function spends 99% of its time in the GC. I don't know if we can use GAP.jl in this form.

@CarloSircana @fieker

@mohamed-barakat
Contributor

I can confirm the behavior with GAP.jl alone. This is a serious issue. Thanks for reporting.

@fingolfin
Member

@thofma thanks for the report. That is of course unacceptable!

@rbehrends can you look into this? We first need to establish the exact cause. The immediate suspect is, of course, the stack scanning. Which makes me wonder whether we can't implement an immediate workaround for the case where GAP is launched from Julia (the one that is relevant here): namely, if we are not inside GAP code, we don't need to scan the stack at all. Of course this needs to take nesting into account... Luckily we have already added provisions for tracking whether GAP is "active" or not to GAP's libgap-api; we just aren't using them in GAP.jl yet, but I am sure @sebasguts and I can change that quickly.

Assuming stack scanning is the main problem, this should immediately help in this particular case. Of course we'd still have a performance issue while GAP code is actually running; for that, we can also use the libgap-api code mentioned above to restrict the range of the stack we need to scan, which should help there as well.

Of course that won't help if GAP is the main process and it starts Julia, but that is less important for OSCAR (though of course it'd be nice to speed that up, too)

fingolfin added the kind: bug Something isn't working label Jun 19, 2019
@fingolfin
Member

Just to make it explicit, I am talking about GAP_EnterStack_() and GAP_LeaveStack_(). Though I am not sure whether we can use them from pure Julia, as we'd need the equivalent of __builtin_frame_address(0), I think. On the Julia C kernel level there is the jl_get_frame_addr macro, and also gc_get_stack_ptr; perhaps we could use the latter.
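To illustrate the pattern under discussion, here is a minimal conceptual sketch of such enter/leave bracketing: record the current C stack frame when GAP code becomes active, so that conservative scanning can be limited to the region above that marker and skipped entirely while GAP is inactive. All names below (SKETCH_ENTER_GAP, gap_stack_top, gap_enter_depth) are hypothetical placeholders, not the actual libgap-api implementation.

/* Conceptual sketch only -- placeholder names, not the real libgap-api code. */
#include <stddef.h>

/* Hypothetical globals: the innermost C stack frame at which GAP code
   became active, and the nesting depth of Julia -> GAP transitions. */
static void *gap_stack_top   = NULL;
static int   gap_enter_depth = 0;

/* Record the current frame address when entering GAP code.  Conservative
   scanning then only needs to cover the region between gap_stack_top and
   the bottom of the stack, and can be skipped entirely while depth == 0. */
#define SKETCH_ENTER_GAP()                                      \
    do {                                                        \
        if (gap_enter_depth++ == 0)                             \
            gap_stack_top = __builtin_frame_address(0);         \
    } while (0)

#define SKETCH_LEAVE_GAP()                                      \
    do {                                                        \
        if (--gap_enter_depth == 0)                             \
            gap_stack_top = NULL;                               \
    } while (0)

With markers like these in place, the root scanner could start its conservative scan at gap_stack_top instead of the stack base, and return immediately whenever gap_enter_depth is zero.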

@rbehrends
Contributor

I can kind of reproduce it, though in my case I "only" see a ~33% overhead; this may be due to architecture-specific differences. The culprit is GapTaskScanner(), the function responsible for scanning task stacks (but not the actual main stack, which is where the work happens).

Normally, scanning task stacks should not be that much of a problem. For starters, due to generational GC, unused task stacks should not be scanned every single time (or Julia would have a problem with highly recursive functions on those, too).

Even so, I'm not sure why there's such an inordinate amount of time spent scanning these; we do scan the main stack each time without ill effects.

I need to dig further into that to find out the actual underlying cause. (Architectural differences may be involved as macOS and Linux use slightly different task stack implementations, which could account for the differences in overhead that we are seeing.)
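For context, a rough sketch of how such scanners plug into Julia's GC: GAP registers root- and task-scanner callbacks through Julia's GC extension interface (julia_gcext.h). The sketch below assumes the jl_gc_set_cb_root_scanner / jl_gc_set_cb_task_scanner entry points from that header and uses placeholder callback bodies; it is a simplified illustration, not GAP's actual kernel code.

/* Simplified illustration of registering GC callbacks via julia_gcext.h.
   The callback bodies are placeholders, not GAP's real scanners. */
#include <julia.h>
#include <julia_gcext.h>

static void MyRootScanner(int full)
{
    /* Conservatively scan the main C stack for pointers into the GAP
       object heap and mark whatever is found (details omitted). */
    (void)full;
}

static void MyTaskScanner(jl_task_t *task, int full)
{
    /* Do the same for the saved C stack of a Julia task. */
    (void)task; (void)full;
}

void register_gc_hooks(void)
{
    jl_gc_set_cb_root_scanner(MyRootScanner, 1);
    jl_gc_set_cb_task_scanner(MyTaskScanner, 1);
}

Since these callbacks run on every collection, any work in them that is not proportional to what actually changed shows up as GC overhead even in Julia code that never touches GAP, which matches the symptom reported above.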

@rbehrends
Contributor

The biggest issue right now appears to be that task stacks are scanned on every minor collection, regardless of whether they have been touched in the meantime, rather than only on major collections. This may be an issue for Julia itself with large task stacks, too, though I'll have to investigate that more closely.
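The shape of the improvement being described is to make the task scanner cheap on minor collections: skip the stacks of tasks that have not run since they were last scanned. A purely illustrative sketch of that idea follows; the struct, the dirty flag, and the helper function are made-up placeholders, not the committed fix.

/* Illustrative sketch only: placeholder types and names, not GAP's code. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    void *stack_lo;   /* saved C stack region of the task */
    void *stack_hi;
    bool  dirty;      /* set again whenever the task runs */
} sketch_task_t;

static void scan_range_conservatively(void *lo, void *hi)
{
    /* conservatively look for pointers into the GAP heap (omitted) */
    (void)lo; (void)hi;
}

static void sketch_task_scanner(sketch_task_t *task, int full)
{
    /* On a minor (non-full) collection, skip a task stack that has not
       been touched since the last scan; its contents cannot have gained
       references to objects allocated after that scan. */
    if (!full && !task->dirty)
        return;
    scan_range_conservatively(task->stack_lo, task->stack_hi);
    task->dirty = false;
}

Note that in a generational collector this alone is not quite enough: young objects found in an earlier scan must still be kept alive (promoted or remembered) while the untouched stack references them, which is presumably the "generational issues" caveat mentioned with the quick fix below.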

@fingolfin
Member

Which makes me wonder: did we ever test what happens if GAP is called from a task? Is this even supposed to work right now? If so, we should definitely add test cases for this.

@rbehrends
Contributor

rbehrends commented Jun 19, 2019

GAP is currently being called from a task when loaded as a Julia module, as far as I can tell. And yes, it is supposed to work; we have specific logic in GapRootScanner() to deal with that case.

Which doesn't mean that some other things can't go wrong, but in principle that use case is accounted for.

@rbehrends
Contributor

I've committed a quick fix here. I'm not yet positive that it handles the generational issues cleanly (it may require more tweaking for that), so I'm not going to turn it into a PR yet, but it does fix the performance regression for me.

@rbehrends
Contributor

I've updated the fix (same branch as above) to be more robust. However, in its current version it is implemented in C++, and we need to think about whether that is okay for GAP (other parts of GAP already use C++, but I'm not sure what the constraints are) and whether we need more testing for it due to some non-trivial changes. For now, though, it should work for anybody who cannot wait for a patch to arrive officially.

@fingolfin
Member

@thofma @mohamed-barakat The fix has now been merged into the GAP master branch. It would be great if you could verify that it fixes the regression for you. Then we can get it into GAP 4.11, which I hope we'll be able to release in August.

@fingolfin
Member

With the fix in place, I am now getting this:

julia> using BenchmarkTools
[ Info: Precompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]

julia> function g(n)
         z = [BigInt(1)]
         for i in 1:n
           z = [z[] * i]
         end
         return z
       end
g (generic function with 1 method)

julia> @benchmark g(200000)
BenchmarkTools.Trial:
  memory estimate:  35.99 GiB
  allocs estimate:  800003
  --------------
  minimum time:     4.900 s (7.79% GC)
  median time:      4.923 s (8.66% GC)
  mean time:        4.923 s (8.66% GC)
  maximum time:     4.945 s (9.52% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> using GAP
Adding path /home/horn/Projekte/OSCAR/GAP.jl/gap/.libs to DL_LOAD_PATH
 ┌───────┐   GAP 4.dev of today
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-pc-linux-gnu-julia64-kv7
 Configuration:  gmp 6.1.2, Julia GC, Julia 1.1.1, readline
 Loading the library and packages ...
 Packages:   GAPDoc 1.6.2, PrimGrp 3.3.2, SmallGrp 1.3, TransGrp 2.0.4
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'

julia> @benchmark g(200000)
BenchmarkTools.Trial:
  memory estimate:  35.99 GiB
  allocs estimate:  800003
  --------------
  minimum time:     5.264 s (12.15% GC)
  median time:      5.264 s (12.15% GC)
  mean time:        5.264 s (12.15% GC)
  maximum time:     5.264 s (12.15% GC)
  --------------
  samples:          1
  evals/sample:     1

So there is still a slowdown, but a rather minor one. It would still be good to know how, e.g., the tests @thofma mentioned (where GC took 99% of the time) now fare.

@rbehrends
Contributor

Hmm, this is a bigger slowdown than I experienced when I last tested it (where it was difficult to even reproduce a slowdown). I'll have to look at why there may still be residual issues.

@thofma
Contributor Author

thofma commented Jul 18, 2019

It is better. (I can't find the example with the 99% anymore).

@mohamed-barakat
Contributor

> @thofma @mohamed-barakat The fix has now been merged into the GAP master branch.

Works for me with no measurable regression.

@fingolfin
Member

@mohamed-barakat so you are saying that you get identical benchmark results before and after using GAP? Interesting?

@rbehrends can you reproduce the remaining slowdown I reported? (I had it on two machines, both on my MacBook and also on a Ubuntu server)

@rbehrends
Contributor

I'm getting completely different results (though my version of Julia is a couple of months old). I'm wondering if there are different versions of GMP involved? Anyhow, here is what I'm getting:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.0-DEV.0
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> include("bigfun.jl")
g (generic function with 1 method)

julia> @benchmark g(200000)
BenchmarkTools.Trial: 
  memory estimate:  35.99 GiB
  allocs estimate:  800003
  --------------
  minimum time:     22.148 s (1.38% GC)
  median time:      22.148 s (1.38% GC)
  mean time:        22.148 s (1.38% GC)
  maximum time:     22.148 s (1.38% GC)
  --------------
  samples:          1
  evals/sample:     1

julia> using GAP
Adding path /Users/behrends/develop/gap-dev/master/.libs to DL_LOAD_PATH
 ┌───────┐   GAP 4.dev of today
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-apple-darwin18.6.0-julia64-kv7
 Configuration:  gmp 6.1.2, Julia GC, Julia 1.3.0-DEV
 Loading the library and packages ...
 Packages:   GAPDoc 1.6.2, PrimGrp 3.3.2, SmallGrp 1.3, TransGrp 2.0.4
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'

julia> @benchmark g(200000)
BenchmarkTools.Trial: 
  memory estimate:  35.99 GiB
  allocs estimate:  800003
  --------------
  minimum time:     22.151 s (1.95% GC)
  median time:      22.151 s (1.95% GC)
  mean time:        22.151 s (1.95% GC)
  maximum time:     22.151 s (1.95% GC)
  --------------
  samples:          1
  evals/sample:     1

This is a pretty fast machine, so I'm not sure why it would take so much longer per run. It may also mean that significant relative differences in GC performance are hidden because most of the time is spent doing calculations.

I'll finally add that even with generational GC, part of the remaining performance difference may simply be that we're dealing with a significantly bigger heap.

@mohamed-barakat
Contributor

> @mohamed-barakat so you are saying that you get identical benchmark results before and after using GAP? Interesting?

Yes, but maybe my tests do not exercise the GC heavily; at least, I was unable to measure a regression. I'll try to make these tests part of the Oscar CI.

@rbehrends
Contributor

After looking at it in a profiler, I primarily see a bit more time being spent in sweep_pool_pagetable0, which is linear in the number of pool pages in use. This would indicate that the performance difference is indeed largely caused by the larger heap.

@fingolfin
Member

@rbehrends can this issue be closed now?

@rbehrends
Contributor

@fingolfin Yes, I believe we are good.
