Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove treap functionality ?!? #9

Closed
wants to merge 118 commits into from

Conversation

fingolfin
Copy link
Collaborator

So today I thought about why exactly we need those treaps, and couldn't come up with a good answer... And indeed, it turns out that one can remove it, and things still work just fine.

The only justification for keeping the treap code I could think of so far was that we might need it for a hypothetical future optimization we discussed earlier, were one might "merge" the master point and bag content allocations into a single one, for bags which are known to be of fixed size (or at least not shrink). But that seems like a rather flimsy justification, given that it's unclear (a) whether we'll ever be able to implement that and (b) whether it'll bring a noticeable performance boost...

So, what am I missing?

@rbehrends
Copy link
Owner

The Treap functionality (or something equivalent) is absolutely essential for conservative stack scanning. The bigvals treap tracks all external allocation and allows us to identify potential pointers on the stack and translate pointers to the interiors of such bags – this can be the result of compiler optimizations – to their respective base addresses. Please do not remove this functionality under any circumstances without replacing it by an equivalent alternate functionality.

@fingolfin
Copy link
Collaborator Author

fingolfin commented May 27, 2018

Of course I am not removing it, which is why this is a PR, not a direct push!

But I seriously wonder about your claim that this is vital. First off, observe that this PR runs just fine through the full test suite.

Secondly, for GASMAN we also do not take reference into bag interiors on the stack into account -- only references to master pointers, which are never bigvals.

So, I guess the bit I don't understand is: why would things be different for the Julia GC?

@rbehrends
Copy link
Owner

The underlying problem is that optimizing compilers may discard references to the master pointer while the bag contents are still being used if the master pointer is not needed anymore. This may or may not happen, depends on the optimizations that are possible, and are rare in any event. It also does not matter if a separate reference to the bag exists elsewhere. But just because such situations are rare doesn't mean they are impossible.

The Julia GC integration has the additional problem that a related situation can occur with some regularity in NewBag(). Here, we allocate the bag contents first and then the master pointer (to avoid the need of having to deal with a write barrier). If the master pointer allocation triggers a GC, then the reference to the bag contents is the only thing keeping them alive. We could avoid that by doing additional work in NewBag(), such as nulling the master pointer and reintroducing a write barrier or storing the bag contents in a separate root, but I am reluctant to introduce additional overhead on a hot path. Even so, the other concerns would remain.

@fingolfin
Copy link
Collaborator Author

I don't understand you explanations about optimization, or at least not how these concerns would be equally applicable to GASMAN.

Specifically: if a GC can occur, then in GASMAN, we must use PTR_BAG/ADDR_OBJ on the master pointer anyway to recover a fresh pointer into the bag. Now of course it might be possible that the compiler "hides" the bag ref somehow, and we fail to see it, and then the master pointer is garbage collected. But then we'd run into a problem the moment we call ADDR_OBJ on it (which we'd do before any access to the bag content in valid GAP kernel code).

Also, the very same things could happen with GASMAN. Correct?

Re additional work in NewBag: Yes, that's what this patch does right now -- though it could be avoided using JL_GC_PUSH1 / JL_GC_POP (I didn't use them because I had some linker trouble with them and did not want to investigate). In some quick benchmarking on testinstall, this PR has an overall positiv effect, albeit a small one (something like 2-3% performance gain)

Also, please make sure to document all these concerns in the source code. Right now, there is no explanation at all about the purpose of the treap code.

@rbehrends
Copy link
Owner

rbehrends commented May 28, 2018

To provide an example:

static inline Bag fold(Bag bag) {
    Bag result = initialize();
    Int len = LEN_PLIST(bag);
    for (Int i = 1; i <= len; i++) {
        Bag elm = ELM_PLIST(bag, i);
        result = accumulate(result, elm);
    }
    return result;
}

An optimizing compiler can hoist the master pointer dereferencing that occurs as part of ELM_PLIST() out of the loop. As a result, the register/stack location containing the reference to the master pointer will not be live anymore at the beginning of the loop and can be reused by the compiler. If accumulate() now indirectly triggers a GC, the master pointer object may be garbage collected if is not referenced elsewhere anymore.

This is normally not a problem, because the address is still live in the caller. However, if the function is inlined and the bag is not used anymore (and thus also not live in the caller), the premature (from the point of a conservative GC) deallocation may happen anyway.

See this paper for some of the pitfalls surrounding conservative GC and optimizing compilers.

And yes, in principle the same thing could happen with Gasman, too.

@fingolfin
Copy link
Collaborator Author

I don't think your example is valid: the compiler can not hoist the master pointer dereferencing out of the loop, precisely because accumulate might invalidate it.

@rbehrends
Copy link
Owner

rbehrends commented May 28, 2018

It's not about the master pointer (i.e. the one-word object allocated in memory), but the reference in a register or on a stack to the master pointer. This is only kept locally and accessed (in a way not supported by the C standard) through conservative stack scanning; accumulate() cannot modify this reference, as it is kept only in a local variable that accumulate() does not have access to. I'll update the original comment to clarify this.

@fingolfin
Copy link
Collaborator Author

I am looking forward to that, as I still don't see how (a) such a compiler wouldn't wreck GASMAN completely, too, and (b) how the treap would then actually help in general (given that the references into the bag content also may be "hidden" and not be visible on either the stack or a register)

@ChrisJefferson
Copy link

So, I've thought about this a little in the past. I don't claim to be an expert, but I'll give you an answer :)

We don't have a problem with (a), because garbage collections can only happen as a result of explictly creating a new bag, and after the creation of a new bag code must deference all Master Pointers again, as a GC may have occurred. Therefore we can't "lose" acccess to Master Pointers. I believe @rbehrends is wrong here.

On the other hand, once your GC doesn't move, once you've derefrenced a Master Pointer you don't need to keep hold of it any more, and it's much easier to lose it -- I have written code like @rbehrends 's example above to prove to myself you can easily lose access to master pointers.

We could of course introduce some new function for holding onto references to a master pointer until the end of the function (basically, make them volatile), which I know the .NET compiler does (it introduces them itself during compiling).

@fingolfin
Copy link
Collaborator Author

fingolfin commented May 28, 2018

Of course if we started writing code which relies on a non-moving GC, or if GC suddenly happen in a separate thread, we'd have a problem. But then we'd have the same problem or worse with GASMAN. So, what Chris wrote agrees 100% with my thoughts.

Holding onto refs: that's precisely what JL_GC_PUSH/POP are for

@rbehrends
Copy link
Owner

Okay, I've given this some more thoughts:

  1. I think @ChrisJefferson is right in that the compiler will not assume that ADDR_OBJ(bag) and friends are loop invariants as it can't tell what happens during a garbage collection (not necessarily because the C standard prevents such optimizations, but because the necessary analysis is beyond the abilities of real compilers).
  2. However, I'm not quite willing to rely on this. Memory allocators have a habit of relying on undefined code (by necessity), and modern C compilers have a bad habit of treating undefined code as "cannot possibly happen". Combined with link-time optimization, I'm reluctant to get rid of this safeguard. We can control what happens in Gasman, not so much in third party GCs.
  3. We do know that we have the existing problem that ADDR_OBJ(bag)[i] = f() and ADDR_OBJ(bag)[f()] can blow up Gasman and that extends to other GCs. While we can be reasonably certain that the kernel is kept clean of these, it would be nice to have a safeguard against the eventuality of a package doing this and (by chance) not triggering Gasman, but blowing up with Julia.
  4. Similar, if we pass the result of ADDR_OBJ(bag) or friends to a function or use it inside a loop, we can run into related problems.
  5. If we want to (eventually) handle threads also, we do have the problem that a GC cannot only happen during NewBag() and ResizeBag(), but at any safepoint (unlike with the Boehm GC, GC in the Julia GC can only happen during explicit safepoints; however, these are simply memory accesses to protected memory that trigger a signal and look – intentionally – harmless to the compiler to not impede optimization).

So, I'm still reluctant to remove functionality that does not seem to incur any significant performance overhead (~90ms for a ca ~50s run of tst/testinstall.g); this can be reduced to ~20ms by introducing min and max pointer bounds for bigval allocations.

Additionally, the effect of adding JL_GC_PUSH1() and JL_GC_POP() to NewBag() does incur significant overhead: it goes from taking ~400ms to ~1.9s for the same run. This overhead is probably largely due to JL_GC_PUSH1() and JL_GC_POP() implicitly calling jl_get_ptls_states(), which is expensive on macOS and Windows. We can do better by temporarily sticking the bag address in a global root that is explicitly marked by GapRootScanner, which makes NewBag() only take a cumulative ~520ms, but is still more expensive.

In short, given the uncertain performance benefits (and possibly losses) compared to the functionality benefits and additional safety of having this feature around, I would be reluctant to remove it. Even if we were to disable it, I'd keep the code at a minimum in order to be able to eventually support threads.

@fingolfin fingolfin force-pushed the mh/alt-gc-no-treap branch from 89608f0 to 45612db Compare May 28, 2018 14:16
@fingolfin
Copy link
Collaborator Author

Note that this PR does not use JL_GC_PUSH1 and JL_GC_POP, and the overhead in NewBag in my measurements is negligible.

Anyway, as stated before, I am not necessarily interested in getting rid of treaps; but rather, I would like to see really good documentation for it, explaining why it's there, and why not, so that future developers (including us) will not be prevented from modifying this code due to lack of understanding.

I think this conversation shows there is a real need for that. And the HPC-GAP kernel shows how bad things can be when this kind of documentation is completely missing... So let's not repeat that mistake :-)

@rbehrends
Copy link
Owner

I can definitely add the requisite comments and will do so shortly. :)

@fingolfin
Copy link
Collaborator Author

I updated this PR to introduce a new #define DISABLE_BIGVAL_TRACKING, which can be used to disable use of treaps at compile time.

@rbehrends rbehrends force-pushed the alt-gc branch 2 times, most recently from 89545e7 to b2d0434 Compare July 9, 2018 12:39
@fingolfin fingolfin force-pushed the alt-gc branch 2 times, most recently from 5622c24 to b2312c4 Compare July 9, 2018 13:09
fingolfin and others added 27 commits January 31, 2019 07:42
It cannot return anymore, so we can delete code after calls to NargError.
Also add two more tests to increase coverage a bit
Add RequireInputStream, RequireOutputStream
Up to now, `CharacterTableIsoclinic` dealt only with character tables
of groups of the structure 2.G.2 or 4.G.2.
Now also the cases p.G.p, for odd primes p, are supported.
CompUnbComObjExpr was calling CompSetUseRNam but should not; this lead to the
value of random RNams being emitted to the generated C code.
This also fixes a bug where it called DEG_PERM2 on a T_PERM4. I do
not know if this caused user visible problems.
While on a purely technical level, there is no difference between a C
function implementing a GAP function vs. a filter vs. a property vs. an
attribute, it can nevertheless sometimes be very helpful to see at a glance
what a given function is used for.
FindPRec was used to find if an rnam is in a record, and return
it's location if present.

FindPRec had a couple of issues -- it was difficult to use,
and also had a bug (it claimed to always set 'pos', but did not
in some cases).

FindPRec is left in the code, as it is used in the semigroups package.
The replacement always sets 'pos', but sets it to 0 in the case the
rnam is not present.
* determine the number of handlers using ARRAY_SIZE, instead of using
  a magic constant
* use `IS_OPERATION` to recognize operations, don't hardcode checks
  for the function bag size
In particular, do not (ab)use NLOC_FUNC and FEXS_FUNC anymore; instead
use a slightly larger function bag.
These warnings caused tests of several packages to fail (e.g.
resclasses, rcwa, anupq, fwtree). We should first get rid of all uses of
these functions in packages before lowering the info level again.

Also fix some of these info messages which were missing a space.
This describes the role of this list much better.
This distinguishes errors for ASS_LIST and ASSS_LIST, and is consistent
with what we do for ELM_LIST vs. ELMS_LIST.
@fingolfin fingolfin closed this Feb 8, 2019
@fingolfin fingolfin deleted the mh/alt-gc-no-treap branch October 28, 2021 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants