-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libgap API: Problems with GASMAN and StackBottom #3089
Comments
On Julia they run GAP with Julia's GC replacing GAP's GC. |
While it is true that we are using the Julia GC for GAPJulia, this Beyond that: We are aware of this problem, and discussed it before. And we arrived essentially at the two possible solutions you also describe; and I think both should be implemented.:
One might now suggest to insert such enter/leave calls into each GAP API, but that would add a performance burden on clients which can happily use solution 1 (or a variant thereof, as GAPJulia does, in effect). Also, if the calling code has references to GAP objects on the stack, these would now be invisible (unless one adds additional dirty hacks, like: "if the previous |
@dimpase That makes sense, though then doesn't that mean having to have a GAP compiled specifically for Julia? |
@fingolfin Thank you for at least confirming that my analysis seems right. I agree with you about offering both solutions 1 and 2. I've already been working on a patch with which I've been experimenting with something like this. I added a Also agree about I do believe And yes, I'm also trying to wedge better error handling into this, but right now I'm mostly concerned about the stack/GC issue. |
I'm not sure if it's better than any of the other approaches, but one could deduce a valid value for stackbottom but simply scanning until you get a sigsegv. That might result in scanning too large an area, but the worst that does is a small performance hit in GC and a small increase in objects incorrectly kept alive. Regarding |
that's correct. Also, AFAIK there is currently no way to use Julia's GC outside of Julia, which is monolitic in design and does not hav GC as a separately useable module. And another possible scenario would be GAP built with Boehm GC rather than GASMAN. |
@stevelinton That's a technique similar to one of the things we do in GAPJulia -- but catching |
But to be clear: please do not worry about Boehm or Julia GC here. Perhaps on the very long run, this might become relevant, but let's focus on providing a solution that works with GASMAN first |
…ck local GAP objects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see gap-system#3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called _MarkStackBottomBags which just updates the StackBottom variable. Then the macro called MarkStackBottomBags (same name without underscore) can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The state variable STATE(EnterStackCount) is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). Some functions are provided for manipulating the counter from the API without directly exposing the GAP state, but I'm not sure if this is necessary or desirable, especially since it means EnterStackCount isn't updated atomically. My hope was to avoid exposing too many GAP internals, but it may be necessary in order to implement these as macros. For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp.
See #3096 for my (thus far quite successful, at least for practical purposes) prototype of a solution. |
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see gap-system#3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called MarkStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see gap-system#3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called SetStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see #3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called SetStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see gap-system#3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called SetStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see #3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called SetStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see gap-system#3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called SetStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
I have brought this issue up previously on the Sage Trac, but this is a generic problem with embedding libgap in larger applications, and it is a multi-faceted problem which I thought should be clearly summarized somewhere.
The core of the problem is that the GAP kernel, as designed, assumes that all code which uses GAP objects (tracked through the GASMAN GC), is going to run in a context with either a call to
InitializeGap
as a parent stack frame, or at least have a common parent with the call toInitializeGap
. This is illustrated in the fact thatInitializeGap
uses a pointer to some main function'sargc
variable to set theStackBottom
variable in the GC (this itself is a problem which I'll list later).This assumption for setting
StackBottom
is safe for GAP itself, but not for embedding in another application with libgap. Consider the following scenario, whereGAP_Initialize
is called by some program embedding GAP, where GAP initialization is delayed until necessary, and possibly not called until deep into some other function call (forgive my ASCII art):Now imagine that later in the application there is some other function called which uses GAP, but less deeply nested than the one that happened to call
GAP_Initialize
; it also happens to result in some nested GAP function calls:Now we have (on x86 architectures)
StackTop > StackBottom
.GenStackFuncBags
does handle the case wheretop > bottom
, but that "seems" to be more in order to handle different architectures where the stack grows in different directions, than some deliberate attempt to handle this case. Because now there are stack frames possibly containing young bags that haven't been swept yet which are above the current "top" of the stack.So in the illustration above,
GenStackFuncBags
will sweep from#43
, up through a bunch of garbage, dead stack frames, until#23
, and will stop there without continuing upward to#22
and so on which may still contain some unmarked bags in local variables which will then be marked dead despite being still very much in use.This seems to be a problem to me, but is there something obvious I'm missing in this analysis.
@sebasguts , is this problem handled in any way by GAPJulia?
As far as I can tell, if the above analysis is a real problem, then that places the onus on any application which embeds GAP to ensure that GAP is initialized very early on in the application so that anything else that uses GAP and creates GAP objects happens in a "deeper" stack frame.
For a custom application this is less of a problem as one has more control over when
GAP_Initialize
is actually called (though it is still a problem if one wants to avoid the overhead ofGAP_Initialize
by calling it lazily only when needed).For embedding in library for another language like Python, however, it's a bigger problem. Because in writing a Python module that wraps GAP we cannot easily guarantee when that module will be imported relative to execution of the rest of the application (much less relative to Python's
main()
function).There are multiple possible solutions to this problem and I'm not sure which is the best one. If nothing else it ought to be possible, when calling
GAP_Initialize
, to provide a custom value for what address should be set toStackBottom
, rather than just assuming it's whereGAP_Initialize
is called from. Additionally I don't think it should double as a pointer to theargc
value passed toInitializeGap
, as that then places the undue burden on the client code of ensuring that whatever we pass asStackBottom
points to theargc
argument we want to pass to GAP.Even then, however, it's not always obvious from the perspective of client code where an appropriate address is to set
StackBottom
to--this is a rather low-level concern for most users of the code. ISTM the best option is to provide someGAP_Enter
andGAP_Exit
gates that should surround any client code that calls libgap functions, and that possibly has GAP objects on the stack.GAP_Enter
would mark the location ofStackBottom
from where it is called (it would likely be a macro), and would include protection for recursion (hence the need for aGAP_Exit
as well).I'm not sure that there's really a better way, unless there were some way this could be managed automatically by the API-level functions, but my experience with GAP's GC is not at a level where I can see a better way.
The text was updated successfully, but these errors were encountered: