-
Notifications
You must be signed in to change notification settings - Fork 863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix missing increment of the instance count on every call to startup() #3100
base: master
Are you sure you want to change the base?
Conversation
As for me it looks ok, but Max will be back in the second half of January. |
Still has bugs with existing code, as per the issue discussion |
I think this modified solution is reasonable to handle the remaining issues. |
Wait, there's one thing I don't understand. Why do you need to prevent the call to cleanup()? The call to cleanup() should be able to be done multiple times, even if at least one of the series has done the real cleanup. The only rule was that there should be done as many (meaning, the minimum) cleanup calls as there was startup calls, but extra cleanup() calls, which actually do nothing, should be allowed. Have you found a problem with extra cleanups? |
This doesn't disallow extra cleanup in general, that allowance remains in place. The change here though is that it avoids an unpaired cleanup from internal code. Doing a cleanup internally when a startup wasn't done internally can hide bugs in the API users code. And since the cleanup done in |
The cleanup call is prevented if this wasn't started internally. I don't understand the reason of this blocking. Starting it internally should simply call exactly the same startup(), with exception that it should check if it isn't initialized already, to ensure that an internal startup is really done only once. So, this is called when creating a socket and it should ensure that this can be called only once, and only once increase the number of instances. Just the call from |
It depends what version of the code you are looking at. What you are stating was correct for the old code, before the original bug was introduced. However multiple changes occurred with that bug commit. The current master fails to increment on any but the first startup call, which results in incorrect cleanups due to the counting being incorrect. If we fix it so every startup() increments the count, then the counting increments too much due to internal startups, which occur every time a socket is created now. This is new behavior because the code used to avoid the internal startup() call with a This PR handles both of those new issues. There are certainly multiple ways to solve this. My commit is one way. Feel free to solve it another if you wish. |
All I was asking was the reasoning of introducing the new field for marking the implicit initialization and performing the cleanup only under this condition. And how well it handles a situation, when after creating a socket without having called Things that I think should be taken into account are:
This version you provided, for example, prevents the cleanup action from being called in a situation when a user called |
I added the new field because making a choice on implicit startup/cleanups based on the state of the GC thread bool seems error prone. If the logic for the GC thread changes in the future, it may introduce bugs. The GC management is a sub-portion of the startup/cleanup procedure. Agreed the new behavior does avoid a single It would seem the improvement over both the old and new behavior would to ignore the counter and force a cleanup on program exit if it has still been left in the started state. I'll add those changes. |
The old code has race condition in it too. The check in |
Ok see how you feel about this one instead. I'm not a huge fan of the boolean traps added to startup()/cleanup(). For cleanup() I could make it an internal flag in the class that the destructor sets, if you prefer. |
srtcore/api.cpp
Outdated
if (m_iInstanceCount == 0) | ||
return 0; | ||
|
||
if (--m_iInstanceCount > 0) | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be bound.
if (m_iInstanceCount == 0) | |
return 0; | |
if (--m_iInstanceCount > 0) | |
return 0; | |
if (m_iInstanceCount == 0 || --m_iInstanceCount > 0) | |
return 0; |
srtcore/api.cpp
Outdated
@@ -233,15 +228,24 @@ string srt::CUDTUnited::CONID(SRTSOCKET sock) | |||
return os.str(); | |||
} | |||
|
|||
int srt::CUDTUnited::startup() | |||
void srt::CUDTUnited::implicitStartup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you have a parameter in startup()
, this function is no longer necessary. You can call it as startup(true)
in socket and group creation functions.
Ok, now this looks definitely better. I would keenly have some unit tests for this, though; I'll try to add something myself. For this these functions would have to return some status value, which simply states whether something was executed or not. I'll try to figure out something myself; we'd need also a function that can report the current number of instances, but should be able to use the CUDTClass directly, not the API. |
Ok, I made a new #3104 as draft so that you can take the changes there. Things that I reported in review comments are applied, and the test is added. I marked places that should be changed for 1.6.0, but for the 1.5.5 they must stay the same to maintain the rules of the API, so this is only marked, and changes would be then provided in a separate PR for 1.6.0. And the test is also there. Please take a look and you can also get these changes to apply here - just add ".patch" to the URI. |
Ok thanks, I've re-pushed your patch. Appreciate the feedback and the patch |
Ok, that comment I reported I added it - forgot about that myself ;). |
Ok, something's wrong, the tests on Travis fail. On my draft PR they fail either. Need to see into this. |
Alright, I found the problem. there are two additional changes in the tests. One of them just performs one testing step in the text that HAPPENS NEXT to check if the PREVIOUS TEST did the job right ;). Unfortunately there's no way to check if the global destructor has run correctly because it happens at the end of the testing application itself. Can you please take the updates from #3104? |
Ok small update: only RIGHT NOW I was able to make all tests pass also in Travis. Some tests rely on doing some things in timely manner and irregularly working machines can fail there, including in cases when an error is expected. |
Change how the implicit startup is tracked. The extra change is needed since the startup() calls in srt::CUDT::socket() and srt::CUDT::createGroup would endlessly increment the startup counter as sockets were created otherwise. Force a clean() when the singleton is getting destructed. Previously it just did a single decrement of the instance count, which could result in attempting to exit without actually cleaning up. fixes Haivision#3098
Ok re-pushed, thanks for your edits! |
if (m_bGCStatus) | ||
|
||
if (implicit && m_iInstanceCount > 0) | ||
// [reinstate in 1.6.0] return m_iInstanceCount; |
Check notice
Code scanning / CodeQL
Commented-out code Note
return 1; | ||
|
||
if (m_iInstanceCount++ > 0) | ||
// [reinstate in 1.6.0] return m_iInstanceCount; |
Check notice
Code scanning / CodeQL
Commented-out code Note
// [reinstate in 1.6.0] return m_iInstanceCount; | ||
return 1; | ||
|
||
if (m_bGCStatus) |
Check notice
Code scanning / CodeQL
Commented-out code Note
return 0; | ||
if (!force) | ||
{ | ||
if (m_iInstanceCount == 0 || --m_iInstanceCount > 0) |
Check notice
Code scanning / CodeQL
Commented-out code Note
Change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created
otherwise.
Force a clean() when the singleton is getting destructed.
Previously it just did a single decrement of the instance count,
which could result in attempting to exit without actually cleaning up.
fixes #3098