Correct System.nanotime() value after restore #53
Conversation
👋 Welcome back rvansa! A progress list of the required criteria for merging this PR into
@rvansa This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 12 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Webrevs
src/hotspot/share/runtime/os.cpp
if (diff_millis < 0) {
  diff_millis = 0;
}
javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L;
Can you please explain why javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() isn't sufficient? What am I missing here?
@ashu-mehra Had we ignored the diff_millis part, the difference between nanoTime() readings taken before checkpoint and after restore would indicate that no time elapsed between checkpoint and restore. With this in place the difference does not have the expected accuracy, but at least it's monotonic; the wall-clock time difference gives us the best possible estimate.
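To make the arithmetic concrete, here is a small self-contained C++ sketch of the adjustment. It is not the actual HotSpot code: the clock helpers and the CheckpointState holder are made up for illustration, the names checkpoint_nanos, diff_millis and javaTimeNanos_offset mirror the snippet above, and it assumes the offset is simply added to the raw monotonic reading.

#include <chrono>
#include <cstdint>

// Stand-ins for the JVM's clock sources (hypothetical helpers, not HotSpot APIs).
static int64_t monotonic_nanos() {   // roughly os::javaTimeNanos(): time since boot
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now().time_since_epoch()).count();
}
static int64_t wall_clock_millis() { // roughly os::javaTimeMillis(): wall clock
  return std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::system_clock::now().time_since_epoch()).count();
}

static int64_t javaTimeNanos_offset = 0;

// Values recorded at checkpoint time (hypothetical holder).
struct CheckpointState { int64_t nanos; int64_t millis; };

// At restore: without the diff_millis term the adjusted clock would resume exactly at
// checkpoint_nanos, i.e. the checkpoint/restore pause would appear to take zero time.
// Adding the wall-clock difference keeps the clock monotonic and approximates the real
// elapsed time with millisecond accuracy.
void on_restore(const CheckpointState& cp) {
  int64_t diff_millis = wall_clock_millis() - cp.millis;
  if (diff_millis < 0) {             // the wall clock may have been set backwards meanwhile
    diff_millis = 0;
  }
  javaTimeNanos_offset = cp.nanos - monotonic_nanos() + diff_millis * 1000000L;
}

// What an adjusted System.nanoTime() would then return after restore.
static int64_t adjusted_nanos() { return monotonic_nanos() + javaTimeNanos_offset; }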
I understand this change is trying to adjust the return value of any calls made to System.nanoTime() after restore to take into account the elapsed time between checkpoint and restore. [0] https://man7.org/linux/man-pages/man3/clock_gettime.3.html
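For readers following the clock_gettime reference: on Linux, System.nanoTime() is typically backed by CLOCK_MONOTONIC, whose epoch is unspecified (roughly machine boot) and therefore not comparable across machines, while the wall clock comes from CLOCK_REALTIME. A minimal Linux-only illustration, assuming nothing about HotSpot internals:

#include <cstdio>
#include <ctime>

int main() {
  timespec mono{}, real{};
  clock_gettime(CLOCK_MONOTONIC, &mono); // counts from an unspecified point, roughly boot
  clock_gettime(CLOCK_REALTIME, &real);  // counts from the Unix epoch (wall clock)
  std::printf("CLOCK_MONOTONIC: %lld s since the boot-ish origin\n", (long long)mono.tv_sec);
  std::printf("CLOCK_REALTIME:  %lld s since 1970-01-01\n", (long long)real.tv_sec);
  // Two machines booted at different times report wildly different CLOCK_MONOTONIC values,
  // which is why raw nanoTime() differences across a migration are not meaningful.
  return 0;
}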
@ashu-mehra The main point of this change is not whether the suspended time should be observable or not; I am rather worried about moving the process to another system and getting totally nonsensical results from nanoTime() diffs, and broken code as a consequence. I understand that observing the suspended interval can be a subject for further discussion, though I lean towards making that interval visible, as implemented here. Since this fixes some use cases and does not break anything that wasn't broken already (on a single system the paused time would be observed anyway while the system kept running, unless the whole machine was suspended), I suggest merging this as-is without considering the topic resolved forever. The fact that some timers use this as the time source rather than wall clock time is an implementation detail. Applications performing checkpoint and restore will require some tweaks to behave correctly, and I intend to work on ways to deal with timers.
@rvansa I am assuming you are seeing time going backwards with System.nanoTime() calls after restore. Is that correct? Can you please provide more details about your observation of nonsensical results? Does the system where the absurd behavior is observed support time namespaces? Which version of CRIU did you use?
@ashu-mehra To be honest, I didn't know this was handled in CRIU - I thought that the offsets could be set only once in the newly created namespace. Does CRIU create another namespace for the process? I have not observed the problem in practice, rather anticipated it, as a while ago I ran into a similar problem with GraalVM inlining nanoTime() into a static final variable (it caused 100% CPU usage on some machines and normal behaviour on others). I shall rerun the enclosed test without the fix applied, but I think it was failing.
If I am reading it correctly, CRIU does seem to create a new time namespace [0]. [0] https://github.com/checkpoint-restore/criu/blob/7977416ca821b4895fd5dbc498eb95f7e77f2ede/criu/timens.c#L79
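For context on what creating a time namespace buys the restorer, here is a rough, untested sketch of the mechanism as described in time_namespaces(7) - not CRIU code: a process unshares into a new time namespace, writes per-clock offsets (possible only before the first process enters the namespace, and only with sufficient privileges), and any child forked afterwards observes the shifted clocks.

#include <fcntl.h>   // open
#include <sched.h>   // unshare, CLONE_* flags
#include <unistd.h>  // fork, write
#include <cstdio>
#include <cstring>
#include <ctime>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 // older headers may not define it (Linux >= 5.6)
#endif

int main() {
  // Children created after this call will live in a fresh time namespace.
  if (unshare(CLONE_NEWTIME) != 0) {
    std::perror("unshare(CLONE_NEWTIME)");
    return 1;
  }
  // Shift the children's CLOCK_MONOTONIC forward by 1000 s; format is "<clock> <secs> <nsecs>".
  const char* offsets = "monotonic 1000 0\n";
  int fd = open("/proc/self/timens_offsets", O_WRONLY);
  if (fd < 0 || write(fd, offsets, std::strlen(offsets)) < 0) {
    std::perror("timens_offsets");
    return 1;
  }
  if (fork() == 0) { // the child is a member of the new namespace
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    std::printf("child CLOCK_MONOTONIC: %lld s (shifted by the offset above)\n",
                (long long)ts.tv_sec);
  }
  return 0;
}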
Thanks for those pointers. I've checked that the NanoTimeTest fails without the 'fix' in place; the dump creates
and I can't find a log entry that would report this:
Any clues why this is not applied?
What is the CRIU version you are on?
I am using this one: https://github.com/CRaC/criu/releases/tag/release-1.3
CRaC-criu does not use the restore timens [1], since at one point a bug in the kernel or CRIU caused timedwait to return immediately every time it was called after restore. I don't remember the bug exactly (it has already been fixed), but I believe it was discussed on this mailing list. In general, we should not depend on very obscure Linux abilities, as this reduces the chances we'd be able to run on something other than Linux. Having the code in the JVM provides better control for implementing the Java spec.
The CRaC/criu@1cb2f4a commit is dated July 14, 2020, but the crac-dev archives have their earliest messages from Sept 2021. Is there some other mailing list this was discussed on? I am interested in understanding the problem that prompted not using timens in CRIU.
I don't think timens can be put in the category of an obscure Linux ability. It has even made its way into the container runtime spec: opencontainers/runtime-spec#1151.
@jankratochvil After trying to run C/R with fewer privileges in containers I am starting to see the value of moving away from CRIU; it does more than the JVM usually needs, but that requires elevated privileges. Leaving it all at once would be a tremendous endeavour, though, so slowly dropping functionality, even by adding some switches into the CRaC fork to disable some parts, makes sense to me.
@jankratochvil I am going to need some extra info on that hang; I have added some tests involving negative boot time (although boot time should not really change anything) and in the end rebooted my machine to test this out as you described, but haven't had any issues. I also need to check what's wrong with the test in CI, since I can't reproduce the failure locally.
Actually, thanks to rebooting I was able to run into the issue CI encountered - using a negative monotonic offset could end up setting the time to a negative value. I've fixed this by shifting the monotonic offset for the checkpoint rather than for the restore in this case.
Thanks for the comments on boot ID @jankratochvil, incorporated.
@jankratochvil The problem with the 'hanging' restore is actually not specific to Fedora at all; from the description I thought that it was unresponsive in the native part, but
While I consider this a serious issue (and I'll be thinking about a solution), I think that it's independent of this PR - this handles the value we're reading directly through
@jankratochvil I have a fix for the sleepers in https://github.com/rvansa/crac/tree/timed_wait, and will publish it as a PR once this gets integrated.
@AntonKozlov Anything more needed for this PR?
Hi @AntonKozlov, was this just off your radar, or is there anything that still needs consideration?
Sorry, LGTM!
/integrate
Going to push as commit a8e69de.
Your commit was automatically rebased without conflicts.
There are various places, both inside the JDK and in libraries, that rely on the monotonicity of System.nanoTime(). When the process is restored on a different machine the value will likely differ, as the implementation provides the time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanoTime() to a reasonably correct value.
Progress
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/crac.git pull/53/head:pull/53
$ git checkout pull/53
Update a local copy of the PR:
$ git checkout pull/53
$ git pull https://git.openjdk.org/crac.git pull/53/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 53
View PR using the GUI difftool:
$ git pr show -t 53
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/crac/pull/53.diff
Webrev
Link to Webrev Comment