Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct System.nanotime() value after restore #53

Closed
wants to merge 23 commits into from

Conversation

rvansa
Copy link
Collaborator

@rvansa rvansa commented Mar 23, 2023

There are various places both inside JDK and in libraries that rely on monotonicity of System.nanotime(). When the process is restored on a different machine the value will likely differ as the implementation provides time since machine boot. This PR records wall clock time before checkpoint and after restore and tries to adjust the value provided by nanotime() to reasonably correct value.


Progress

  • Change must not contain extraneous whitespace

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/crac.git pull/53/head:pull/53
$ git checkout pull/53

Update a local copy of the PR:
$ git checkout pull/53
$ git pull https://git.openjdk.org/crac.git pull/53/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 53

View PR using the GUI difftool:
$ git pr show -t 53

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/crac/pull/53.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 23, 2023

👋 Welcome back rvansa! A progress list of the required criteria for merging this PR into crac will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 23, 2023

@rvansa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Correct System.nanotime() value after restore

Reviewed-by: akozlov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 12 new commits pushed to the crac branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the crac branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 23, 2023
@mlbridge
Copy link

mlbridge bot commented Mar 23, 2023

Webrevs

if (diff_millis < 0) {
diff_millis = 0;
}
javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() + diff_millis * 1000000L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why isn't javaTimeNanos_offset = checkpoint_nanos - javaTimeNanos() sufficient? What am I missing here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashu-mehra Had we ignored the diff_millis part, the diff of nanotime before and after checkpoint would indicate that no time elapsed between checkpoint and restore. With this in place the difference does not have the expected accuracy but at least it's monotonic; wall clock time difference gives us the best possible estimate.

@ashu-mehra
Copy link
Contributor

I understand this change is trying to adjust the return value of any calls made to System.nanoTime() after restore to take into account the elapsed time between checkpoint and restore.
In principle this idea is very similar to CLOCK_BOOTTIME [0] which takes into account the time system has spent in suspend state.
I came across an issue [1] in golang which was suggested to replace CLOCK_MONOTONIC with CLOCK_BOOTTIME but it was considered ill-advised and was closed. There was even a linux kernel patch [2] to make CLOCK_MONOTONIC behave as CLOCK_BOOTTIME which was reverted [3] immediately because it broke many of the existing userland softwares.
Considering this precedent, I think we should also consider the impact of this change on existing frameworks and libraries, that is, can this change break the existing code patterns that use System.nanoTime()?

[0] https://man7.org/linux/man-pages/man3/clock_gettime.3.html
[1] golang/go#24595
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6ed449afdb38f89a7b38ec50e367559e1b8f71f
[3] https://www.spinics.net/lists/linux-tip-commits/msg43709.html

@rvansa
Copy link
Collaborator Author

rvansa commented Mar 28, 2023

@ashu-mehra The main point of this change is not about whether the time being suspended should be observed or not; I am rather worried about moving the process to another system and getting totally nonsense results from nanotime diffs, and broken code.

I understand that observing the suspended can be a subject to further discussion, though I incline towards the visibility of such interval, as implemented here. Since this fixes some use cases and does not change what wouldn't be broken (on a single system the paused time with system running would be observed anyway unless the whole machine was suspended) I suggest to merge this as-is without considering the topic resolved forever.

The fact that some timers use this as the time source rather than wall clock time is an implementation detail. Applications performing checkpoint and restore will require some tweaks to perform correctly and I intend to work on ways to deal with timers.

@ashu-mehra
Copy link
Contributor

I am rather worried about moving the process to another system and getting totally nonsense results from nanotime diffs, and broken code.

@rvansa I am assuming you are seeing time going backwards with System.nanoTime() calls after restore. Is that correct?
Its interesting if you are seeing such absurd results after restore, because IIUC criu is using time namespace to update the clock offsets in /proc/<pid>/timens_offsets, so I wouldn't expect System.nanoTime() to give absurd results on restore.
I did some tests with C program that call clock_gettime(CLOCK_MONOTONIC) before and after restore. In between I restarted my system. I didn't see time going backwards; all I could observe is that the elapsed time between checkpoint and restore was not taken into account.

Can you please provide more details about your observation on nonsense results. Does the system where the absurd behavior is observed support time namepace ? Which version of criu did you use?

@rvansa
Copy link
Collaborator Author

rvansa commented Mar 28, 2023

@ashu-mehra To be honest I didn't know about this being handled in CRIU - I thought that the offsets can be set only once in the newly created namespace. Does CRIU create another ns for the process?

I have not observed the problem in practice, rather anticipated it as some while ago I ran into similar problem with GraalVM inlining nanotime in a static final variable (caused 100% CPU usage on some machines and normal behaviour on others). I shall rerun the enclosed test without the fix applied but I think it was failing.

@ashu-mehra
Copy link
Contributor

Does CRIU create another ns for the process?

If I am reading it correctly the criu does seem to create new time ns [0].
The complete patch can be seen here [1].

[0] https://github.com/checkpoint-restore/criu/blob/7977416ca821b4895fd5dbc498eb95f7e77f2ede/criu/timens.c#L79
[1] checkpoint-restore/criu@4127ef4

@rvansa
Copy link
Collaborator Author

rvansa commented Mar 29, 2023

Thanks for those pointers. I've checked that the NanoTimeTest fails without the 'fix' in place; The dump creates timens-0.img (35 bytes) but it looks like this is not used later on; when I run restore with -v4 I see these log messages:

(00.003124) Reading image tree
(00.003138) Add mnt ns 6 pid 8
(00.003139) Add net ns 2 pid 8
(00.003140) Add pid ns 1 pid 8
(00.003141) pstree pid_max=8
(00.003143) Will restore in 0 namespaces
(00.003144) NS mask to use 0

and I can't find log entry that would report this:

pr_debug("timens: monotonic %ld %ld\n", ts.tv_sec, ts.tv_nsec);

Any clues why this is not applied?

@ashu-mehra
Copy link
Contributor

what is the criu version you are on?

@rvansa
Copy link
Collaborator Author

rvansa commented Mar 30, 2023

I am using this one: https://github.com/CRaC/criu/releases/tag/release-1.3

Version: 3.17.1-crac
GitID: 8926431

@AntonKozlov
Copy link
Member

Crac-criu does not use restore timens [1] since once a bug in kernel or criu caused timedwait to return immediatelly everytime that is called after restore. I don't remember the bug exactly (already fixed), but I believe it was discussed on this maillist.

In general, we should not to depend on very obscure linux abillities, as this reduce chances we'd be able to run on something rather than linux. Having the code in the JVM provides better control for the implementation the java spec.

[1] CRaC/criu@1cb2f4a

src/hotspot/share/runtime/os.cpp Outdated Show resolved Hide resolved
src/hotspot/share/runtime/os.cpp Outdated Show resolved Hide resolved
@ashu-mehra
Copy link
Contributor

@AntonKozlov

Crac-criu does not use restore timens [1] since once a bug in kernel or criu caused timedwait to return immediatelly everytime that is called after restore. I don't remember the bug exactly (already fixed), but I believe it was discussed on this maillist

CRaC/criu@1cb2f4a commit is dated July 14, 2020, but the crac-dev archives has earliest mailing list from Sept 2021. Is there some other mailing list this was discussed on? I am interested in understanding the problem that prompted not to use timens in criu.
Since you mention it was a bug in kernel or criu and it has been almost 3 years since your commit, may be it is worth enabling the criu changes again to see if the timedwait problem still exists, unless you have already done that.

In general, we should not to depend on very obscure linux abillities, as this reduce chances we'd be able to run on something rather than linux.

I don't think timens can be put in the category of obscure linux ability. It has even made its way into container runtime spec: opencontainers/runtime-spec#1151.

@rvansa
Copy link
Collaborator Author

rvansa commented May 12, 2023

@jankratochvil After trying to run C/R with fewer privileges in containers I am starting to see the value of moving away from CRIU; it does more than JVM usually needs, but this requires elevated privileges. Leaving it all at once would be a tremendous endeavour, though, so slowly dropping functionality, even through adding some switches into CRaC fork to disable some parts makes sense to me.

@rvansa
Copy link
Collaborator Author

rvansa commented May 12, 2023

@jankratochvil I am going to need some extra info on that hang; I have added some tests involving negative boottime (although bootime should not really change anything) and in the end rebooted my machine to test this out as you desribed but haven't had any issues.

I also need to check what's wrong with the test in CI since I can't reproduce the failure locally.

@rvansa
Copy link
Collaborator Author

rvansa commented May 12, 2023

Actually thanks to rebooting I was able to run into the issue CI encountered - using negative monotonic offset could end up setting the time to negative value. I've fixed this by shifting the monotonic offset for the checkpoint rather than for restore in this case.

@rvansa
Copy link
Collaborator Author

rvansa commented May 16, 2023

Thanks for the comments on boot ID @jankratochvil , incorporated.

@rvansa
Copy link
Collaborator Author

rvansa commented May 18, 2023

@jankratochvil The problem with 'hanging' restore is actually not specific to Fedora at all; from the description I thought that it is unresponsive in native part, but jstack tells that it was your testcase sleeping in Thread.sleep(). Turns out os::PlatformEvent::park creates an absolute timestamp on monotonic time and calls pthread_cond_timedwait - when the time runs backward during restore this does not get reevaluated. Based on the codepath I think that this would not be limited to Thread.sleep() but many other 'waiting' synchronization primitives.

While I consider this a serious issue (and I'll be thinking about a solution), I think that it's independent from this PR - this handles the value we're reading directly through System.nanoTime(). I'll also test whether it happens when timens is used by CRIU.

@rvansa
Copy link
Collaborator Author

rvansa commented May 19, 2023

@jankratochvil I have a fix for the sleepers in https://github.com/rvansa/crac/tree/timed_wait , will publish it as PR once this gets integrated.
Also, your trouble with libbsd should be resolved once CRaC/container-images#4 gets integrated.

@rvansa
Copy link
Collaborator Author

rvansa commented Jun 5, 2023

@AntonKozlov Anything more needed for this PR?

@rvansa
Copy link
Collaborator Author

rvansa commented Jun 16, 2023

Hi @AntonKozlov was this only off your radar or is there anything that still needs consideration?

Copy link
Member

@AntonKozlov AntonKozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, LGTM!

@rvansa
Copy link
Collaborator Author

rvansa commented Jun 19, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Jun 19, 2023

Going to push as commit a8e69de.
Since your change was applied there have been 12 commits pushed to the crac branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 19, 2023
@openjdk openjdk bot closed this Jun 19, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 19, 2023
@openjdk
Copy link

openjdk bot commented Jun 19, 2023

@rvansa Pushed as commit a8e69de.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants