boot_time() is wrong #658

MauroMombelli · 2015-07-18T14:55:54Z

hi,

on a system without HW clock (raspberry) the boot_time() give 45 years (since epoch), while "uptime" command from shell give correct results.

giampaolo · 2015-07-18T15:46:39Z

What OS is that? Linux?

MauroMombelli · 2015-07-18T20:18:00Z

sorry, yes, arch linux arm, updated to latest version

giampaolo · 2015-07-31T12:14:04Z

Please paste the output of cat /proc/stat.

MauroMombelli · 2015-07-31T13:10:28Z

cat /proc/stat
cpu 5896703 6291 435376 40166914 949451 381 55053 0 0 0
cpu0 5896703 6291 435376 40166914 949451 381 55053 0 0 0
intr 169712969 0 0 0 33510989 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 394073 158525 0 0 0 0 0 0 133113018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 2536346 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 92586406
btime 1437869567
processes 7898
procs_running 1
procs_blocked 0
softirq 29134775 5943048 9746842 95726 2945587 142 0 7961860 0 22627 2418943

giampaolo · 2015-08-01T12:17:00Z

Uhm... that btime 1437869567 line looks correct:

>>> import datetime
>>> datetime.datetime.fromtimestamp(1437869567.0).strftime("%Y-%m-%d %H:%M:%S")
'2015-07-26 02:12:47'

Maybe your system clock was messed up?

MauroMombelli · 2015-08-02T09:10:29Z

the system clock is messed up because the raspi does not have the HW clock, so it start as timestamp=0 and then NTP set it to correct value.
Now i can't replicate the problem, but i played a bit with NTP and systemd settings; and as i don't have physical access to the machine i won't play with it again :/

giampaolo · 2015-08-07T10:07:12Z

Since this is no longer reproducible I'm gonna close it for now. Feel free to reopen in case you bump into this issue again.

Gui13 · 2020-02-12T19:34:05Z

I have a pretty easy way to reproduce it here with telegraf, which uses the gopsutil library, which is pretty much a word for word rewrite of your library:

influxdata/telegraf#7018

I have added a bug at gopsutil but your bug is pretty much the same: launch the python script before NTP does its magic, then wait until NTP corrects the date, and then enjoy your 40+ years uptime :D

giampaolo · 2020-02-12T20:08:39Z

Mmm... I guess that means that /proc/stat doesn't get updated for some reason.
You should check whether also /proc/uptime is updated or not. If it's updated we can use that one.

Gui13 · 2020-02-12T20:55:10Z

This is what I get for a problematic Rpi:

pi@eanode5:~ $ cat /proc/uptime
11779.00 45685.18
pi@eanode5:~ $ cat /proc/stat
cpu  9382 37 8924 4689169 923 0 143 0 0 0
cpu0 2066 0 2431 1171940 258 0 83 0 0 0
cpu1 2058 2 2132 1173105 141 0 20 0 0 0
cpu2 2263 3 2249 1172836 276 0 18 0 0 0
cpu3 2995 32 2112 1171286 247 0 22 0 0 0
intr 952432 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6690 7917 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 729 0 3763 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269 7179 0 0 0 0 90339 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 526853 0 0 0
ctxt 1400316
btime 1581528798
processes 15829
procs_running 1
procs_blocked 0
softirq 811644 0 263970 2039 19548 0 0 86407 266106 0 173574
pi@eanode5:~ $ date +%s
1581540580

If you do the math, it actually makes sense, all the numbers are correct. btime has been adjusted by the NTP.

The issue is that you retrieve btime when your module is first instanciated and keep it for the rest of the duration of the process, so if the btime changes, you will do computations on an older value.

giampaolo · 2020-02-12T21:38:00Z

The issue is that you retrieve btime when your module is first instanciated and keep it for the rest of the duration of the process, so if the btime changes, you will do computations on an older value.

That is true, Process.create_time() on Linux relies on a cached version of whatever psutil.boot_time() returned the last time it was called.
Maybe this should change.
The original issue was about psutil.boot_time() though.

giampaolo · 2020-02-12T23:11:33Z

There is another flip of this coin which is probably worth mentioning.
Process.create_time() is different than other methods because it's value is set on __init__ in order to identify the process instance uniquely over time. This way we avoid killing the wrong process if its PID has been reused.
If the clock changes and create_time() returns something different, that Process instance will probably raise NoSuchProcess("this PID has been reused by another process") on kill(), terminate() and other "write" methods (set ionice() or whatever). Probably it's an edge case, and it's either that or live with process ctimes 40 years into the future.

The cost of giving up the cache and return the right process creation time after a system clock update is a 2x slowdown when creating a Process instance (after that the result is cached):

~/svn/psutil {master}$ python3 -m timeit -s "import psutil" "psutil.Process()"
10000 loops, best of 3: 25.2 usec per loop
~/svn/psutil {master}$ python3 -m timeit -s "import psutil" "psutil.Process()"
10000 loops, best of 3: 56.2 usec per loop

Gui13 · 2020-04-15T19:01:22Z

I understand the performance issue, but the returned value is still wrong for any system that encountered NTP adjustment during the life of the python process that uses psutil.

As I said in the original bug in telegraf (which uses the python gopsutil but has the same bug), I suspect that many monitoring systems might be minutes or hours off their real uptime, depending on how skewed their RTC is, and how often the NTP process has to compensate for it. Right now the only way to solve the issue is to restart the python process doing the monitoring.

Maybe there is a middle ground where the actual btime is read again at regular intervals (say, at least every 15 minutes) and cached in between?

giampaolo · 2020-04-15T20:12:24Z

I agree this should be fixed. We can resolve the slowdown by getting the boot time by using sysinfo(2) syscall instead of parsing /proc/stat:
https://stackoverflow.com/a/1544090/376587
It's already exposed in psutil, we just have to return the extra struct info. I made a quick test calling sysinfo(2) on create_time() and the slowdown is minimal, so I'd say we can proceed with this:

before:
~/svn/psutil {master}$ python3 -m timeit -s "import psutil" "psutil.Process()"
10000 loops, best of 3: 23.5 usec per loop

after:
~/svn/psutil {master}$ python3 -m timeit -s "import psutil" "psutil.Process()"
10000 loops, best of 3: 25.8 usec per loop

Note to self: this issue is related to #1724.

giampaolo · 2020-04-15T20:15:48Z

Also, another note to self: originally we were already using sysinfo(2) to get the boot_time(), but decided to use /proc/stat instead so one can realiably determine the boot time of a Docker instance (when running psutil inside docker). There's a ticket somewhere but I'm lazy. Maybe it makes sense to rely on /proc/stat if we're running inside Docker, else use sysinfo(2).

giampaolo · 2020-04-15T22:16:27Z

sysinfo(2) gives the number of seconds since boot, which changes on every call, and that is not good as we have to do some math in order to calculate it, and that may lead to a variation of result between calls. We have a similar problem on Windows, which I solved with this hack:

psutil/psutil/_pswindows.py

Lines 435 to 440 in 7ce6c3c

    
           def boot_time(): 
        
               """The system boot time expressed in seconds since the epoch.""" 
        
               # This dirty hack is to adjust the precision of the returned 
        
               # value which may have a 1 second fluctuation, see: 
        
               # https://github.com/giampaolo/psutil/issues/1007 
        
               global _last_btime

It appears that uptime -s returns a stable time, but still not sure how. This should be the source code:
https://github.com/coreutils/coreutils/blob/master/src/uptime.c

Gui13 · 2020-04-16T06:45:53Z

@giampaolo if sysinfo returns the corrected value after an adjustment, then it makes sense yes.

If what is said in #1724 is right, you can bypass the problem for the process isRunning() issue by reading the /proc/PID/stat instead of basing on /proc/stat

Then the last thing to check is that in VM/docker environment, the sysinfo call correctly adjusts (but my guess is that yes, it adjusts, since in VM it is like in real machine and in docker it actually responds what the host kernel knows)

Gui13 · 2020-04-16T06:47:49Z

Ah my bad, sysinfo doesn't return the btime. Well then /proc/stat would be the solution.. maybe with some caching so that you don't re-read it all the time?

giampaolo · 2020-04-16T09:18:02Z

Well then /proc/stat would be the solution.. maybe with some caching so that you don't re-read it all the time?

The caching is why we have this problem currenly: since boot_time() value is cached (when using it from Process.create_time()) we don't get NTP updates.

Gui13 · 2020-04-16T15:38:50Z

Yeah but it might be sufficient to read it at regular intervals (every N calls, or store the last read timestamp and read every 15 minutes). The problem lies in reading it only once. If the uptime is eventually consistent, then it might be OK.

I'm just suggesting this because you mentioned the performance hit, the real, irrefutable fix is to read /proc/stat every time.

giampaolo closed this as completed Aug 7, 2015

giampaolo reopened this Feb 12, 2020

Lomanic mentioned this issue Apr 15, 2020

Uptime can jump when date is adjusted for devices without RTC shirou/gopsutil#837

Closed

giampaolo added the bug label Nov 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

boot_time() is wrong #658

boot_time() is wrong #658

MauroMombelli commented Jul 18, 2015

giampaolo commented Jul 18, 2015

MauroMombelli commented Jul 18, 2015

giampaolo commented Jul 31, 2015

MauroMombelli commented Jul 31, 2015

giampaolo commented Aug 1, 2015

MauroMombelli commented Aug 2, 2015

giampaolo commented Aug 7, 2015

Gui13 commented Feb 12, 2020

giampaolo commented Feb 12, 2020

Gui13 commented Feb 12, 2020

giampaolo commented Feb 12, 2020 •

edited

Loading

giampaolo commented Feb 12, 2020 •

edited

Loading

Gui13 commented Apr 15, 2020

giampaolo commented Apr 15, 2020 •

edited

Loading

giampaolo commented Apr 15, 2020 •

edited

Loading

giampaolo commented Apr 15, 2020 •

edited

Loading

Gui13 commented Apr 16, 2020

Gui13 commented Apr 16, 2020

giampaolo commented Apr 16, 2020 •

edited

Loading

Gui13 commented Apr 16, 2020

boot_time() is wrong #658

boot_time() is wrong #658

Comments

MauroMombelli commented Jul 18, 2015

giampaolo commented Jul 18, 2015

MauroMombelli commented Jul 18, 2015

giampaolo commented Jul 31, 2015

MauroMombelli commented Jul 31, 2015

giampaolo commented Aug 1, 2015

MauroMombelli commented Aug 2, 2015

giampaolo commented Aug 7, 2015

Gui13 commented Feb 12, 2020

giampaolo commented Feb 12, 2020

Gui13 commented Feb 12, 2020

giampaolo commented Feb 12, 2020 • edited Loading

giampaolo commented Feb 12, 2020 • edited Loading

Gui13 commented Apr 15, 2020

giampaolo commented Apr 15, 2020 • edited Loading

giampaolo commented Apr 15, 2020 • edited Loading

giampaolo commented Apr 15, 2020 • edited Loading

Gui13 commented Apr 16, 2020

Gui13 commented Apr 16, 2020

giampaolo commented Apr 16, 2020 • edited Loading

Gui13 commented Apr 16, 2020

giampaolo commented Feb 12, 2020 •

edited

Loading

giampaolo commented Feb 12, 2020 •

edited

Loading

giampaolo commented Apr 15, 2020 •

edited

Loading

giampaolo commented Apr 15, 2020 •

edited

Loading

giampaolo commented Apr 15, 2020 •

edited

Loading

giampaolo commented Apr 16, 2020 •

edited

Loading