upsmon fails to start if the old pid happens to exist for an unrelated process #2463
Comments
Thanks, that's an interesting case, although the hard part can be looking up the process name for a PID across many platforms that all do it differently. Just in case: is the problem happening at router start-up? Does your build of NUT there store the PID files in a persistent storage location, or in a tmpfs that should be empty after a reboot? Also, which version of NUT is involved? I think with recent ones, a graceful exit of the daemon should have it delete the PID file. |
These are the entware packages installed on the router so a pretty recent version.
The pid file is in /opt/var/run which is persistent. The /opt/etc/init.d/S15upsmon startup script isn't specifying a variable with the directory for the pid file, so I'm not sure how to override this to a non-persistent directory. This problem only happened once, after a non-graceful reboot of the router. I kept getting emails about upsmon not running, so all subsequent attempts to start upsmon were also failing. Since the entware startup scripts already perform a check of an already running process (below), perhaps an upsmon command-line argument to skip the redundant test at upsmon startup could be a workaround.
This is admittedly a fringe case that requires bad luck to occur. |
For temporary files (PID, Unix socket, etc.), NUT generally uses several locations which are built into the binaries by |
Thanks, those variables are very useful. Tweaking the entware scripts makes installing future entware package updates a pain. If there was a portable fix I would log an issue with entware so all entware NUT users would avoid the risk. Even a non-persistent upsmon.pid isn't a perfect solution. For now I'll just keep my checkservices cron job hack which deletes upsmon.pid the next time it runs after the problem happens and notifies me. I might get lucky and never hit this issue again. This is apparently a bigger topic than this NUT issue. |
Well, in case of NUT the PID files are also important for inter-process communications of sorts, such as sending signals to already-running daemons (e.g. to |
In the entware distribution of upsmon the -p argument is the only one used, so everything is "all root all the time" and sending signals to a random daemon will succeed. My router only has root, so that's a rare case where -p makes sense. Could an argument tell upsmon to run pidof from the OS to double check the PID rather than only consulting the PID file? The entware startup scripts use pidof before launching anything, so they already assume all the platforms they support have pidof; entware might consider using such an option, especially since it is using -p. Just brainstorming crazy ideas on how to never confuse a random process for a NUT program. I'm no expert... |
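To illustrate why consulting only the PID file is ambiguous, here is a minimal Python sketch of an existence-only probe. It is not NUT's code; the helper name `pid_exists` is hypothetical, and the sketch assumes a POSIX system where signal 0 performs error checking without delivering anything.

```python
import os

def pid_exists(pid):
    """Return True if some process currently holds this PID (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False  # no process with this PID
    except PermissionError:
        return True  # a process exists but is owned by another user
    return True
```

A probe like this answers "is anything running with this PID?" but says nothing about which program it is. When everything runs as root, as described above, the permission error that might otherwise hint at a foreign process never occurs either.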
Adding a dependency on a random program is not a likely way forward. I'm tinkering to check with how |
…, and helper parseprogbasename() [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…ignal*() via old PID only to same progname [networkupstools#2463] Internal API change for common.c/h Signed-off-by: Jim Klimov <[email protected]>
…ne parsing [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…parsing [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
I think I ended up making a universal |
Looked at the pull request, very nice... Makes investigating the cause and logging an issue worthwhile. |
Not sure what you meant about using In the https://github.com/Entware/entware-packages/blob/master/net/nut/files/nut-monitor.init#L14-L16 script I see them setting up And in https://github.com/Entware/entware-packages/blob/master/net/nut/files/nut-monitor.init#L207 they start it foregrounded (with debug verbosity 1), no Do you have a similar version deployed? The one I see in their Git is 2 years old, related to OpenWRT 2022.04... But then there's also https://github.com/Entware/entware-packages/blob/master/net/nut/files/S15upsmon with |
Yes I have entware deployed on a few routers. The main one where the pid incident occurred has a S15upsmon identical to the third link you posted. That's where I saw the
I don't have an entware build environment and know little about their git. They tend to be slow with updating their packages to the latest versions. Is there something you want tested on my backup router? |
…ng without a /proc [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…ols#2463] Signed-off-by: Jim Klimov <[email protected]>
…o (Solaris/illumos) parsing [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…ithout a /proc [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…sage [#2463] Signed-off-by: Jim Klimov <[email protected]>
Well, if you were able to build a newer package to install and run on your router - that would be great. Other than that, I think I can only suggest asking in the Entware community about how they have these different init scripts, and the newer one looks more advanced but an older one is actually used (at least in your platform's builds), and how to rectify that... |
I successfully built and ran the released upsmon, but what would I put in this file so the Entware build system downloads pre-release NUT tar.gz and tar.gz.sha256 files? |
Good question, as we do not really publish pre-release tarballs for interim branches (or even states of master) - that would need too much storage. The good news is that on a sufficiently prepared system (tools and third-party deps) you can make your own such tarball with Also a tarball can be left over from |
…roblem [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…ame validation if it causes problems [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…ocname() vs current getpid() [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…rkupstools#2463, networkupstools#2478] Signed-off-by: Jim Klimov <[email protected]>
…rkupstools#2463, networkupstools#2478] Signed-off-by: Jim Klimov <[email protected]>
…rom checkprocname() [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…ed() from checkprocname() and compareprocname() [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…red() and getprocname(pid) once for several tests against its value, and report it in the end [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…te typo) [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
…eloper builds [networkupstools#2463] Signed-off-by: Jim Klimov <[email protected]>
If a previous upsmon.pid file is present at startup and an unrelated process by chance happens to exist with that pid, upsmon will fail to start with the following message:
Workaround:
My cron job that checks running services using pidof will now delete upsmon.pid before attempting to restart upsmon.
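The workaround can be sketched roughly as follows in Python. This is an approximation and not the actual cron job: `pids_named` and `remove_pidfile_if_not_running` are hypothetical helpers, the scan only imitates `pidof` by comparing `/proc/<pid>/comm` on Linux, and the `proc_root` parameter exists only so the logic can be exercised against a fake directory.

```python
import os

def pids_named(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                comm = fh.read().rstrip("\n")
        except OSError:
            continue  # process exited during the scan, or entry is unreadable
        # The kernel truncates comm to 15 characters, so compare on that basis.
        if comm == name[:15]:
            found.append(int(entry))
    return sorted(found)

def remove_pidfile_if_not_running(pidfile, name, proc_root="/proc"):
    """Delete `pidfile` when no process called `name` runs; True if deleted."""
    if pids_named(name, proc_root):
        return False  # the daemon is alive, keep its PID file
    try:
        os.remove(pidfile)
    except FileNotFoundError:
        return False  # nothing to clean up
    return True
```

Called before the restart attempt, this removes a leftover upsmon.pid only when no process named upsmon is found, so a healthy daemon keeps its PID file.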
Expected behavior:
If the process that currently holds the pid recorded in upsmon.pid has a name unrelated to upsmon, then upsmon should start up normally instead of exiting with a failure.
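As a hedged illustration of the expected behavior, the Python sketch below treats a PID file as blocking startup only when the recorded PID belongs to a process with the expected name. The names `read_proc_name` and `pidfile_blocks_startup` are hypothetical, the lookup assumes Linux `/proc/<pid>/comm`, and this is not how NUT itself implements the check.

```python
import os

def read_proc_name(pid, proc_root="/proc"):
    """Return the short command name for `pid`, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as fh:
            return fh.read().rstrip("\n")
    except OSError:
        return None  # no such process, or no /proc on this platform

def pidfile_blocks_startup(pidfile, expected, lookup=read_proc_name):
    """True only if the PID in `pidfile` is held by a process named `expected`."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing or malformed PID file: nothing to conflict with
    name = lookup(pid)
    if name is None:
        # Simplification for this sketch: an unknown name counts as "not ours".
        return False
    # comm holds at most 15 characters of the program's base name.
    return name == os.path.basename(expected)[:15]
```

With such a check, a stale upsmon.pid pointing at an unrelated process would not prevent startup. Treating an unreadable name as "not ours" is a simplification; a real daemon would probably need a fallback for platforms without `/proc`, which is the cross-platform difficulty raised in the comments.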
Platform: router running Linux