Creating sub-processes using 100% CPU when content is added #165
Sorry to hear this old problem (if the same) seems to persist. I have not heard any reports of this in a very long time. What I need from you to have even the slightest chance of finding the root cause is a gdb stack dump from a process that is stuck at 100% load. You would attach gdb to the pid and then dump the stack using the 'bt' command. You might need to rebuild rar2fs using '--enable-debug' in case the stack trace lacks readable symbols, and run the non-stripped binary from the 'src' directory, that is, not the one installed by 'make install'.
If you wish to test some other version I think you need to go back 4-5 years to v1.23.1. This is when pipes were replaced with condition variables. It is the only thing I can think of that would perhaps cause a regression like this.
Thank you so much for the quick answer! Since I have been having the issues after starting to update to newer versions of rar2fs as soon as possible, I'll try your suggestion of using the older version first. I'm kind of overwhelmed by your first comment as I'm not exactly sure what you're asking me to do! :) I'll get back after some testing with the older version. Thanks for this great software! Regards
So I tried to install the version you mentioned and ended up with an error as in #85. After applying the patch, when make gets to ./unrar I get this error:
I guess it has something to do with the unrar version? I tried 6.0.7 and 5.5.6 for compiling while 5.5.0 is installed on the system. I'm happy to try the things you suggested with the most recent version of rar2fs, as it seems to involve some fiddling anyway. If you could just clarify what to do exactly, I'm happy to do so. Regards
What version did you use before you hit this problem?
I was using v1.29.5.
But I was under the impression that you hit this problem when moving to 1.29.5, so what version did you upgrade from?
I tried to downgrade from 1.29.5 to 1.23.1 and ended up with those errors. So currently I'm still using 1.29.5 with unrar 5.50 because installing 1.23.1 didn't work. After remounting with 1.29.5 and unrar 5.50 everything is working right now. It's kind of hard for me to reproduce the CPU load failure as it could take days for it to occur again. Maybe you could explain what you meant with:
I'm happy to do this as soon as the CPU load issue comes up again! I could try to figure it out by myself with Stack Overflow, but that could take ages! ;)
OK, my only question was: when did you start seeing this issue? You moved to 1.29.5 at some point, right? So what version were you using before that, since you did not report this problem until now? I will get back to you with details on how to attach gdb to a running process and dump the stack.
Ahh, sorry, I didn't get that. I figured it might have something to do with my server config so I tried other things first. rar2fs worked flawlessly for years and I wouldn't have thought it might break like this. Sadly I don't know exactly. I would guess at least since 1.29.0.
Easiest way to attach gdb would be
However, since rar2fs spawns many processes/threads, the best way is to use the 'ps' or 'top' command to find the exact process id that is using a lot of CPU, and then
Once in gdb you will get a '(gdb)' prompt, and if input cannot be given, press CTRL-C to interrupt it and then run the gdb 'bt' command. In addition to this, an strace might be useful. So in the same manner run the Linux 'strace' command and log output to some file.
Note that strace will not terminate, since the process is more than likely stuck in a loop, so just let strace run for a few seconds and then terminate it using CTRL-C. Have you been able to reproduce the issue yet? If you manage to find an easy way to do that, try going back to some previous version like 1.29.4 and confirm it works; if not, step back a few more versions, and so on. That would provide valuable input for me to understand what code delta we are dealing with here. And sorry for the late reply, I have been rather busy at work recently and have not had a single moment of spare time.
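As a rough sketch of the gdb/strace workflow described above (PID 14075 is taken from later in this thread purely as an illustration; substitute whatever 'top' shows at 100%):

```
# Attach gdb to the busy process; press CTRL-C at the (gdb) prompt if it
# does not stop on its own, then dump the stack and detach:
sudo gdb -p 14075
#   (gdb) bt
#   (gdb) detach
#   (gdb) quit

# Log a few seconds of syscall activity to a file, then stop with CTRL-C:
sudo strace -p 14075 -o strace.txt
```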
Absolutely no problem! I would never expect quick answers anyway. It's free software you're maintaining in your free time; I totally understand it doesn't have the highest priority. I fell back one major version and am now using version 1.28. The plan is to check if everything is working here and slowly work my way up to a version which is not. I have had processes running since July 1st, added content, and had neither sub-processes nor high CPU load. So I guess I'll let it run for a week or until problems occur. After that I'll use the commands you provided with a newer version which is causing problems. Thanks again and have a nice weekend!
So it seems like the problem persists in v1.28 as well. I have one process with one sub-process so far (14075 and 14078) using 100% of CPU. Here's the output of gdb: Attaching to process 14075 and the output of strace running a couple of seconds: strace -p 14075 > strace.txt I guess it means I have to recompile using --enable-debug?
Yes, I think you should recompile using --enable-debug. But the gdb output looks a bit strange. Did you really interrupt using CTRL-C? I don't see the output from the bt command.
Here's the output using the bt command. Not a lot more information given. 'top' now shows a CPU load of 250% for this process because of all these sub-processes. I'll recompile later with --enable-debug. Could you explain how to do that when not using 'make install'?
No, the problem is that what you see in gdb is the main thread, which is simply sleeping and causing 0% CPU. In gdb you need to list the threads ('info threads') and change to them one by one using 'thread <n>' and do 'bt'. I thought gdb was more clever than this, but apparently not. To rebuild using --enable-debug you need to reconfigure the project using the ./configure script and then do make; a new binary with all symbols intact will be placed under the src directory.
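A non-interactive variant of the per-thread inspection above, which captures every thread's backtrace in one go ('thread apply all bt' does the thread iteration for you; the PID is again illustrative):

```
sudo gdb -p 14075 -batch \
    -ex 'info threads' \
    -ex 'thread apply all bt' > gdb-all-threads.txt 2>&1
```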
Also, you might have attached to the wrong pid. A process that spawns additional threads and/or child processes will all show the same parent pid but different child pids. To check this use
The PID that is now presented in the leftmost column is what you should attach to and strace. |
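One way to list per-thread (LWP) ids with 'ps', so you attach to the busy thread rather than the sleeping main thread; the bracketed grep pattern is just a trick to keep the grep process itself out of the result:

```shell
# -e: all processes, -L: show threads (the LWP column holds the thread id),
# -f: full format. Attach gdb/strace to the LWP accumulating the CPU time.
ps -eLf | grep '[r]ar2fs' || echo 'no rar2fs process running'
```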
I hope I recompiled correctly with ./configure --enable-debug && make && make install. I killed all processes and remounted with the recompiled version. With your notes about gdb I should hopefully be able to provide some useful information next time!
Well the problem is that make install will strip the binary. |
That's what I found on Stack Overflow. Well, as I said before, consider me a newbie.
No problem, just skip 'make install'.
So I recompiled with ./configure --enable-debug && make and copied the just-compiled binary from src to where the old one was (in my case /usr/local/bin). I'll get back to you as soon as I have some information!
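For reference, the rebuild-without-install sequence discussed above looks roughly like this (the /usr/local/bin destination is just what was used in this thread):

```
./configure --enable-debug
make
# do NOT run 'make install' -- it strips the binary; copy the
# unstripped one from src/ instead:
sudo cp src/rar2fs /usr/local/bin/rar2fs
```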
So the issues addressed in #11 and #64 are fixed in newer versions, as they are from 2015 and 2017? I couldn't figure out from those posts what might be wrong with my RARs. I haven't had a lockup since my last post. When I replaced the binary five days ago I also noticed another old instance of rar2fs in /bin/ which I deleted. Could that maybe have been a problem, too?
Not saying it is the same issue, but it might be relevant for further troubleshooting. The issues listed all had the same common problem with RAR files inside another RAR file. That support was removed long ago and stacking of mounts is replacing that functionality. |
If you would like to push it I would try a complete PMS re-indexing. That would put rar2fs and your system under a major stress test. |
I did that by updating the metadata for three libraries and analyzing them, and by re-adding one test library and creating all thumbnails. So far all corresponding processes work as they should and never exceed 0.5% CPU load. They have their sub-processes, but I figure that is normal with a workload like that?
Yes, I would say it is normal and expected behavior. Let's not close this issue yet and allow it to run a few days more. I am a bit surprised, though, that it would suddenly just work? It is the same version for which you saw the issue, right?
So it seems to be the RAR-within-RAR issue. I found one process by now which seems to be locked up.
What version of libunrar do you use? It looks like an exception is thrown but not picked up, and then it hangs. Also, do you know which archive this is? In that case, can you provide me a link to where I might be able to download it? So you use stacked mounts to reach the second level of RAR contents, right?
And no trace of any stuck processes with 100% load?
All fine so far :) |
It sounds like the v4 patch could be a candidate to push to master? But I would of course prefer some additional feedback from other users that also have experienced the original sub-process issue before doing so. But the patch does not seem to make it worse at least. @philnse any chance you could give the v4 patch a try? |
patch165v4_b2.patch.txt also works for us in ChromeOS. |
@m-gupta Interesting, since that patch does nothing related to the C++ exception issues I faced. Then I really do not understand what the issue is here, since the part of the code that is now changed has been working exactly the same for years. Anyway, please confirm that the clean v4 patch works, because the b2 patch is not a candidate to be pushed to master.
issue165v4.patch.txt also works if that is the clean one. I have no idea why this fixes the problem we saw except that it does. |
Updating metadata (scanning through folders) in plex is much faster with issue165v4.patch.txt for me. This is compared to issue165v3.patch.txt Thanks! |
Chiming in that I've been running issue165v4.patch.txt for a week and it has solved the plex issue for me as well. Thanks! |
I think it would be appropriate now that the patch is merged to master. I have received no indications that the situation has become worse by applying it and the option to revert if it does is still there. |
Under some conditions the reader thread responsible for carrying data from the child process to the I/O buffer may fail to terminate at close, and the file system thread gets blocked waiting for the join to complete. Terminate the reader thread using a control message rather than the brute-force call to pthread_cancel().

Additional updates made in this patch to increase robustness include:
- perform dry-run extraction outside of the child process and before it is spawned

Note that the majority of the changes done as part of this patch only affect extraction of compressed (-m1 and above) and/or encrypted archives.

Resolves-issue: #165
Signed-off-by: Hans Beckerus <hans.beckerus at gmail.com>
A patch has now been pushed to master. But I will leave this issue open for a while to hopefully receive some additional feedback that master is now stable before closing this. |
Hi, Thank you for your dedication! |
@milesbenson out of curiosity, would it be possible for you to try out the below patch on top of master/HEAD? |
Checked on a smaller rclone mount, all fine when cache was fully populated - fast scanning. |
Well, good call to leave this soaking for a while then I guess. But we need more input. Can only assume there is a crash in rar2fs and then some sort of signature/trace is needed to follow it up. You did not apply the last patch I posted by any chance? That was only something I wanted to test to see if it had any negative impact since it is supposed to improve the performance when multiple files are being extracted at the same time. If this crash is now a result of the last patch pushed to master then I guess it again needs to be reverted. |
Hi! I might have made some errors; I re-ran the rar2fs/unrar installations. I'll get back to you. /BR
And for what it is worth, never use 165v3, we already know that is severely broken. Instead use master/HEAD for any testing. |
FWIW I have not had any issues so far on issue165v4.patch.txt - updating to master now and will let that bake to double-check |
@Tattarn any updates here? Still experiencing problems? |
No problem! Everything has been running smoothly with master for 2+ days. I somehow broke it by using a too-old unrar I had lying around. Compiled with 6.0.3 and it has been working perfectly since!
Stable on master here as well since updating from also-stable patch v4 |
Thanks all for the feedback/support and patience. I believe we finally reached the point in which this issue can be closed. |
I just ran into the issue that scanning in Plex was slow again. I checked ls -R on the mount and could see the same slow behaviour, so the cache seems to have been dropped. Mounting with rw,allow_other,warmup=10 --seek-length=1 But: no stuck processes, so all fine. Edit: NVM, just found #170, so it happened due to adding folders. I will run ls -R through cron from time to time.
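A crontab entry for such a periodic cache warm-up could look like this (the mount point and the nightly interval are assumptions, not from this thread):

```
# m h dom mon dow  command -- walk the mount nightly to keep the cache warm
0 4 * * * ls -R /mnt/media > /dev/null 2>&1
```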
Indeed. Just for the records: running latest patch since a few months under heavy usage without any issues. Thanks @hasse69 ! |
@hasse69 Thanks for all your hard work, much appreciated! Just curious why this fix (and others) aren't part of a new patch release yet? |
It is a pending task indeed. |
After hours of research regarding my problem I desperately ended up posting here, as I just can't figure out how to solve the issue.
I'm using rar2fs with an instance of Plex. For the last ten days, rar2fs was running as expected: one instance per mount. After adding two packed episodes overnight, I ended up with 100% CPU load for the mount that is handling the series section. This is an issue I have had several times in the past. It seems to be the same issue addressed in #11
If I remember correctly, the suspected reason for it was the Plex Media Server wanting to add files to the mounted but still RAR'd files. I figure it has something to do with chapter images that are generated by Plex using the transcoder; since no files are saved in the original content folder but in the Plex Media Server's library, I'm having a hard time reproducing the error.
Could it be that some kind of cache fills up with data when generating those chapter images and rar2fs crashes? I've attached an excerpt of my webmin to clarify:
Please let me know if you need any logs. I'm happy to provide them.
//EDIT: I forgot to mention what I have tried so far to solve it:
- run as root
- run as regular user
- --seek-length=0, --seek-length=1
- --no-smp
- killing exceeding/crashed instances of rar2fs, resulting in losing the mounted content
Regards
My system is running:
Ubuntu 18.04.5
rar2fs v1.29.5-gita393a68 (DLL version 8) Copyright (C) 2009 Hans Beckerus
FUSE library version: 2.9.7
fusermount version: 2.9.7
UNRAR 6.02 beta 1 freeware
Plex Media Server Version 1.23.4.4712