logfile locking issue on docker host mapped filesystem #55

Open
tristanpye opened this issue Jun 1, 2024 · 0 comments
When logging across multiple processes to a docker host-mounted filesystem, log4cplus fails to properly lock the log file, causing file corruption during file rollovers.

I've investigated just using the raw system calls to lock/unlock a file on demand, and I think I've narrowed it down to a race condition at process startup.

The locking works by creating an xxx.log.lock file alongside the log files, then locking/unlocking that file with fcntl(fd, F_SETLKW). If the lock file already exists at process startup, it is simply opened and all is hunky-dory. If it doesn't exist, it is created. The problem comes when multiple processes start up at the same time (for some definition of 'same') and try to create the file simultaneously.

I'm starting five processes at once, each of which creates/opens the file and immediately tries to lock it. Almost always, two of those processes will apparently succeed in locking the file, while the remaining processes wait on the lock. One of the two apparently successful locks, however, is not locking the same thing as the others: that process can release and relock at will without affecting any of the other processes. It seems, therefore, that this rogue process is effectively locking a different file. I wondered whether it was creating/opening the file just before another process replaced it with a new one, but the inodes reported by every process are the same, indicating something lower level.

In the logging situation, we would effectively have one rogue process not playing nicely with the others and logging without locking (even though it thinks it is).

I'm reasonably confident that this is the whole problem, as it's pretty consistent, and have not managed to get this to happen with volume mounts, or when starting processes individually.

I think the best workaround is to use a volume mount, but if you are confident that your processes will not start simultaneously, then the host mount should still work. And this only happens on a clean startup with no lock file already present.

Note, this is not a log4cplus problem... it is definitely lower level than that, but there may be ways to work around it. There is no easy way to force log4cplus to use an alternative locking mechanism without patching the source. This is possible, but I fear that all the different ways will boil down to the same thing once it gets to the kernel. Alternatively, we could patch it to first create the lock file, close, and then reopen. In my tests, this always works, but I'm concerned that it might just be narrowing or sliding the race window, and I'm just not hitting the right timing.

This has only been observed (so far) with a Docker engine hosted on macOS, so it may be the intermediate VM that Docker on macOS uses that is causing the issue.
