-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FluentBit crash on Windows #2251
Comments
I'm looking at this issue. WFM. |
@fujimotos I don't expect you to be able to repro the crash, but the captured dump should reveal the root cause, unless you don't have access to symbols either? |
@gitfool No, we are building releases on AppVeyor, and haven't So I spent some time this morning and submit #2260. Every Talking about the next step, I also can provide you a build that have |
@fujimotos sure will, thanks! |
@fujimotos Meanwhile, I managed to build a debug build of 1.4.6 with VS Code and captured another crash dump:
I had a quick look around with WinDbg (from my Windows 10 machine):
|
I'm not sure but I think the symbols file is for
I changed the build target from |
Here we go:
Looking around WinDbg:
The evidence indicates that log_read is not returning a log_message struct in this case, since the |
@gitfool I preapred a branch
The testing package is available from the second link; It includes a
That's a good catch. I took it account into f1810e2 and made it sure The branch
Please feel free to tell me if you notice anything. |
... and I just notice you already did get the dump in the #2251 (comment). I'll take a closer look at your log tomorrow and investigate the root |
A couple of questions...
|
@gitfool I pushed a fix to the branch
I added a safety check that prevents log_push() from reading over I hope this helps you.
This is mostly for customary reasons. Fluent Bit is designed to be embeddable into other C programs. That being said, I concur that the Windows users ralely make use I also think that there might be a better form of distrubution
When we started to port Fluent Bit to Windows in 2018, the In fact, both VS2017 and VS2019 can compile Fluent Bit just |
@fujimotos I don't think that's a good change. The read using socket Also, when it does fail, My gut feel is the root cause is due to the reads being split in rare cases. In such a case the |
@gitfool I realized I pushed a wrong commit. I re-updated the
My guess is the same. If read(2) returns only the initial a few bytes The root issue is that the current logic assumes atomic write/read, We really should sort out the logging logic, but this fix should at |
@fujimotos okay, that will avoid the crash, although the pointer deref shouldn't be needed, and wouldn't it be useful to add a call to if (bytes <= 0) {
perror("flb_pipe_r");
return -1;
}
if (msg.size > sizeof(msg.msg)) {
perror("flb_pipe_r msg.size");
return -1;
} Would still be good to fix the root cause though. |
@fujimotos I finally have proof that the it's due to a split read. I was using the following change: if (msg.size > sizeof(msg.msg)) {
DebugBreak();
perror("flb_pipe_r msg.size");
return -1;
} Attached to process with WinDbg and waited until it broke into the debugger:
Then I continued execution at the Looking at standard error that was being logged to file by winsw, I see the following:
The garbage near the end combines perfectly with the buffer text from the debug break:
This is interesting as it also indicates that the first buffer was probably not null terminated. This then points to a problem with the strlen in flb_log_print, which is technically redundant since |
Going to add more debugging to see if i can get more info and/or catch these anomalies upstream. |
@gitfool Thank you for the effort. That's great evidence.
Hmm. I guess that the problem is not the message per se, but the What do you think about the fix f92ebf7? I'm guessing that this can |
Yeah I suspect write_all/read_all will be needed for this to work as expected. Also, it could be smarter and read the size then read size bytes, rather than always reading the full buffer (4KB), but that complicates it compared to always writing/reading the same fixed size buffer, and 4KB is probably small enough to not worry about. |
@gitfool So I updated
I'd appreciate if you can check this out and see if it actually works. |
@fujimotos okay, I've dropped in the artifacts from AppVeyor and am running a test over night. |
@fujimotos I added some debugging to satisfy my curiosity and find out exactly how these changes behaved.
I saw the following combinations of reads that fit 4KB:
This matches nicely with my first crash analysis above where I later commented:
I was also worried about the following code: Lines 400 to 412 in 2dc4484
... but have concluded it works as expected due to the local initializer that zeroes the struct combined with passing a smaller buffer to vsnprintf .
TL;DR: your changes LGTM! 🎉 |
@gitfool Great! Now I'll prepare to push these fixes to mainline. |
Close this issue for #2295 being merged |
Bug Report
Describe the bug
Following up from 56d619f#commitcomment-39848013, I tried running the current latest fluent-bit, version 1.4.6, but it crashed a few times several minutes after starting (as a Windows service with winsw).
Your Environment
fluent-bit.conf
:filters.lua
:Dump file captured with Sysinternals ProcDump using
procdump64.exe -ma -e fluent-bit.exe
:Note: missing symbols for fluent-bit.exe; @fujimotos could we please include PDB symbols with future Windows build artifacts? 🙏
The text was updated successfully, but these errors were encountered: