[winlog/winlogbeat] Gracefully handle event channel not found errors #34605
- Added logic to gracefully handle event channel not found errors. This only applies to event subscriptions, not to reading event files (evtx). If a channel not found error is encountered, either during initial open or during reading, the application will attempt to open a subscription to the event channel again after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function.
- Improved logging through further use of structured logging fields.
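The retry behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `eventLog` interface is a trimmed-down, hypothetical version of the real EventLog interface, `errChannelNotFound` stands in for the Windows error that IsChannelNotFound would check, and `flakyLog`, `run`, `wait`, and `demo` are names invented for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errChannelNotFound is a stand-in (assumption) for the Windows
// "channel not found" error that the real code would detect.
var errChannelNotFound = errors.New("channel not found")

// eventLog is a hypothetical, trimmed-down EventLog interface.
type eventLog interface {
	Open() error
	Read() ([]string, error)
	Close() error
	Channel() string
	IsFile() bool
}

// isChannelNotFound reports whether err wraps errChannelNotFound.
func isChannelNotFound(err error) bool { return errors.Is(err, errChannelNotFound) }

// wait blocks for d, or returns early if ctx is cancelled.
func wait(ctx context.Context, d time.Duration) {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
	case <-t.C:
	}
}

// run reads until ctx is cancelled. For subscriptions (not files), a
// channel-not-found error on open or read leads to a delayed reopen.
func run(ctx context.Context, l eventLog, delay time.Duration, publish func([]string)) error {
	opened := false
	for ctx.Err() == nil {
		if !opened {
			if err := l.Open(); err != nil {
				if l.IsFile() || !isChannelNotFound(err) {
					return fmt.Errorf("open %q: %w", l.Channel(), err)
				}
				wait(ctx, delay) // channel missing: try again after the delay
				continue
			}
			opened = true
		}
		records, err := l.Read()
		if err != nil {
			if l.IsFile() || !isChannelNotFound(err) {
				return fmt.Errorf("read %q: %w", l.Channel(), err)
			}
			l.Close() // drop the stale handle before resubscribing
			opened = false
			wait(ctx, delay)
			continue
		}
		publish(records)
	}
	return ctx.Err()
}

// flakyLog is a fake whose first failOpens calls to Open fail.
type flakyLog struct{ failOpens, opens int }

func (l *flakyLog) Open() error {
	l.opens++
	if l.opens <= l.failOpens {
		return fmt.Errorf("subscribe: %w", errChannelNotFound)
	}
	return nil
}
func (l *flakyLog) Read() ([]string, error) { return []string{"event"}, nil }
func (l *flakyLog) Close() error            { return nil }
func (l *flakyLog) Channel() string         { return "Microsoft-Windows-Sysmon/Operational" }
func (l *flakyLog) IsFile() bool            { return false }

// demo runs the loop until the first batch is published and reports
// how many Open calls and published batches were observed.
func demo(failOpens int) (opens, batches int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	l := &flakyLog{failOpens: failOpens}
	_ = run(ctx, l, time.Millisecond, func([]string) { batches++; cancel() })
	return l.opens, batches
}

func main() {
	opens, batches := demo(2)
	fmt.Printf("opens=%d batches=%d\n", opens, batches)
}
```

The `IsFile` check keeps the retry limited to subscriptions, so a missing evtx file still fails immediately, which matches the scope stated in the description.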
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
I was never able to test the scenario where we had a valid channel and then it disappeared briefly before coming back. I'm pretty confident, though, that we can handle that situation based on the error handling added in this PR along with what's already there. In cases where I did try that scenario, either removing the channel hung due to winlogbeat holding a "lock" on the channel, or winlogbeat still saw the channel even though it was gone (took a reboot to clear it up in that case).
Looks good. Minor queries before merge.
This pull request is now in conflicts. Could you fix it? 🙏
Minor nit.
…34605)
- Added logic to gracefully handle event channel not found errors. This will only apply to event subscriptions and not reading event files (evtx). If a channel not found error is encountered, either during initial open or during reading, the application will attempt to open a subscription to the event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.
(cherry picked from commit 34a87e5)

…34605) (cherry picked from commit 34a87e5) # Conflicts: # winlogbeat/eventlog/eventlog.go # winlogbeat/eventlog/wineventlog.go # winlogbeat/eventlog/wineventlog_experimental.go

…34605) (#34656) (cherry picked from commit 34a87e5) Co-authored-by: Taylor Swanson <[email protected]>

…annel not found errors (#34655) (cherry picked from commit 34a87e5) Co-authored-by: Taylor Swanson <[email protected]> Co-authored-by: Taylor Swanson <[email protected]>
This is awesome, everyone. I am very excited to test this once it's available. This has been a pain point for us for quite some time, and it required some extra steps (restarting the service, order of operations for upgrades with Sysmon and Wlb, etc.) whenever we realized events were not coming in when they should have been.
@Mergifyio backport 7.17
✅ Backports have been created
…34605) (cherry picked from commit 34a87e5) # Conflicts: # winlogbeat/eventlog/eventlog.go # winlogbeat/eventlog/wineventlog.go # winlogbeat/eventlog/wineventlog_experimental.go

…(backport #34605) (#34869) (cherry picked from commit 34a87e5) --------- Co-authored-by: Taylor Swanson <[email protected]> Co-authored-by: Taylor Swanson <[email protected]>
Hey @taylor-swanson, I appreciate your work on this. I was hoping that the latest version of Winlogbeat, 8.7.0, which includes this fix, would correct a problem we see when we upgrade Sysmon, but it hasn't. It's the exact scenario you describe in your notes, but to reiterate: when we upgrade Sysmon, Winlogbeat events stop flowing to their destination until we restart the service. I've tested this going from Sysmon 14.13 -> 14.16 and from Sysmon 14.14 -> 14.16. Looking through the Winlogbeat logs, I never see a channel not found error, but I do see the following while Sysmon is being upgraded:

{"log.level":"warn","@timestamp":"*","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":470},"message":"WinEventLog[] error salvaging message (event id=5 qualifier=0 provider="Microsoft-Windows-Sysmon" created at ***** will be included without a message): failed in EvtFormatMessage: The publisher has been disabled and its resource is not available. This usually occurs when the publisher is in the process of being uninstalled or upgraded.","service.name":"winlogbeat","ecs.version":"1.6.0"}

Any suggestions or ideas why this might be happening?
@jonnygoogle25 Does this help?
Hey @jonnygoogle25, it seems to me that when Sysmon is upgraded, our handle to the publisher which renders the XML is no longer valid. If restarting Winlogbeat fixes it, then refreshing our handle to the channel seems to be all that needs to be done. The trick here would be reliably detecting when that error occurs. If there's an associated Windows error code that accompanies that error, then it should be pretty trivial to catch it. That error is being produced in a different part of the code, though. The issue I fixed was in the main read loop and was more about checking for errors when reading from the channel. In the situation you encountered, the channel appears to be intact, but there were problems rendering the XML. If we could reliably detect the error, then perhaps I could kick an error back up to that main read loop to cause it to resubscribe to the channel.
That makes sense. I just tested the process that breaks everything again, and I'm not able to find anything in the Windows event log that would help here. The only consistent indicator is the error from the Winlogbeat log that I shared in my earlier post. Could you not go on that? @efd6 - thanks for sharing, but I think we're having a slightly different issue. Event Viewer shows the descriptions fine after the Sysmon upgrade; it's just that Winlogbeat doesn't pick them up.
We potentially could, but I really dislike matching errors based on strings (not saying we can't, but if there's a numeric error code that accompanies it, I would just use that instead). Anyway, that'd be an implementation detail to worry about later. One other question I had: how were you performing the upgrade? When I tried, I had to uninstall the old Sysmon, then install the new one. However, the old Sysmon wouldn't uninstall cleanly unless I stopped Winlogbeat (I think maybe because it saw the channel was still in use by Winlogbeat).
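To make the trade-off between numeric codes and string matching concrete, here is a small sketch. It is not Winlogbeat code: the `errno` type, `shouldResubscribe`, and `matchesByString` are hypothetical names, and the values 15007 (channel not found) and 15037 (publisher disabled) are what I believe the Windows headers assign to those errors, so treat them as assumptions to verify.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errno is a hypothetical stand-in for a Windows error code type.
type errno uint32

func (e errno) Error() string { return fmt.Sprintf("windows error code %d", uint32(e)) }

// Assumed values; check them against the Windows error code reference.
const (
	errEvtChannelNotFound   errno = 15007
	errEvtPublisherDisabled errno = 15037
)

// shouldResubscribe matches on the numeric code, which survives error
// wrapping and does not depend on the message text or its language.
func shouldResubscribe(err error) bool {
	var code errno
	if !errors.As(err, &code) {
		return false
	}
	return code == errEvtChannelNotFound || code == errEvtPublisherDisabled
}

// matchesByString is the fragile alternative: it only works while the
// rendered message keeps this exact English wording.
func matchesByString(err error) bool {
	return err != nil && strings.Contains(err.Error(), "The publisher has been disabled")
}

// wrapped simulates an error returned from a rendering call.
func wrapped(code errno) error {
	return fmt.Errorf("failed in EvtFormatMessage: %w", code)
}

func main() {
	err := wrapped(errEvtPublisherDisabled)
	fmt.Println(shouldResubscribe(err), matchesByString(err))
}
```

In this sketch the code-based check still succeeds when the message text differs from the English string, which is the main argument for preferring it whenever a code is available.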
We're using chocolatey to install over the existing package. I'm not certain if it calls an uninstall first, although I suspect it does.
@jonnygoogle25, I wrote up a separate issue to track the publisher disabled scenario: #35316
When running Winlogbeat as a Windows service after this change, the service fails to stop and hangs if a channel listed in the config is not found, apparently due to a `continue`. Is this expected behavior?
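I have not confirmed the cause of the hang in Winlogbeat itself, so the following is only a sketch of one plausible pattern: a retry loop that sleeps and then hits `continue` without ever checking for shutdown will spin for as long as the channel is missing. Selecting on the context during the delay lets the loop exit promptly. The names `openWithRetry` and `demo` are hypothetical, and `errChannelNotFound` is a stand-in for the real Windows error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errChannelNotFound is a stand-in (assumption) for the Windows error.
var errChannelNotFound = errors.New("channel not found")

// openWithRetry keeps calling open while it reports a missing channel.
// The delay is a select on ctx.Done(), so a stop request ends the loop
// instead of being ignored until the channel appears.
func openWithRetry(ctx context.Context, open func() error, delay time.Duration) (int, error) {
	attempts := 0
	for {
		attempts++
		err := open()
		if !errors.Is(err, errChannelNotFound) {
			return attempts, err // success (nil) or an unrelated error
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err() // shutdown requested while waiting
		case <-time.After(delay):
			continue // delay elapsed: try to open again
		}
	}
}

// demo retries against a channel that never appears and reports the
// attempt count and whether the loop stopped because of the deadline.
func demo() (int, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	alwaysMissing := func() error { return errChannelNotFound }
	n, err := openWithRetry(ctx, alwaysMissing, 10*time.Millisecond)
	return n, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	n, stopped := demo()
	fmt.Println(n > 0, stopped)
}
```

If the service's stop handler cancels a context like this one, a missing channel no longer prevents shutdown.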
What does this PR do?
Why is it important?
Filebeat/Winlogbeat must be able to resubscribe to a channel if something occurs to the handle. In some cases, the handle may go invalid for some unknown reason, but in these cases, simply closing the handle and resubscribing corrects the issue.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
[ ] I have added tests that prove my fix is effective or that my feature works
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally
This is a mildly difficult one to test, although one of the easier ways is to:
Microsoft-Windows-Sysmon/Operational as an event log source

What I wasn't able to test is the case where a valid channel that filebeat/winlogbeat was subscribed to goes away, the error is seen, Sysmon is reinstalled, and filebeat/winlogbeat re-subscribes (this was the issue in both related cases).
Related issues
Use cases
Needed in cases where the requested channel isn't available yet, or if the channel randomly disappears (which has been reported by customers).
Logs
filebeat:
winlogbeat: