Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[winlog/winlogbeat] Gracefully handle event channel not found errors #34605

Merged
merged 9 commits into from
Feb 23, 2023

Conversation

taylor-swanson
Copy link
Contributor

@taylor-swanson taylor-swanson commented Feb 16, 2023

What does this PR do?

  • Added logic to gracefully handle event channel not found errors. This will only apply to event subscriptions and not reading event files (evtx). If a channel not found error is encountered, either during intial open or during reading, the application will attempt to open a subscription to the event after a short delay.
  • Added Channel and IsFile methods to the EventLog interface.
  • Added IsChannelNotFound function
  • Improved logging through further use of structured logging fields.

Why is it important?

Filebeat/Winlogbeat must be able to resubscribe to a channel if something occurs to the handle. In some cases, the handle may go invalid for some unknown reason, but in these cases, simply closing the handle and resubscribing corrects the issue.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

This is a mildly difficult one to test, although one of the easier ways is to:

  1. Ensure Sysmon is not installed on the system
  2. Add Microsoft-Windows-Sysmon/Operational as an event log source
  3. Start filebeat/winlogbeat, observe "channel not found" errors
  4. Install Sysmon. Within a few seconds, filebeat/winlogbeat should be able to subscribe to the channel

What I wasn't able to test is if a valid channel that filebeat/winlogbeat was subscribed to went away, saw the error, reinstalled Sysmon, and watch it re-subscribe (this was the issue in both related cases).

Related issues

Use cases

Needed in cases where the requested channel isn't available yet, or if the channel randomly disappears (which has been reported by customers).

Logs

filebeat:

{"log.level":"error","@timestamp":"2023-02-21T07:47:38.551-0800","log.logger":"input.winlog","log.origin":{"file.name":"winlog/input.go","file.line":113},"message":"Encountered channel not found error when opening Windows Event Log","service.name":"filebeat","id":"winlog-test","input_source":"winlog-test","eventlog":"winlog-test","channel":"Microsoft-Windows-Sysmon/Operational","error":{"message":"The specified channel could not be found."},"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T07:47:43.857-0800","log.logger":"eventlog","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":375},"message":"WinEventLog[winlog-test] using subscription query=<QueryList>\n  <Query Id=\"0\">\n    <Select Path=\"Microsoft-Windows-Sysmon/Operational\">*</Select>\n  </Query>\n</QueryList>","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T07:47:43.857-0800","log.logger":"input.winlog","log.origin":{"file.name":"winlog/input.go","file.line":119},"message":"Windows Event Log opened successfully","service.name":"filebeat","id":"winlog-test","input_source":"winlog-test","eventlog":"winlog-test","channel":"Microsoft-Windows-Sysmon/Operational","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T07:47:43.857-0800","log.logger":"eventlog_detail","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":417},"message":"WinEventLog[winlog-test] EventHandles returned 65 handles","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T07:47:43.857-0800","log.logger":"eventlog","log.origin":{"file.name":"eventlog/cache.go","file.line":86},"message":"messageFilesCache[winlog-test] size=1","service.name":"filebeat","ecs.version":"1.6.0"}

winlogbeat:

{"log.level":"warn","@timestamp":"2023-02-21T08:16:11.411-0800","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":143},"message":"Open() encountered channel not found error. Trying again...","service.name":"winlogbeat","id":"Microsoft-Windows-Sysmon/Operational","error":{"message":"The specified channel could not be found."},"channel":"Microsoft-Windows-Sysmon/Operational","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T08:16:16.419-0800","log.logger":"eventlog","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":387},"message":"WinEventLog[Microsoft-Windows-Sysmon/Operational] using subscription query=<QueryList>\n  <Query Id=\"0\">\n    <Select Path=\"Microsoft-Windows-Sysmon/Operational\">*</Select>\n  </Query>\n</QueryList>","service.name":"winlogbeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T08:16:16.419-0800","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":150},"message":"Opened successfully.","service.name":"winlogbeat","id":"Microsoft-Windows-Sysmon/Operational","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T08:16:16.423-0800","log.logger":"eventlog_detail","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":429},"message":"WinEventLog[Microsoft-Windows-Sysmon/Operational] EventHandles returned 100 handles","service.name":"winlogbeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-02-21T08:16:16.423-0800","log.logger":"eventlog","log.origin":{"file.name":"eventlog/cache.go","file.line":86},"message":"messageFilesCache[Microsoft-Windows-Sysmon/Operational] size=1","service.name":"winlogbeat","ecs.version":"1.6.0"}

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.
@taylor-swanson taylor-swanson self-assigned this Feb 16, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 16, 2023
@taylor-swanson taylor-swanson added backport-v8.6.0 Automated backport with mergify backport-v8.7.0 Automated backport with mergify Filebeat Filebeat labels Feb 16, 2023
@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 16, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-02-23T13:55:42.735+0000

  • Duration: 86 min 35 sec

Test stats 🧪

Test Results
Failed 0
Passed 25972
Skipped 1962
Total 27934

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@taylor-swanson taylor-swanson marked this pull request as ready for review February 21, 2023 18:10
@taylor-swanson taylor-swanson requested a review from a team as a code owner February 21, 2023 18:10
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@taylor-swanson
Copy link
Contributor Author

I was never able to test the scenario where we had a valid channel and then it disappeared briefly before coming back. I'm pretty confident, though, that we can handle that situation based on the error handling added in this PR along with what's already there.

In cases where I did try that scenario, either removing the channel hung due to winlogbeat holding a "lock" on the channel, or winlogbeat still saw the channel even though it was gone (took a reboot to clear it up in that case).

Copy link
Contributor

@efd6 efd6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Minor queries before merge.

filebeat/input/winlog/input.go Outdated Show resolved Hide resolved
winlogbeat/beater/eventlogger.go Outdated Show resolved Hide resolved
winlogbeat/eventlog/wineventlog.go Show resolved Hide resolved
@mergify
Copy link
Contributor

mergify bot commented Feb 22, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b winlogbeat-channel-resub upstream/winlogbeat-channel-resub
git merge upstream/main
git push upstream winlogbeat-channel-resub

Copy link
Contributor

@efd6 efd6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit.

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
@taylor-swanson taylor-swanson merged commit 34a87e5 into elastic:main Feb 23, 2023
@taylor-swanson taylor-swanson deleted the winlogbeat-channel-resub branch February 23, 2023 15:28
mergify bot pushed a commit that referenced this pull request Feb 23, 2023
…34605)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.

(cherry picked from commit 34a87e5)
mergify bot pushed a commit that referenced this pull request Feb 23, 2023
…34605)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.

(cherry picked from commit 34a87e5)

# Conflicts:
#	winlogbeat/eventlog/eventlog.go
#	winlogbeat/eventlog/wineventlog.go
#	winlogbeat/eventlog/wineventlog_experimental.go
taylor-swanson added a commit that referenced this pull request Feb 23, 2023
…34605) (#34656)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.

(cherry picked from commit 34a87e5)

Co-authored-by: Taylor Swanson <[email protected]>
taylor-swanson added a commit that referenced this pull request Feb 23, 2023
…annel not found errors (#34655)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.

(cherry picked from commit 34a87e5)

Co-authored-by: Taylor Swanson <[email protected]>
Co-authored-by: Taylor Swanson <[email protected]>
@nicpenning
Copy link
Contributor

This is awesome, everyone.

I am very excited to test this once it's available. This has been a pain point for us for quite sometime and it required some extra steps (restarting service, order of operations for upgrades with Sysmon and Wlb, etc) when we realized events were not coming in when they should have been.

@taylor-swanson
Copy link
Contributor Author

@Mergifyio backport 7.17

@mergify
Copy link
Contributor

mergify bot commented Mar 21, 2023

backport 7.17

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 21, 2023
…34605)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.

(cherry picked from commit 34a87e5)

# Conflicts:
#	winlogbeat/eventlog/eventlog.go
#	winlogbeat/eventlog/wineventlog.go
#	winlogbeat/eventlog/wineventlog_experimental.go
taylor-swanson added a commit that referenced this pull request Mar 21, 2023
…(backport #34605) (#34869)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.

(cherry picked from commit 34a87e5)
---------

Co-authored-by: Taylor Swanson <[email protected]>
Co-authored-by: Taylor Swanson <[email protected]>
@jonnygoogle25
Copy link

Hey @taylor-swanson,

Appreciate your work on this. I was hoping that the latest version of Winlogbeat 8.7.0 with this fix would correct a problem we see when we upgrade Sysmon but it hasn't. It's the exact scenario you subscribe in your notes but just to reiterate the issue is basically that when we upgrade Sysmon, Winlogbeat events stop flowing to their destination until we restart the service. I've tested this going from either Sysmon14.13->14.16 and from Sysmon14.14->14.16. In looking through Winlogbeat logs I do not ever see a channel not found error, but I do see the following while Sysmon is being upgraded:

{"log.level":"warn","@timestamp":"*","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":470},"message":"WinEventLog[] error salvaging message (event id=5 qualifier=0 provider="Microsoft-Windows-Sysmon" created at ***** will be included without a message): failed in EvtFormatMessage: The publisher has been disabled and its resource is not available. This usually occurs when the publisher is in the process of being uninstalled or upgraded.","service.name":"winlogbeat","ecs.version":"1.6.0"}

Any suggestions or ideas why this might be happening?

@efd6
Copy link
Contributor

efd6 commented Apr 23, 2023

@jonnygoogle25 Does this help?

@taylor-swanson
Copy link
Contributor Author

Hey @jonnygoogle25, it seems to me that when Sysmon is upgraded, our handle to the publisher which renders the XML is no longer valid. If restarting Winlogbeat fixes it, then refreshing our handle the channel seems to be all that needs to be done. The trick here would be reliably detecting when that error occurs. If there's an associated Windows error code that accompanies that error, then it should be pretty trivial to catch it.

That error is being produced in a different part of the code, though. The issue I fixed was in the main read loop and was more so checking for errors when reading from the channel. In the situation you encountered, the channel appears to be intact, but had problems rendering the XML. If we could reliably detect the error, then perhaps I can kick back an error up to that main read loop to cause it to resubscribe to the channel.

@jonnygoogle25
Copy link

That makes sense. I just tested the process again that breaks everything and I'm not able to find anything in the Windows event log that would help out here, the only consistent indicator is that error I shared in my earlier post from the Winlogbeat log. Could you not go on that?

@efd6 - thanks for sharing but I think we're having a slightly different issue. Event viewer shows the descriptions fine after the Sysmon upgrade, it's just that Winlogbeat doesn't pick them up.

@taylor-swanson
Copy link
Contributor Author

We potentially could, but I really dislike matching errors based on strings (not saying we can't but if there's numeric error code that accompanied it, I would just use that instead). Anyways, that'd be an implementation detail to worry about later.

One other question I had, how were you performing the upgrade? When I tried, I had to uninstall the old Sysmon, then install the new. However, the old Sysmon wouldn't uninstall cleanly unless I stopped Winlogbeat (I think maybe because it saw the channel was still used by Winlogbeat).

@jonnygoogle25
Copy link

We're using chocolatey to install over the existing package. I'm not certain if it calls an uninstall first, although I suspect it does

@taylor-swanson
Copy link
Contributor Author

@jonnygoogle25, I wrote up a separate issue to track the publisher disabled scenario: #35316

@GLMONTER
Copy link

When running Winlogbeat as a windows service after this change, the service fails to stop and hangs if a channel that is listed in the config is not found due to a continue. Is this expected behavior?

@taylor-swanson
Copy link
Contributor Author

@GLMONTER, no that was a bug introduced by this change. It was fixed here: #35474. Backports for the fix are targeting both 8.8 and 8.7.

chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
…34605)

- Added logic to gracefully handle event channel not found errors. This will
only apply to event subscriptions and not reading event files (evtx). If
a channel not found error is encountered, either during intial open or
during reading, the application will attempt to open a subscription to the
event after a short delay.
- Added Channel and IsFile methods to the EventLog interface.
- Added IsChannelNotFound function
- Improved logging through further use of structured logging fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.6.0 Automated backport with mergify backport-v8.7.0 Automated backport with mergify enhancement Filebeat Filebeat Winlogbeat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Winlogbeat] Microsoft-Windows-Windows Defender/Operational - The specified channel could not be found.
6 participants