x-pack/filebeat-cloud / TestInput – github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson tests are flaky #34929
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
Highlighting the failure and code location:
beats/x-pack/filebeat/input/httpjson/input_test.go Lines 336 to 377 in 6b3d63e
This comment was marked as outdated.
Suggestive, but ultimately I think wrong. It looks like it comes down to the behaviour of
My suspicion is that, due to timing, an early value that is held up in publishing can be overwritten by a later split operation. I think this is possible because of the concurrent state that is held in the processing via the
but even then I think there is a race between obtaining the lock on the first event and the event producer overwriting the shared reference. Fixing this would require an ack lock to be passed back to, or shared with, the event producer, at which point we are simulating non-concurrent execution with goroutines and channels.
This hypothesis can be tested by delaying the first event's publication for each request's set of events, like so:
diff --git a/x-pack/filebeat/input/httpjson/request.go b/x-pack/filebeat/input/httpjson/request.go
index 9832275a32..186664f300 100644
--- a/x-pack/filebeat/input/httpjson/request.go
+++ b/x-pack/filebeat/input/httpjson/request.go
@@ -13,6 +13,7 @@ import (
"net/http"
"net/url"
"strings"
+ "time"
"github.com/PaesslerAG/jsonpath"
@@ -511,6 +512,8 @@ func processAndPublishEvents(trCtx *transformContext, events <-chan maybeMsg, pu
continue
}
+ time.Sleep(time.Duration(1-n) * 10 * time.Millisecond)
+
if publish {
event, err := makeEvent(maybeMsg.msg)
if err != nil {
Running the test with this change now always fails, with the same outcome as the flaky test in the report.
The error here is in using an unbuffered channel in place of a yield for a coroutine (they are close but not semantically identical) in conjunction with shared mutable state.
@andrewkroh, @efd6 can we go ahead and skip this flaky test until we have a solution for this issue?
@narph I'd be reluctant to disable the test; the frequency of the failure is low enough that it only very rarely results in a build failure, and it does protect against regressions the rest of the time.
It was disabled with the intention of fixing it during a major refactor. This can be closed.
Thx for the update
Tests Failed
Build stats
Start Time: 2023-03-26T02:40:04.715+0000
Duration: 58 min 45 sec
Test stats 🧪
Test errors
Extended / x-pack/filebeat-cloud / TestInput/Test_first_event – github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson
Extended / x-pack/filebeat-cloud / TestInput – github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson
Extended / x-pack/metricbeat-cloud / test_query – x-pack.metricbeat.module.sql.query.test_sql_oracle.Test
Steps errors
x-pack/filebeat-cloud - mage build test
x-pack/metricbeat-cloud - mage build test
Error signal
Error "hudson.AbortException: script returned exit code 1"