Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML][TEST] Fix randomly failing HLRC test #40973

Conversation

edsavage
Copy link
Contributor

@edsavage edsavage commented Apr 8, 2019

Made changes to ensure that unique IDs are generated for model snapshots
used by the deleteExpiredDataTest test in the MachineLearningIT suite.

Previously a sleep of 1s was performed between jobs under the assumption
that this would be sufficient to guarantee that the timestamps used in
the composition of the snapshot IDs would be different.

The new approach is to wait on the condition that the old and new
timestamps are in fact different (to 1s resolution).

Made changes to ensure that unique IDs are generated for model snapshots
used by the deleteExpiredDataTest test in the MachineLearningIT suite.

Previously a sleep of 1s was performed between jobs under the assumption
that this would be sufficient to guarantee that the timestamps used in
the composition of the snapshot IDs would be different.

The new approach is to wait on the condition that the old and new
timestamps are in fact different (to 1s resolution).
@edsavage edsavage added >test Issues or PRs that are addressing/adding tests :ml Machine learning v8.0.0 v7.2.0 labels Apr 8, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@droberts195
Copy link
Contributor

I think we might need a similar assertBusy on line 884. At a glance you'd think this place is OK, because we deliberately hack the timestamp on the old model snapshot document to be from a day ago. But because we don't hack the _id the hacked document can get overwritten by the job that's opened on line 885, if this whole sequence of events runs within the same second.

Slight refactor to aid clarity
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@edsavage
Copy link
Contributor Author

edsavage commented Apr 9, 2019

Jenkins retest elasticsearch-ci/2

@edsavage
Copy link
Contributor Author

edsavage commented Apr 9, 2019

run elasticsearch-ci/2

@edsavage edsavage merged commit 6a63377 into elastic:master Apr 9, 2019
edsavage added a commit that referenced this pull request Apr 9, 2019
Made changes to ensure that unique IDs are generated for model snapshots
used by the deleteExpiredDataTest test in the MachineLearningIT suite.

Previously a sleep of 1s was performed between jobs under the assumption
that this would be sufficient to guarantee that the timestamps used in
the composition of the snapshot IDs would be different.

The new approach is to wait on the condition that the old and new
timestamps are in fact different (to 1s resolution).
@edsavage edsavage deleted the test-fix/ensure_unique_model_snapshot_timestamps branch April 9, 2019 16:00
@jtibshirani
Copy link
Contributor

@edsavage would it be possible to backport this to 7.0 and 6.7 as well? The same failures have popped up in CI for these branches.

edsavage added a commit that referenced this pull request Apr 10, 2019
Made changes to ensure that unique IDs are generated for model snapshots
used by the deleteExpiredDataTest test in the MachineLearningIT suite.

Previously a sleep of 1s was performed between jobs under the assumption
that this would be sufficient to guarantee that the timestamps used in
the composition of the snapshot IDs would be different.

The new approach is to wait on the condition that the old and new
timestamps are in fact different (to 1s resolution).
edsavage added a commit that referenced this pull request Apr 10, 2019
Made changes to ensure that unique IDs are generated for model snapshots
used by the deleteExpiredDataTest test in the MachineLearningIT suite.

Previously a sleep of 1s was performed between jobs under the assumption
that this would be sufficient to guarantee that the timestamps used in
the composition of the snapshot IDs would be different.

The new approach is to wait on the condition that the old and new
timestamps are in fact different (to 1s resolution).
@cbuescher
Copy link
Member

Failed again on 6.7, would be nice to get the backport(s) in or mute the test there if it takes longer.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+multijob-unix-compatibility/os=oraclelinux-6/98/console

@edsavage
Copy link
Contributor Author

Failed again on 6.7, would be nice to get the backport(s) in or mute the test there if it takes longer.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+multijob-unix-compatibility/os=oraclelinux-6/98/console

Backported to 6.7 and 7.0 branches - if any further failures are seen this test will be muted

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Made changes to ensure that unique IDs are generated for model snapshots
used by the deleteExpiredDataTest test in the MachineLearningIT suite.

Previously a sleep of 1s was performed between jobs under the assumption
that this would be sufficient to guarantee that the timestamps used in
the composition of the snapshot IDs would be different.

The new approach is to wait on the condition that the old and new
timestamps are in fact different (to 1s resolution).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning >test Issues or PRs that are addressing/adding tests v6.7.2 v7.0.0 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants