-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate better with lucene test framework and mockfilesystems #10656
Conversation
Conflicts: src/main/java/org/elasticsearch/index/translog/Translog.java src/main/java/org/elasticsearch/index/translog/fs/FsTranslog.java
…ad. We should probalby ban .getURI()? Also added a couple nocommits for some issues with tests after mockfs is working again. But I also re-enabled the mockfs suppression in the base test case for now.
with nocommits for tests that need to be investigated.
@@ -128,7 +130,8 @@ public NodeEnvironment(Settings settings, Environment environment) throws IOExce | |||
int maxLocalStorageNodes = settings.getAsInt("node.max_local_storage_nodes", 50); | |||
for (int possibleLockId = 0; possibleLockId < maxLocalStorageNodes; possibleLockId++) { | |||
for (int dirIndex = 0; dirIndex < environment.dataWithClusterFiles().length; dirIndex++) { | |||
Path dir = environment.dataWithClusterFiles()[dirIndex].resolve(Paths.get(NODES_FOLDER, Integer.toString(possibleLockId))); | |||
// TODO: wtf with resolve(get()) | |||
Path dir = environment.dataWithClusterFiles()[dirIndex].resolve(PathUtils.get(NODES_FOLDER, Integer.toString(possibleLockId))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to chain resolves, which seems to be the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets do it as a followup, we should try to minimize non-test changes here, it was just a TODO since this was using Paths.get for "concatenation" (which is not recommended)
@@ -44,6 +45,7 @@ | |||
/** | |||
* | |||
*/ | |||
@Slow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this test slower due slow connection setup on the vagrant boxes we were using. I think we can speed it up again. I'll do this as a separate change
I like the simplification and removal of test base class. I'm not 100% familiar with the test infra - I think @javanna should have a look as well. I'm also doubting about the exclusion of slow tests by default. I feels to me that a sanity check run (i.e., run only fast things) is something one should opt in to as opposed to opt out. This is mostly due to the fact that some parts of are system are only tested by slow tests (for example, the discovery runs which add and kill nodes). Last minor detail : |
Well these are unit test runs. Maybe such tests belong in an integration test suite or somewhere else? Or maybe they should be done as unit test, but they are simply too slow. Again disabling these non-unit-tests this provides the majority of test coverage (73% vs 79%) with a fraction of the time. Its the right tradeoff. |
Also, we really need to draw a line in the sand (great time is now) where we don't coddle and encourage slow tests. If we aggressively keep the test suite fast, by default, then people will follow suit, because they want their tests to execute and be fast. Furthermore, people will run the tests more often, because it does not take 15-30 minutes to do so. So it ultimately will result in better coverage than just letting things get slower and slower and less reliable. |
+1000 this is the right way to go here. We can still configure CI to run all these things but it should be sufficient to make a commit / push if the non-slowish tests pass.
this is the way to go - we have to work towards more coverage as you said in the description with the fast tests. That's a reasonable goal and straight forward IMO, yet it will take time but that's fine. This is actually a step forward while it might seem to be a step backwards in terms of coverage but it encourages since tests are faster and have good coverage. you can still set this as your default if you wanna be on the totally safe end. It's the same discussion as using local transport by default for speed it bypasses the entire network stack but it's the right tradeoff. Please lets move forward here and don't loos ourself in endless discussions. if it turns out to be a problem we can still go back. |
@@ -1,7 +1,7 @@ | |||
--- | |||
setup: | |||
- skip: | |||
version: 0 - 999 | |||
version: " - " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure this doesn't break the clients builds? @clintongormley WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna it does but we are on master we can do this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think " - "
is an improvement over "0 - 999"
. I'd rather see some meaningful value, like all
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question is if we want to have a "pending" test like this in the suite at all? Again, if yes, I'd like to have something semantic, like a pending: true
flag, not a skip
which needs to be deciphered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karmi If you want a special all
that is doable. I would rather have that then yet another flag like pending
.
left a couple of comments around REST tests and restarting the suite cluster after a failure. Note that if we go ahead and remove this mechanism there is more code that can be removed (as Boaz pointed out): |
+1 on removing more stuff if possible as luca said but we should not be blocked by ANY project downstream that use some of our test. We are on master and we can literally do whatever we want as long as there is an upgrade path. Yet, this is test only so we can move forward even if we break client builds. this change LGTM and I think we should not waste time and letting it go out of date. this is a massive improvement in our infra. +1 to push and take it from here iterating further. |
Integrate better with lucene test framework and mockfilesystems
…ranges Due to the change in elastic/elasticsearch@68f75ea, the normalized minimum and maximum versions no longer matched the assumptions about the version definition in the `skip` statement. The condition has been updated to check for empty `min` and `max` (non-normalized) versions. Related: elastic/elasticsearch#10656
@javanna sorry for the late response, been away. For the record, i'm happy with breaking the clients version parsing - it's a tiny fix. |
This was suggested on elastic#10656 as cleaner than " - " to indicate all versions should be skipped.
This was suggested on elastic#10656 as cleaner than " - " to indicate all versions should be skipped. closes elastic#10702
A few of the changes:
REPRODUCE WITH:
and exception-Dtests.verbose=true
shows all destructive operations to the filesystem and indexwriter streams.@Slow
annotation, defaulttests.slow
to false.The last change speeds up junit times to ~ 3:15 for a test run, which gives most bang for the buck: still 73% coverage vs 79% with tests.slow=true, but a good deal faster, and you can always turn on slow tests easily/in jenkins.
Things I want to defer to future issues:
Asserting*
classes from lucene to find bugs in query impls etc.Things i investigated and weren't worth the trouble: