Integrate better with lucene test framework and mockfilesystems #10656

rmuir · 2015-04-19T00:00:13Z

A few of the changes:

single REPRODUCE WITH: and exception
fail the build if you typo the test name from mvn.
clean up output and stacktraces to work better with IDE integration and easier to debug.
fail tests if they leak file handles.
fail tests if they use an extreme number of file handles (> 2k)
simulate windows behavior randomly (no deletion of open files) for the filesystem.
add extra files to simulate .DS_Store/thumbs.db type behavior.
running tests with -Dtests.verbose=true shows all destructive operations to the filesystem and indexwriter streams.
parallellized rest tests.
more randomization/asserting consistently via lucenetestcase
refactor base test classes (including removing a few) for consistency.
fix some test reproducibility problems.
speed up execution of full test suite (~ 6:45 for full test run versus ~ 9:00 on master for me)
mark slower tests with @Slow annotation, default tests.slow to false.

The last change speeds up junit times to ~ 3:15 for a test run, which gives most bang for the buck: still 73% coverage vs 79% with tests.slow=true, but a good deal faster, and you can always turn on slow tests easily/in jenkins.

Things I want to defer to future issues:

make sure internaltestcluster randomization is less chaotic / more reasonable perf.
use lucenetestcase iwconfig randomization to the extent we can.
try to get to 2 minutes 'mvn test' with good coverage. likely requires adding many unit tests.
try to narrow the coverage gap more between tests.slow=true/false
speed up old backwards index tests (they spend a lot of time merging indexes)
try to design a more safe and efficient integration test base class (the suite-scoped one today has a confusing lifecycle).
hook in more Asserting* classes from lucene to find bugs in query impls etc.
fixing code to work safe with multiple JDK filesystems / allowing them to be used 'for real', anything like that.

Things i investigated and weren't worth the trouble:

parallelizing old backwards index test. not faster.
plugging in ram filesystem. runs on tmpfs are not really faster.

Conflicts: src/main/java/org/elasticsearch/index/translog/Translog.java src/main/java/org/elasticsearch/index/translog/fs/FsTranslog.java

…ad. We should probalby ban .getURI()? Also added a couple nocommits for some issues with tests after mockfs is working again. But I also re-enabled the mockfs suppression in the base test case for now.

with nocommits for tests that need to be investigated.

…ows too

bleskes · 2015-04-19T18:24:17Z

src/main/java/org/elasticsearch/env/NodeEnvironment.java

@@ -128,7 +130,8 @@ public NodeEnvironment(Settings settings, Environment environment) throws IOExce
        int maxLocalStorageNodes = settings.getAsInt("node.max_local_storage_nodes", 50);
        for (int possibleLockId = 0; possibleLockId < maxLocalStorageNodes; possibleLockId++) {
            for (int dirIndex = 0; dirIndex < environment.dataWithClusterFiles().length; dirIndex++) {
-                Path dir = environment.dataWithClusterFiles()[dirIndex].resolve(Paths.get(NODES_FOLDER, Integer.toString(possibleLockId)));
+                // TODO: wtf with resolve(get())
+                Path dir = environment.dataWithClusterFiles()[dirIndex].resolve(PathUtils.get(NODES_FOLDER, Integer.toString(possibleLockId)));


+1 to chain resolves, which seems to be the intention?

lets do it as a followup, we should try to minimize non-test changes here, it was just a TODO since this was using Paths.get for "concatenation" (which is not recommended)

bleskes · 2015-04-19T19:25:57Z

src/test/java/org/elasticsearch/discovery/zen/ping/unicast/UnicastZenPingTests.java

@@ -44,6 +45,7 @@
 /**
 *
 */
+@Slow


I made this test slower due slow connection setup on the vagrant boxes we were using. I think we can speed it up again. I'll do this as a separate change

bleskes · 2015-04-19T19:56:45Z

I like the simplification and removal of test base class. I'm not 100% familiar with the test infra - I think @javanna should have a look as well.

I'm also doubting about the exclusion of slow tests by default. I feels to me that a sanity check run (i.e., run only fast things) is something one should opt in to as opposed to opt out. This is mostly due to the fact that some parts of are system are only tested by slow tests (for example, the discovery runs which add and kill nodes).

Last minor detail : org.elasticsearch.test.ElasticsearchIntegrationTest#afterTestFailed can be removed. It's not used (but also not part of any diff, so I can't comment inline).

rmuir · 2015-04-19T20:00:56Z

I'm also doubting about the exclusion of slow tests by default. I feels to me that a sanity check run (i.e., run only fast things) is something one should opt in to as opposed to opt out. This is mostly due to the fact that some parts of are system are only tested by slow tests (for example, the discovery runs which add and kill nodes).

Well these are unit test runs. Maybe such tests belong in an integration test suite or somewhere else? Or maybe they should be done as unit test, but they are simply too slow.

Again disabling these non-unit-tests this provides the majority of test coverage (73% vs 79%) with a fraction of the time. Its the right tradeoff.

rmuir · 2015-04-19T20:07:40Z

Also, we really need to draw a line in the sand (great time is now) where we don't coddle and encourage slow tests. If we aggressively keep the test suite fast, by default, then people will follow suit, because they want their tests to execute and be fast. Furthermore, people will run the tests more often, because it does not take 15-30 minutes to do so. So it ultimately will result in better coverage than just letting things get slower and slower and less reliable.

s1monw · 2015-04-20T07:32:30Z

Again disabling these non-unit-tests this provides the majority of test coverage (73% vs 79%) with a fraction of the time. Its the right tradeoff.

+1000 this is the right way to go here. We can still configure CI to run all these things but it should be sufficient to make a commit / push if the non-slowish tests pass.

Also, we really need to draw a line in the sand (great time is now) where we don't coddle and encourage slow tests.

this is the way to go - we have to work towards more coverage as you said in the description with the fast tests. That's a reasonable goal and straight forward IMO, yet it will take time but that's fine. This is actually a step forward while it might seem to be a step backwards in terms of coverage but it encourages since tests are faster and have good coverage. you can still set this as your default if you wanna be on the totally safe end. It's the same discussion as using local transport by default for speed it bypasses the entire network stack but it's the right tradeoff. Please lets move forward here and don't loos ourself in endless discussions. if it turns out to be a problem we can still go back.

javanna · 2015-04-20T09:04:05Z

rest-api-spec/test/cluster.put_settings/10_basic.yaml

@@ -1,7 +1,7 @@
 ---
 setup:
  - skip:
-      version: 0 - 999
+      version:  " - "


are we sure this doesn't break the clients builds? @clintongormley WDYT?

@javanna it does but we are on master we can do this!

I don't think " - " is an improvement over "0 - 999". I'd rather see some meaningful value, like all?

Another question is if we want to have a "pending" test like this in the suite at all? Again, if yes, I'd like to have something semantic, like a pending: true flag, not a skip which needs to be deciphered?

@karmi If you want a special all that is doable. I would rather have that then yet another flag like pending.

javanna · 2015-04-20T09:50:29Z

left a couple of comments around REST tests and restarting the suite cluster after a failure. Note that if we go ahead and remove this mechanism there is more code that can be removed (as Boaz pointed out): ElasticsearchIntegrationTest#afterTestFailed, ElasticsearchRestTestCase#afterTestFailed, and I think the rest client creation in RestTestExecutionContext can be simplified (we might be able to make the rest client final).

s1monw · 2015-04-20T11:53:18Z

+1 on removing more stuff if possible as luca said but we should not be blocked by ANY project downstream that use some of our test. We are on master and we can literally do whatever we want as long as there is an upgrade path. Yet, this is test only so we can move forward even if we break client builds.

this change LGTM and I think we should not waste time and letting it go out of date. this is a massive improvement in our infra. +1 to push and take it from here iterating further.

Integrate better with lucene test framework and mockfilesystems

…ranges Due to the change in elastic/elasticsearch@68f75ea, the normalized minimum and maximum versions no longer matched the assumptions about the version definition in the `skip` statement. The condition has been updated to check for empty `min` and `max` (non-normalized) versions. Related: elastic/elasticsearch#10656

clintongormley · 2015-04-21T10:35:43Z

@javanna sorry for the late response, been away. For the record, i'm happy with breaking the clients version parsing - it's a tiny fix.

This was suggested on elastic#10656 as cleaner than " - " to indicate all versions should be skipped.

This was suggested on elastic#10656 as cleaner than " - " to indicate all versions should be skipped. closes elastic#10702

…moved Due to elastic/elasticsearch#10656 Closes #84.

rmuir and others added 30 commits April 15, 2015 18:23

contain filesystem access

6ac4d6d

Merge branch 'master' into mockfilesystem

fb481bc

Conflicts: src/main/java/org/elasticsearch/index/translog/Translog.java src/main/java/org/elasticsearch/index/translog/fs/FsTranslog.java

ensure security manager is always on if it should be

4014526

cutover to lucenetestcase

e5a699f

Fixed test using .getURI() for resource paths to use .getPath() inste…

4d44fa0

…ad. We should probalby ban .getURI()? Also added a couple nocommits for some issues with tests after mockfs is working again. But I also re-enabled the mockfs suppression in the base test case for now.

Enabled mockfs on base test class. All tests pass. Added suppressions

93e591c

with nocommits for tests that need to be investigated.

these leaks are plugged

68267f4

improve REPRODUCE WITH

8ceb495

Merge branch 'master' into mockfilesystem

370819a

note these test seeds for investigation

bac99cc

revert change to use .getPath(), it doens't work on windows

84b20c0

fix many test bugs by minimizing URI handling

65367f5

remove redundant confusing output

007e8f1

Merge branch 'master' into mockfilesystem

89b9f0e

upgrade to lucene 5.2 r1674278

c421948

fix FileSystemUtils failures

2d9e5b4

disable WindowsFS for this test. unsure if its a problem on real wind…

d08322e

…ows too

make stacktraces reasonable

7afa241

fail the build if you typo test name

5718e56

disable extras for test, clean up some stragglers

57b5e06

fix nocommits

b113fbd

suppress all filesystems here due to jimfs brokenness

a985c97

speed up directory wrapping

c7c4045

nobody wants to look at bytecode

e3e4c02

remove tests.processors, this is a reproducibility nightmare

e715535

Merge branch 'master' into mockfilesystem

43b6cd2

fix backwards test to not muck with URIs or get mad about extra clusters

0ff0a00

nuke ElasticsearchSingleNodeLuceneTestCase

84811a5

fold ESTestCase into ElasticsearchTestCase

aa381a2

nuke some unused stuff

61b60da

bleskes reviewed Apr 19, 2015
View reviewed changes

mark just this method as @Slow, can easily take over a minute

2ed711f

bleskes reviewed Apr 19, 2015
View reviewed changes

set heartbeat to 10s

069e11b

javanna reviewed Apr 20, 2015
View reviewed changes

Use dummy TermStatistics when term is not found Closes #10660

a6c154a

rmuir added a commit that referenced this pull request Apr 20, 2015

Merge pull request #10656 from elastic/mockfilesystem

db096b4

Integrate better with lucene test framework and mockfilesystems

rmuir merged commit db096b4 into master Apr 20, 2015

kevinkluge removed the in progress label Apr 20, 2015

rjernst added a commit to rjernst/elasticsearch that referenced this pull request Apr 21, 2015

Tests: Add shortcut "all" to skip version ranges in rest tests

d91fd35

This was suggested on elastic#10656 as cleaner than " - " to indicate all versions should be skipped.

rjernst mentioned this pull request Apr 21, 2015

Tests: Add shortcut "all" to skip version ranges in rest tests #10702

Merged

s1monw deleted the mockfilesystem branch April 21, 2015 20:21

rjernst added a commit to rjernst/elasticsearch that referenced this pull request Apr 22, 2015

Tests: Add shortcut "all" to skip version ranges in rest tests

2d54738

This was suggested on elastic#10656 as cleaner than " - " to indicate all versions should be skipped. closes elastic#10702

dadoonet mentioned this pull request Apr 29, 2015

[test] ElasticsearchIntegrationTest#getContext() shortcut has been removed elastic/elasticsearch-cloud-azure#84

Closed

dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Apr 29, 2015

[test] ElasticsearchIntegrationTest#getContext() shortcut has been re…

ff9394c

…moved Due to elastic/elasticsearch#10656 Closes #84.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate better with lucene test framework and mockfilesystems #10656

Integrate better with lucene test framework and mockfilesystems #10656

rmuir commented Apr 19, 2015

bleskes Apr 19, 2015

rmuir Apr 19, 2015

bleskes Apr 19, 2015

bleskes commented Apr 19, 2015

rmuir commented Apr 19, 2015

rmuir commented Apr 19, 2015

s1monw commented Apr 20, 2015

javanna Apr 20, 2015

s1monw Apr 20, 2015

karmi Apr 21, 2015

karmi Apr 21, 2015

rjernst Apr 21, 2015

javanna commented Apr 20, 2015

s1monw commented Apr 20, 2015

clintongormley commented Apr 21, 2015

@@ @@ -44,6 +45,7 @@ @@
               /**
                *
                */
+              @Slow

Integrate better with lucene test framework and mockfilesystems #10656

Integrate better with lucene test framework and mockfilesystems #10656

Conversation

rmuir commented Apr 19, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bleskes commented Apr 19, 2015

rmuir commented Apr 19, 2015

rmuir commented Apr 19, 2015

s1monw commented Apr 20, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

javanna commented Apr 20, 2015

s1monw commented Apr 20, 2015

clintongormley commented Apr 21, 2015