Ci race detector handle failing tests #19143

frncmx · 2019-02-20T18:47:32Z

The goal is to make all tests stable and passing on Travis with -race flag. ASAP. We might create separate tickets to remove skip/resource restrictions on tests.

Travis has limited resources. We mostly hit memory limitations, but sometimes time limits, too. With testutil.RaceEnabled bool we can detect if we are running with -race and skip resource heavy tests cases, while still keeping the smaller ones to benefit from race detection.

fixes ethersphere/swarm#1226

TestDbStoreCorrect_1k sometimes timed out with -race on Travis. --- FAIL: TestDbStoreCorrect_1k (24.63s) common_test.go:194: testStore failed: timed out after 10s

nodeCount and chunkCount is returned from setupSim and those values we use.

As we will need to use the flag in other packages, too.

Extract long running test cases for better visibility.

As otherwise we always get test failure with `network_test.go:374: context deadline exceeded` even with raised `Timeout`.

As panics on Travis. panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b]

TestDiscoveryPersistenceSimulationSimAdapters failed on Travis with `-race` flag present. The failure was due to extensive memory usage, coming from the CGO runtime. Using a smaller node count resolves the issue. === RUN TestDiscoveryPersistenceSimulationSimAdapter ==7227==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12) FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) FAIL github.com/ethereum/go-ethereum/swarm/network/simulations/discovery 804.826s

Test on Travis times out with 8 or more nodes if -race flag is present.

Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes. === RUN TestFileRetrieval ==7366==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12) FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) FAIL github.com/ethereum/go-ethereum/swarm/network/stream 155.165s

Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes ("ThreadSanitizer failed to allocate").

Test fails a lot with something like: streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20

Travis just hangs... ok github.com/ethereum/go-ethereum/swarm/storage/feed/lookup 1.307s keepalive keepalive keepalive or panics after a while. Without these tests the race detector job is now stable. Let's invetigate these tests in a separate issue: #1245

* swarm/storage: increase mget timeout in common_test.go TestDbStoreCorrect_1k sometimes timed out with -race on Travis. --- FAIL: TestDbStoreCorrect_1k (24.63s) common_test.go:194: testStore failed: timed out after 10s * swarm: remove unused vars from TestSnapshotSyncWithServer nodeCount and chunkCount is returned from setupSim and those values we use. * swarm: move race/norace helpers from stream to testutil As we will need to use the flag in other packages, too. * swarm: refactor TestSwarmNetwork case Extract long running test cases for better visibility. * swarm/network: skip TestSyncingViaGlobalSync with -race As panics on Travis. panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b] * swarm: run TestSwarmNetwork with fewer nodes with -race As otherwise we always get test failure with `network_test.go:374: context deadline exceeded` even with raised `Timeout`. * swarm/network: run TestDeliveryFromNodes with fewer nodes with -race Test on Travis times out with 8 or more nodes if -race flag is present. * swarm/network: smaller node count for discovery tests with -race TestDiscoveryPersistenceSimulationSimAdapters failed on Travis with `-race` flag present. The failure was due to extensive memory usage, coming from the CGO runtime. Using a smaller node count resolves the issue. === RUN TestDiscoveryPersistenceSimulationSimAdapter ==7227==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12) FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) FAIL github.com/ethereum/go-ethereum/swarm/network/simulations/discovery 804.826s * swarm/network: run TestFileRetrieval with fewer nodes with -race Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes. === RUN TestFileRetrieval ==7366==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12) FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) FAIL github.com/ethereum/go-ethereum/swarm/network/stream 155.165s * swarm/network: run TestRetrieval with fewer nodes with -race Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes ("ThreadSanitizer failed to allocate"). * swarm/network: skip flaky TestGetSubscriptionsRPC on Travis w/ -race Test fails a lot with something like: streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20 * swarm/storage: skip TestDB_SubscribePull* tests on Travis w/ -race Travis just hangs... ok github.com/ethereum/go-ethereum/swarm/storage/feed/lookup 1.307s keepalive keepalive keepalive or panics after a while. Without these tests the race detector job is now stable. Let's invetigate these tests in a separate issue: ethersphere/swarm#1245

* swarm/storage: increase mget timeout in common_test.go TestDbStoreCorrect_1k sometimes timed out with -race on Travis. --- FAIL: TestDbStoreCorrect_1k (24.63s) common_test.go:194: testStore failed: timed out after 10s * swarm: remove unused vars from TestSnapshotSyncWithServer nodeCount and chunkCount is returned from setupSim and those values we use. * swarm: move race/norace helpers from stream to testutil As we will need to use the flag in other packages, too. * swarm: refactor TestSwarmNetwork case Extract long running test cases for better visibility. * swarm/network: skip TestSyncingViaGlobalSync with -race As panics on Travis. panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b] * swarm: run TestSwarmNetwork with fewer nodes with -race As otherwise we always get test failure with `network_test.go:374: context deadline exceeded` even with raised `Timeout`. * swarm/network: run TestDeliveryFromNodes with fewer nodes with -race Test on Travis times out with 8 or more nodes if -race flag is present. * swarm/network: smaller node count for discovery tests with -race TestDiscoveryPersistenceSimulationSimAdapters failed on Travis with `-race` flag present. The failure was due to extensive memory usage, coming from the CGO runtime. Using a smaller node count resolves the issue. === RUN TestDiscoveryPersistenceSimulationSimAdapter ==7227==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12) FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) FAIL github.com/ethereum/go-ethereum/swarm/network/simulations/discovery 804.826s * swarm/network: run TestFileRetrieval with fewer nodes with -race Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes. === RUN TestFileRetrieval ==7366==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12) FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) FAIL github.com/ethereum/go-ethereum/swarm/network/stream 155.165s * swarm/network: run TestRetrieval with fewer nodes with -race Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes ("ThreadSanitizer failed to allocate"). * swarm/network: skip flaky TestGetSubscriptionsRPC on Travis w/ -race Test fails a lot with something like: streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20 * swarm/storage: skip TestDB_SubscribePull* tests on Travis w/ -race Travis just hangs... ok github.com/ethereum/go-ethereum/swarm/storage/feed/lookup 1.307s keepalive keepalive keepalive or panics after a while. Without these tests the race detector job is now stable. Let's invetigate these tests in a separate issue: #1245

Ferenc Szabo added 12 commits February 19, 2019 15:23

swarm/storage: increase mget timeout in common_test.go

bc9f5ed

TestDbStoreCorrect_1k sometimes timed out with -race on Travis. --- FAIL: TestDbStoreCorrect_1k (24.63s) common_test.go:194: testStore failed: timed out after 10s

swarm: remove unused vars from TestSnapshotSyncWithServer

d145d77

nodeCount and chunkCount is returned from setupSim and those values we use.

swarm: move race/norace helpers from stream to testutil

f33736d

As we will need to use the flag in other packages, too.

swarm: refactor TestSwarmNetwork case

d9013d8

Extract long running test cases for better visibility.

swarm: run TestSwarmNetwork with fewer nodes with -race

eb9f11e

As otherwise we always get test failure with `network_test.go:374: context deadline exceeded` even with raised `Timeout`.

swarm/network: skip TestSyncingViaGlobalSync with -race

3c48738

As panics on Travis. panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b]

swarm/network: run TestDeliveryFromNodes with fewer nodes with -race

036a351

Test on Travis times out with 8 or more nodes if -race flag is present.

swarm/network: run TestRetrieval with fewer nodes with -race

63d34ba

Otherwise we get a failure due to extensive memory usage, as the CGO runtime cannot allocate more bytes ("ThreadSanitizer failed to allocate").

swarm/network: skip flaky TestGetSubscriptionsRPC on Travis w/ -race

4efcc30

Test fails a lot with something like: streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20

frncmx added pr:review labels Feb 20, 2019

frncmx requested review from janos, holisticode and zelig February 20, 2019 18:47

frncmx mentioned this pull request Feb 20, 2019

CI race detector handle failing tests ethersphere/swarm#1243

Closed

janos approved these changes Feb 20, 2019

View reviewed changes

frncmx mentioned this pull request Feb 20, 2019

CI race detector for Swarm ethersphere/swarm#1205

Closed

3 tasks

zelig approved these changes Feb 20, 2019

View reviewed changes

zelig merged commit e38b227 into ethereum:master Feb 20, 2019

skylenet added this to the 1.9.0 milestone Feb 21, 2019

frncmx deleted the ci-race-detector-handle-failing-tests branch February 21, 2019 11:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ci race detector handle failing tests #19143

Ci race detector handle failing tests #19143

frncmx commented Feb 20, 2019

Ci race detector handle failing tests #19143

Ci race detector handle failing tests #19143

Conversation

frncmx commented Feb 20, 2019