Flaky-test: PulsarStateTest.testSinkState - wrong number of messages received #6198

Closed
devinbost opened this issue Feb 3, 2020 · 7 comments · Fixed by #9870
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@devinbost
Contributor

devinbost commented Feb 3, 2020

PulsarStateTest.testSinkState failed with:
PulsarStateTest.testSinkState:183 expected [val1-9] but found [val1-8]

To Reproduce
I reproduced this exact issue locally by running the tests in PulsarStateTest with the following command running for the entire test process:
stress-ng --matrix 0 -t 15m

Originally mentioned here: #6137

@devinbost devinbost added the type/bug The PR fixed a bug or issue reported a bug label Feb 3, 2020
@devinbost devinbost changed the title Flaky-test: PulsarStateTest.testPythonWordCountFunction Flaky-test: PulsarStateTest.testSinkState Feb 3, 2020
@devinbost
Contributor Author

(I was able to install stress-ng on my mac via brew.)

@devinbost
Contributor Author

@jiazhai @yjshen Using stress-ng might be helpful for you guys if you're investigating the flaky tests.

@devinbost
Contributor Author

The issue is intermittent locally even when running stress-ng.

devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 3, 2020
@devinbost
Contributor Author

devinbost commented Feb 3, 2020

The failure appears to stop occurring under stress once the timeouts are increased (e.g. doubling the retryCount and initSleepTimeInMillis passed to retryStrategically). I pushed this change to my fork to see whether it also resolves the issue when running from GitHub CI.

The problem with the current approach to these tests is a race condition between checking the status through the Admin API (which executes a REST call) and the method responsible for producing the messages.
It looks like producer.send blocks on the send operation but not on the receive operation. (Is this right?)
Ideally, we'd have a way to block (at least for a bounded period) until all the messages are received, instead of needing to poll the status.
However, such a change would not necessarily fix the test, because we'd still depend on execution completing successfully within some period of time. (We may not have a choice, since without a timeout the test could run indefinitely.)
So that brings us back to increasing the timeouts used when polling the status, to ensure we receive all the messages on a slow test runner.
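
To make the polling idea concrete, here is a minimal sketch of deadline-based polling using only the JDK. `PollUntil` and its `await` method are hypothetical names for illustration, not Pulsar's `retryStrategically`, and the `AtomicInteger` stands in for the count that the test would read from the Admin API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Pulsar): polls a condition until a deadline. */
final class PollUntil {
    private PollUntil() {}

    /**
     * Re-evaluates the condition every pollMillis until it holds or
     * timeoutMillis has elapsed. Returns whether the condition held.
     */
    static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline, and always sleep at least 1 ms.
            Thread.sleep(Math.min(pollMillis, Math.max(1L, remainingNanos / 1_000_000L)));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the sink's processed-message count (an assumption for this sketch).
        AtomicInteger processed = new AtomicInteger();
        Thread sink = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
                processed.incrementAndGet();
            }
        });
        sink.start();
        boolean ok = await(() -> processed.get() == 10, 5_000, 50);
        System.out.println("all processed: " + ok);
        sink.join();
    }
}
```

Bounding the wait by a deadline rather than a retry count means a slow runner only costs extra wall-clock time up to the limit, while a fast runner returns as soon as the condition holds.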

@sijie @jiazhai @yjshen Thoughts?

@devinbost
Contributor Author

The other issue with blocking on the receive operation is that we'd need receiving to be synchronous with sending, which would impact Pulsar's performance unless there's a way to do so that I'm not considering.

@devinbost
Contributor Author

devinbost commented Feb 3, 2020

The approach used in SimpleProducerConsumerTest.testAsyncProducerAndAsyncAck(..) looks like this:

// Asynchronously produce messages
for (int i = 0; i < 10; i++) {
    final String message = "my-message-" + i;
    Future<MessageId> future = producer.sendAsync(message.getBytes());
    futures.add(future);
}

log.info("Waiting for async publish to complete");
for (Future<MessageId> future : futures) {
    future.get();
}

Message<byte[]> msg = null;
Set<String> messageSet = Sets.newHashSet();
for (int i = 0; i < 10; i++) {
    msg = consumer.receive(5, TimeUnit.SECONDS);
    String receivedMessage = new String(msg.getData());
    log.info("Received message: [{}]", receivedMessage);
    String expectedMessage = "my-message-" + i;
    testMessageOrderAndDuplicates(messageSet, receivedMessage, expectedMessage);
}

// Asynchronously acknowledge up to and including the last message
Future<Void> ackFuture = consumer.acknowledgeCumulativeAsync(msg);
log.info("Waiting for async ack to complete");
ackFuture.get();
consumer.close();

If we wait for the futures to complete like that, does it really guarantee that the publish has fully completed? (i.e. once future.get() returns, is the message guaranteed to be available to receive/consume?) If so, this approach could be used instead.
However, I am not yet convinced that this approach fully closes the loop.
How could we determine whether it does?
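
One way to think about it: if a completed send future only confirms the publish side (which is my assumption here), the test still needs a bounded wait on the consuming side. The sketch below models that with JDK stand-ins rather than the Pulsar client: a `BlockingQueue` plays the consumer, a scheduled executor plays a broker that acknowledges immediately but delivers 50 ms later, and `receiveAll` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: JDK stand-ins for producer, broker and consumer. */
final class ClosedLoopSketch {
    private ClosedLoopSketch() {}

    /** Receives exactly `expected` messages, failing if any single receive times out. */
    static List<String> receiveAll(BlockingQueue<String> consumer, int expected,
                                   long perMessageTimeoutMillis) throws InterruptedException {
        List<String> received = new ArrayList<>();
        for (int i = 0; i < expected; i++) {
            // poll(timeout) returns null on timeout, so a missing message is detected explicitly.
            String msg = consumer.poll(perMessageTimeoutMillis, TimeUnit.MILLISECONDS);
            if (msg == null) {
                throw new IllegalStateException(
                        "timed out after " + received.size() + " of " + expected + " messages");
            }
            received.add(msg);
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        ScheduledExecutorService broker = Executors.newSingleThreadScheduledExecutor();
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        try {
            for (int i = 0; i < 10; i++) {
                final String message = "my-message-" + i;
                // The simulated broker acks at once but makes the message consumable later.
                broker.schedule(() -> { topic.add(message); }, 50, TimeUnit.MILLISECONDS);
                acks.add(CompletableFuture.completedFuture(null));
            }
            for (CompletableFuture<Void> ack : acks) {
                ack.get(); // returns immediately: the ack alone says nothing about delivery
            }
            List<String> received = receiveAll(topic, 10, 5_000);
            System.out.println("received " + received.size() + " messages");
        } finally {
            broker.shutdown();
        }
    }
}
```

In this model the loop is closed by the receive count, not by the futures, so the assertion holds regardless of how long delivery lags behind the acknowledgement, as long as it stays within the per-message timeout.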

devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 5, 2020
Increased timeouts for state tests. apache#6200 apache#6198

Increased timeouts to testSimpleConsumerEventsWithoutPartition and introduced await to poll on assertions to eliminate use of Thread.sleep in several places. (apache#6014)

Attempting to fix testPulsarKafkaProducerWithSerializer issue by adding await to test. (apache#6137)

Attempt to fix apache#6207 and add more debugging information by pruning docker containers.

Fixed typo in docker commands for getting debug info. apache#6207.

Removing timeouts as per comments in apache#5333. This is for apache#6202.

Fixed timeout issues for CPP tests. apache#6202 and apache#6137

Increased more timeouts. apache#6202 and apache#6137

Fixed typo in CPP test timeout fix. apache#6202  apache#4884

Edited comment to trigger build apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 5, 2020
Rolled back changes to PulsarSpoutTest because fixing some instability broke two of the tests that depend on timeout configurations. Those changes will require more investigation. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 7, 2020
Added timeouts back in places where required. Increased timeouts though. apache#6202

Fixed timeouts for Storm and Kafka tests. Also removed debug block that was accidentally included in ReaderTest. apache#6202

Editing comment to trigger new build. apache#6202

Attempt to workaround test failure. apache#6202

Adding some timeouts back to get beyond hanging tests. apache#6202

Increased sleep value as temporary workaround for thread timeout. apache#6202

Added back timeouts to fix hang but increased timeouts from 1s to 5s. apache#6202

Added back timeout (but made it longer) to prevent hanging test. apache#6202

Fixed formatting since it was breaking the build. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 11, 2020
Increased more test timeouts to get them to pass on slow hardware. apache#6202

Increased more test timeouts to get them to pass on slow hardware. apache#6202

Edited more test timeouts to get them to pass on slow hardware. apache#6202

Triggering tests due to 'Could not transfer artifact' maven issue. apache#6202

Increased or edited timeouts to get more tests to pass. apache#6202

Triggering new build by changing comment. apache#6202

Fixed timeouts (to short timeouts) when null message is expected. apache#6202

Triggering new build by changing comment. apache#6202

Increased timeout. apache#6202

Increased sleep as temporary workaround. apache#6202

Tuned timeouts more. apache#6202

Widening time to force timeout in timeout test. apache#6202

Fixed spelling typo. apache#6202

Added randomization of namespace name. apache#6202

Added random name generator to names of producers, subscriptions, and topics in ClientDeduplicationTest to fix duplicate name conflicts. apache#6202

Fixed issues with duplicate namespaces with repeated test runs. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 14, 2020
Added randomization to topic name to prevent potential conflicts that might be causing non-determinism in test. apache#6202

Added randomization to namespace name to prevent issues with topics not clearing out before second run of tests. apache#6202

Attempt to get C++ test fixed. It's not clear if this commit will build though... apache#6202

Replaced snake_case with camelCase to try to get c++ format to pass the build. apache#6202

Adding random name to subscription to see if that resolves the fact that this test only fails on the second subsequent run. apache#6202

Fixed timeout issues. apache#6202

Attempting fix of testPerTopicStats() by addressing race condition. apache#6202

Adding some debugging to help troubleshoot flaky test. apache#6202

Removing code that wasn't building anyway. apache#6202

Changed how we're testing Prometheus by filtering the topic name to fix race conditions between test runs and sharing broker state. apache#6202

Added more debugging information and fixed assertion apache#6202

Trigger new build apache#6202

Added long timeouts to ensure that broker tests do timeout instead of hanging but without timing out too soon apache#6202

Fixed imports for TimeUnit apache#6202

Fixed imports for TimeUnit apache#6202

Pushing changes to allow discussion on what's happening. apache#6202

Fixed timeouts for the testSharedSingleAckedPartitionedTopic() test. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 14, 2020
Fixed issue with Prometheus test. apache#6202

Can't use receive with timeout, if the queue size is 0. Fixed InterceptorsTest. apache#6202

Can't use receive with timeout, if the queue size is 0. apache#6202
@devinbost devinbost changed the title Flaky-test: PulsarStateTest.testSinkState Flaky-test: PulsarStateTest.testSinkState - wrong number of messages received Feb 15, 2020
@devinbost
Contributor Author

I fixed this issue in my PR #6202. However, there's now a new failure.
The GitHub CI output just looks like this:

[ERROR] Run 4: PulsarStateTest.testSinkState:162 » PulsarAdmin org.apache.pulsar.shade.javax....

I've attached a log with the test failure.

11_run integration tests.txt

devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 17, 2020
Fixed Can't use receive with timeout, if the queue size is 0. apache#6202

Edited comment to trigger re-run of all tests to find more flaky tests. apache#6202

Fixed more of the concurrency issue in testPerTopicStats that was causing namespace conflicts. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 22, 2020
Fixed something I missed during rebasing. apache#6202

Fixed issues with Prometheus tests. apache#6256

Changed MessageId.latest to MessageId.earliest to fix apache#6224

Fixes issue apache#6352

Triggering build to inspect test results. apache#6202

Added timeouts to fix hanging tests. apache#6202

Triggering new build. apache#6202

Updating Github workflow to build surefire artifacts if previous step was cancelled, not just failed. apache#6202

Changing CI Unit Action to always build surefire artifacts to help with debugging hanging test. apache#6202

Triggering new build with arbitrary edit. apache#6202

Triggering build with arbitrary change to comment apache#6202

Triggering new build with arbitrary code change. apache#6202

Triggering new build with arbitrary code change. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Feb 24, 2020
Increased timeouts for state tests. apache#6200 apache#6198

Increased timeouts to testSimpleConsumerEventsWithoutPartition and introduced await to poll on assertions to eliminate use of Thread.sleep in several places. (apache#6014)

Attempting to fix testPulsarKafkaProducerWithSerializer issue by adding await to test. (apache#6137)

Attempt to fix apache#6207 and add more debugging information by pruning docker containers.

Fixed typo in docker commands for getting debug info. apache#6207.

Removing timeouts as per comments in apache#5333. This is for apache#6202.

Fixed timeout issues for CPP tests. apache#6202 and apache#6137

Increased more timeouts. apache#6202 and apache#6137

Fixed typo in CPP test timeout fix. apache#6202  apache#4884

Edited comment to trigger build apache#6202

Rolled back changes to PulsarSpoutTest because fixing some instability broke two of the tests that depend on timeout configurations. Those changes will require more investigation. apache#6202

Added timeouts back in places where required. Increased timeouts though. apache#6202

Fixed timeouts for Storm and Kafka tests. Also removed debug block that was accidentally included in ReaderTest. apache#6202

Editing comment to trigger new build. apache#6202

Attempt to workaround test failure. apache#6202

Adding some timeouts back to get beyond hanging tests. apache#6202

Increased sleep value as temporary workaround for thread timeout. apache#6202

Added back timeouts to fix hang but increased timeouts from 1s to 5s. apache#6202

Added back timeout (but made it longer) to prevent hanging test. apache#6202

Fixed formatting since it was breaking the build. apache#6202

Increased more test timeouts to get them to pass on slow hardware. apache#6202

Increased more test timeouts to get them to pass on slow hardware. apache#6202

Edited more test timeouts to get them to pass on slow hardware. apache#6202

Triggering tests due to 'Could not transfer artifact' maven issue. apache#6202

Increased or edited timeouts to get more tests to pass. apache#6202

Triggering new build by changing comment. apache#6202

Fixed timeouts (to short timeouts) when null message is expected. apache#6202

Triggering new build by changing comment. apache#6202

Increased timeout. apache#6202

Increased sleep as temporary workaround. apache#6202

Tuned timeouts more. apache#6202

Widening time to force timeout in timeout test. apache#6202

Fixed spelling typo. apache#6202

Added randomization of namespace name. apache#6202

Added random name generator to names of producers, subscriptions, and topics in ClientDeduplicationTest to fix duplicate name conflicts. apache#6202

Fixed issues with duplicate namespaces with repeated test runs. apache#6202

Added randomization to topic name to prevent potential conflicts that might be causing non-determinism in test. apache#6202

Added randomization to namespace name to prevent issues with topics not clearing out before second run of tests. apache#6202

Attempt to get C++ test fixed. It's not clear if this commit will build though... apache#6202

Replaced snake_case with camelCase to try to get c++ format to pass the build. apache#6202

Adding random name to subscription to see if that resolves the fact that this test only fails on the second subsequent run. apache#6202

Fixed timeout issues. apache#6202

Attempting fix of testPerTopicStats() by addressing race condition. apache#6202

Adding some debugging to help troubleshoot flaky test. apache#6202

Removing code that wasn't building anyway. apache#6202

Changed how we're testing Prometheus by filtering the topic name to fix race conditions between test runs and sharing broker state. apache#6202

Added more debugging information and fixed assertion apache#6202

Trigger new build apache#6202

Added long timeouts to ensure that broker tests do timeout instead of hanging but without timing out too soon apache#6202

Fixed imports for TimeUnit apache#6202

Fixed imports for TimeUnit apache#6202

Pushing changes to allow discussion on what's happening. apache#6202

Fixed timeouts for the testSharedSingleAckedPartitionedTopic() test. apache#6202

Fixed issue with Prometheus test. apache#6202

Can't use receive with timeout, if the queue size is 0. Fixed InterceptorsTest. apache#6202

Can't use receive with timeout, if the queue size is 0. apache#6202

Fixed Can't use receive with timeout, if the queue size is 0. apache#6202

Edited comment to trigger re-run of all tests to find more flaky tests. apache#6202

Fixed more of the concurrency issue in testPerTopicStats that was causing namespace conflicts. apache#6202

Fixed something I missed during rebasing. apache#6202

Fixed issues with Prometheus tests. apache#6256

Changed MessageId.latest to MessageId.earliest to fix apache#6224

Fixes issue apache#6352

Triggering build to inspect test results. apache#6202

Added timeouts to fix hanging tests. apache#6202

Triggering new build. apache#6202

Updating GitHub workflow to build surefire artifacts if previous step was cancelled, not just failed. apache#6202

Changing CI Unit Action to always build surefire artifacts to help with debugging hanging test. apache#6202

Triggering new build with arbitrary edit. apache#6202

Triggering build with arbitrary change to comment apache#6202

Triggering new build with arbitrary code change. apache#6202

Triggering new build with arbitrary code change. apache#6202

Changing surefire trigger back to failure() apache#6202

Added surefire artifacts to run always again. apache#6202

Triggering new build. apache#6202

Added condition to make testPartitions() more robust during repeated runs apache#6202

Implementing Sijie's suggestion about timeout for persistentTopicsCursorResetAfterReset(..) test. apache#6202
devinbost pushed a commit to devinbost/pulsar that referenced this issue Mar 13, 2020
Added awaitility to two pom files.

Increased timeouts for state tests. apache#6200 apache#6198

Increased timeouts to testSimpleConsumerEventsWithoutPartition and introduced await to poll on assertions to eliminate use of Thread.sleep in several places. (apache#6014)

Attempting to fix testPulsarKafkaProducerWithSerializer issue by adding await to test. (apache#6137)

Attempt to fix apache#6207 and add more debugging information by pruning docker containers.

Fixed typo in docker commands for getting debug info. apache#6207.

Removing timeouts as per comments in apache#5333. This is for apache#6202.

Fixed timeout issues for CPP tests. apache#6202 and apache#6137

Increased more timeouts. apache#6202 and apache#6137

Fixed typo in CPP test timeout fix. apache#6202 apache#4884

Edited comment to trigger build apache#6202

Rolled back changes to PulsarSpoutTest because fixing some instability broke two of the tests that depend on timeout configurations. Those changes will require more investigation. apache#6202

Added timeouts back in places where required. Increased timeouts though. apache#6202

Fixed timeouts for Storm and Kafka tests. Also removed debug block that was accidentally included in ReaderTest. apache#6202

Editing comment to trigger new build. apache#6202

Attempt to workaround test failure. apache#6202

Adding some timeouts back to get beyond hanging tests. apache#6202

Increased sleep value as temporary workaround for thread timeout. apache#6202

Added back timeouts to fix hang but increased timeouts from 1s to 5s. apache#6202

Added back timeout (but made it longer) to prevent hanging test. apache#6202

Fixed formatting since it was breaking the build. apache#6202

Increased more test timeouts to get them to pass on slow hardware. apache#6202

Increased more test timeouts to get them to pass on slow hardware. apache#6202

Edited more test timeouts to get them to pass on slow hardware. apache#6202

Triggering tests due to 'Could not transfer artifact' maven issue. apache#6202

Increased or edited timeouts to get more tests to pass. apache#6202

Triggering new build by changing comment. apache#6202

Fixed timeouts (to short timeouts) when null message is expected. apache#6202

Triggering new build by changing comment. apache#6202

Increased timeout. apache#6202

Increased sleep as temporary workaround. apache#6202

Tuned timeouts more. apache#6202

Widening time to force timeout in timeout test. apache#6202

Fixed spelling typo. apache#6202

Added randomization of namespace name. apache#6202

Added random name generator to names of producers, subscriptions, and topics in ClientDeduplicationTest to fix duplicate name conflicts. apache#6202

Fixed issues with duplicate namespaces with repeated test runs. apache#6202

Added randomization to topic name to prevent potential conflicts that might be causing non-determinism in test. apache#6202

Added randomization to namespace name to prevent issues with topics not clearing out before second run of tests. apache#6202

Attempt to get C++ test fixed. It's not clear if this commit will build though... apache#6202

Replaced snake_case with camelCase to try to get c++ format to pass the build. apache#6202

Adding random name to subscription to see if that resolves the fact that this test only fails on the second subsequent run. apache#6202

Fixed timeout issues. apache#6202

Attempting fix of testPerTopicStats() by addressing race condition. apache#6202

Adding some debugging to help troubleshoot flaky test. apache#6202

Removing code that wasn't building anyway. apache#6202

Changed how we're testing Prometheus by filtering the topic name to fix race conditions between test runs and sharing broker state. apache#6202

Added more debugging information and fixed assertion apache#6202

Trigger new build apache#6202

Added long timeouts to ensure that broker tests do timeout instead of hanging but without timing out too soon apache#6202

Fixed imports for TimeUnit apache#6202

Fixed imports for TimeUnit apache#6202

Pushing changes to allow discussion on what's happening. apache#6202

Fixed timeouts for the testSharedSingleAckedPartitionedTopic() test. apache#6202

Fixed issue with Prometheus test. apache#6202

Can't use receive with timeout, if the queue size is 0. Fixed InterceptorsTest. apache#6202

Can't use receive with timeout, if the queue size is 0. apache#6202

Fixed Can't use receive with timeout, if the queue size is 0. apache#6202

Edited comment to trigger re-run of all tests to find more flaky tests. apache#6202

Fixed more of the concurrency issue in testPerTopicStats that was causing namespace conflicts. apache#6202

Fixed something I missed during rebasing. apache#6202

Fixed issues with Prometheus tests. apache#6256

Changed MessageId.latest to MessageId.earliest to fix apache#6224

Fixes issue apache#6352

Triggering build to inspect test results. apache#6202

Added timeouts to fix hanging tests. apache#6202

Triggering new build. apache#6202

Updating GitHub workflow to build surefire artifacts if previous step was cancelled, not just failed. apache#6202

Changing CI Unit Action to always build surefire artifacts to help with debugging hanging test. apache#6202

Triggering new build with arbitrary edit. apache#6202

Triggering build with arbitrary change to comment apache#6202

Triggering new build with arbitrary code change. apache#6202

Triggering new build with arbitrary code change. apache#6202

Changing surefire trigger back to failure() apache#6202

Added surefire artifacts to run always again. apache#6202

Triggering new build. apache#6202

Added condition to make testPartitions() more robust during repeated runs apache#6202

Implementing Sijie's suggestion about timeout for persistentTopicsCursorResetAfterReset(..) test. apache#6202

Fixed file that I forgot to merge. apache#6202

Increased robustness of testPartitions() for repeated execution. apache#6202

Added more debugging to ParserProxyHandler's channelRead, changed test from private to public, and decreased test noise. apache#6332

Trying to get more debug info apache#6332

Added more debugging log statements to try to pinpoint where the failure happens. apache#6332

Added more debugging log statements to try to pinpoint where the failure happens. apache#6332

Added even more debugging for tracing purposes. apache#6332

Added even more debugging for tracing purposes. apache#6332

Rolling back unnecessary changes. apache#6202

Rolling back unnecessary changes. apache#6202

Fixed issue with testDeadLetterTopic() where redelivery was getting triggered. apache#6202

Adding more debug information and methods to test hypothesis. apache#6332

Adding keepAlive to ServerConnection to see what that does. apache#6332

Increasing ProxyServer keepAliveInterval to 90 seconds in case it is timing out during server tests. apache#6332

Rolling back changes. apache#6332