[CI] IndexingMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges fails #30844

Closed
imotov opened this issue May 24, 2018 · 4 comments
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI v7.0.0-beta1

Comments

@imotov
Contributor

imotov commented May 24, 2018

The failure occurred in a PR build (https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request/10444/consoleText), but after discussion with @ywelsch it doesn't seem to be related to the PR where it failed; instead, it might have been caused by #30672.

The failure doesn't reproduce:

  2> REPRODUCE WITH: ./gradlew :server:integTest -Dtests.seed=A12A9F8545E5EC2C -Dtests.class=org.elasticsearch.action.support.master.IndexingMasterFailoverIT -Dtests.method="testMasterFailoverDuringIndexingWithMappingChanges" -Dtests.security.manager=true -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=Etc/GMT-1
ERROR   19.4s J2 | IndexingMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: <10L>
   >      but: was <0L>
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.action.support.master.IndexingMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges(IndexingMasterFailoverIT.java:143)
   > 	at java.lang.Thread.run(Thread.java:748)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=5371, name=indexingThread, state=RUNNABLE, group=TGRP-IndexingMasterFailoverIT]
   > Caused by: ElasticsearchTimeoutException[Failed to acknowledge mapping update within [25s]]
   > 	at __randomizedtesting.SeedInfo.seed([A12A9F8545E5EC2C]:0)
   > 	at org.elasticsearch.cluster.action.index.MappingUpdatedAction.updateMappingOnMaster(MappingUpdatedAction.java:90)
   > 	at org.elasticsearch.cluster.action.index.MappingUpdatedAction.updateMappingOnMaster(MappingUpdatedAction.java:80)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction$ConcreteMappingUpdatePerformer.updateMappings(TransportShardBulkAction.java:620)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:591)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:566)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequest(TransportShardBulkAction.java:142)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:248)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:125)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:112)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:74)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1018)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:996)
   > 	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:357)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:297)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:959)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:956)
   > 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:266)
   > 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:233)
   > 	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2178)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:968)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:98)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:318)
   > 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:293)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:280)
   > 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
   > 	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656)
   > 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)
   > 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   > 	at java.lang.Thread.run(Thread.java:748)
@imotov imotov added >test-failure Triaged test failures from CI v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels May 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

imotov added a commit that referenced this issue May 24, 2018
imotov added a commit that referenced this issue May 24, 2018
@ywelsch
Contributor

ywelsch commented May 25, 2018

@bleskes With acking being less lenient now, what happened is that one of the nodes did not ack the mapping update (but was disconnected instead), resulting in the failure of the index request. I suggest we stop checking for isAcknowledged in MappingUpdatedAction for now. WDYT?
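To illustrate the suggestion, here is a minimal sketch contrasting a strict ack check with a lenient one. It is not the actual MappingUpdatedAction code: `PutMappingResult`, `AckTimeoutException`, `strictUpdate` and `lenientUpdate` are hypothetical names invented for this example, and only the exception message is taken from the stack trace above.

```java
// Hypothetical sketch only; none of these types are Elasticsearch classes.
final class MappingAckSketch {

    /** Simplified stand-in for the master's reply to a mapping update. */
    static final class PutMappingResult {
        private final boolean acknowledged;

        PutMappingResult(boolean acknowledged) {
            this.acknowledged = acknowledged;
        }

        /** True only if every node acked the update within the timeout. */
        boolean isAcknowledged() {
            return acknowledged;
        }
    }

    /** Stand-in for the timeout exception seen in the stack trace. */
    static final class AckTimeoutException extends RuntimeException {
        AckTimeoutException(String message) {
            super(message);
        }
    }

    /** Strict variant: a missing ack from any node fails the index request. */
    static void strictUpdate(PutMappingResult result, long timeoutSeconds) {
        if (!result.isAcknowledged()) {
            throw new AckTimeoutException(
                "Failed to acknowledge mapping update within [" + timeoutSeconds + "s]");
        }
    }

    /** Lenient variant: receiving a reply at all is treated as success. */
    static void lenientUpdate(PutMappingResult result, long timeoutSeconds) {
        // The ack flag is deliberately ignored here. The assumption is that a
        // failure to commit would have surfaced as an exception before a
        // result object was ever produced.
    }

    public static void main(String[] args) {
        PutMappingResult unacked = new PutMappingResult(false);
        lenientUpdate(unacked, 25);
        System.out.println("lenient variant: request continues");
        try {
            strictUpdate(unacked, 25);
        } catch (AckTimeoutException e) {
            System.out.println("strict variant: " + e.getMessage());
        }
    }
}
```

With the lenient variant, a node that disconnects instead of acking no longer causes the index request to fail.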

@bleskes
Contributor

bleskes commented May 25, 2018

@ywelsch +1. My reasoning (I believe your reasoning is the same) is that an ack on putMapping means that the change was committed, and it's OK for the primary to go into a waiting loop if the cluster state hasn't arrived yet. Any failure to commit will result in a hard exception.
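The waiting loop mentioned here could look roughly like the sketch below, assuming the primary only needs to block until a committed cluster state version has been applied locally. `MappingWaitSketch`, `applyClusterState` and `awaitVersion` are hypothetical names for illustration and do not correspond to Elasticsearch APIs.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only; not Elasticsearch code.
final class MappingWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateApplied = lock.newCondition();
    private long appliedVersion = 0;

    /** Called when a newer cluster state arrives on this node. */
    void applyClusterState(long version) {
        lock.lock();
        try {
            if (version > appliedVersion) {
                appliedVersion = version;
                stateApplied.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks until the applied version reaches requiredVersion.
     * Returns false if the timeout elapses or the thread is interrupted.
     */
    boolean awaitVersion(long requiredVersion, long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (appliedVersion < requiredVersion) {
                if (remainingNanos <= 0) {
                    return false;
                }
                // awaitNanos returns an estimate of the time still left to wait.
                remainingNanos = stateApplied.awaitNanos(remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MappingWaitSketch node = new MappingWaitSketch();
        // Simulates the committed cluster state arriving from another thread.
        Thread publisher = new Thread(() -> node.applyClusterState(1));
        publisher.start();
        System.out.println("mapping visible: " + node.awaitVersion(1, 5000));
        publisher.join();
    }
}
```

The point of the sketch is that waiting is bounded and local: the primary does not need every other node to ack, only its own copy of the cluster state to catch up.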

@ywelsch
Contributor

ywelsch commented May 25, 2018

yes

ywelsch added a commit that referenced this issue Jan 24, 2019
As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not
be required for acking to succeed in order for index requests with dynamic mapping updates to
successfully complete.

Relates to #30672 and Closes #30844