[CI] IndexingMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges fails #30844

imotov · 2018-05-24T18:26:51Z

The failure was in a PR build https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request/10444/consoleText but after discussion with @ywelsch it doesn't seem to be related to the PR where it failed; instead, it might have been caused by #30672.

The failure doesn't reproduce:

  2> REPRODUCE WITH: ./gradlew :server:integTest -Dtests.seed=A12A9F8545E5EC2C -Dtests.class=org.elasticsearch.action.support.master.IndexingMasterFailoverIT -Dtests.method="testMasterFailoverDuringIndexingWithMappingChanges" -Dtests.security.manager=true -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=Etc/GMT-1
ERROR   19.4s J2 | IndexingMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: <10L>
   >      but: was <0L>
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.action.support.master.IndexingMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges(IndexingMasterFailoverIT.java:143)
   > 	at java.lang.Thread.run(Thread.java:748)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=5371, name=indexingThread, state=RUNNABLE, group=TGRP-IndexingMasterFailoverIT]
   > Caused by: ElasticsearchTimeoutException[Failed to acknowledge mapping update within [25s]]
   > 	at __randomizedtesting.SeedInfo.seed([A12A9F8545E5EC2C]:0)
   > 	at org.elasticsearch.cluster.action.index.MappingUpdatedAction.updateMappingOnMaster(MappingUpdatedAction.java:90)
   > 	at org.elasticsearch.cluster.action.index.MappingUpdatedAction.updateMappingOnMaster(MappingUpdatedAction.java:80)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction$ConcreteMappingUpdatePerformer.updateMappings(TransportShardBulkAction.java:620)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:591)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:566)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequest(TransportShardBulkAction.java:142)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:248)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:125)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:112)
   > 	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:74)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1018)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:996)
   > 	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:357)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:297)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:959)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:956)
   > 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:266)
   > 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:233)
   > 	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2178)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:968)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:98)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:318)
   > 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:293)
   > 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:280)
   > 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
   > 	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656)
   > 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724)
   > 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   > 	at java.lang.Thread.run(Thread.java:748)

elasticmachine · 2018-05-24T18:26:53Z

Pinging @elastic/es-distributed

ywelsch · 2018-05-25T07:22:25Z

@bleskes With acking being less lenient now, what happened is that one of the nodes did not ack the mapping update (but was disconnected instead), resulting in the failure of the index request. I suggest we stop checking for isAcknowledged in MappingUpdatedAction for now. WDYT?

bleskes · 2018-05-25T08:35:35Z

@ywelsch +1. My reasoning (I believe yours is the same) is that an ack on putMapping means the change was committed, and it's OK for the primary to go into a waiting loop if the cluster state hasn't arrived yet. Any failure to commit will result in a hard exception.

ywelsch · 2018-05-25T11:17:19Z

yes

As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not be required for acking to succeed in order for index requests with dynamic mapping updates to successfully complete. Relates to #30672 and Closes #30844
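To make the distinction between "committed" and "acknowledged" concrete, here is a minimal sketch of the two policies discussed above. It is a simplified model, not Elasticsearch code: `MappingAckSketch`, `UpdateResult`, `requireAck`, and `requireCommitOnly` are hypothetical names invented for illustration, and it does not reproduce the actual change made in #31140.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical, simplified model; none of these types are Elasticsearch classes. */
final class MappingAckSketch {

    /** Outcome of a mapping update as seen by the primary (assumed shape). */
    static final class UpdateResult {
        final boolean committed;     // the master committed the new cluster state
        final boolean acknowledged;  // every node confirmed it within the ack timeout

        UpdateResult(boolean committed, boolean acknowledged) {
            this.committed = committed;
            this.acknowledged = acknowledged;
        }
    }

    /** Strict policy: the index request fails whenever any node did not ack in time. */
    static void requireAck(UpdateResult result) throws TimeoutException {
        if (!result.committed) {
            throw new IllegalStateException("mapping update was not committed");
        }
        if (!result.acknowledged) {
            throw new TimeoutException("Failed to acknowledge mapping update");
        }
    }

    /** Lenient policy: only a failure to commit is an error; a missing ack is ignored. */
    static void requireCommitOnly(UpdateResult result) {
        if (!result.committed) {
            throw new IllegalStateException("mapping update was not committed");
        }
    }

    public static void main(String[] args) {
        // One node disconnected instead of acking, but the update was committed.
        UpdateResult committedButNotAcked = new UpdateResult(true, false);

        try {
            requireAck(committedButNotAcked);
            System.out.println("strict: index request proceeds");
        } catch (TimeoutException e) {
            System.out.println("strict: index request fails (" + e.getMessage() + ")");
        }

        requireCommitOnly(committedButNotAcked);
        System.out.println("lenient: index request proceeds");
    }
}
```

Under the strict policy, a single slow or disconnected node is enough to fail an otherwise healthy index request, which matches the timeout seen in the stack trace. Under the lenient policy, the primary can proceed and wait for the committed cluster state to arrive.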

imotov added the >test-failure, v7.0.0, and :Distributed Coordination/Cluster Coordination labels May 24, 2018

imotov added a commit that referenced this issue May 24, 2018

Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappin…

3622486

…gChanges Tracked by #30844

imotov added a commit that referenced this issue May 24, 2018

Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappin…

d32683e

…gChanges Tracked by #30844

ywelsch mentioned this issue Jun 6, 2018

Set acking timeout to 0 on dynamic mapping update #31140

Merged

ywelsch closed this as completed in #31140 Jan 24, 2019

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
