Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

com.hazelcast.jet.pipeline.SourceBuilder_TopologyChangeTest.test_restartJob_nodeShutDown #2643

Open
olukas opened this issue Nov 6, 2020 · 1 comment

Comments

@olukas
Copy link
Collaborator

olukas commented Nov 6, 2020

master (commit 4ec1e18)

Failed on Zulu JDK 11: http://jenkins.hazelcast.com/job/jet-oss-master-zulu-jdk11/289/testReport/com.hazelcast.jet.pipeline/SourceBuilder_TopologyChangeTest/test_restartJob_nodeShutDown/

Stacktrace:

java.lang.AssertionError: No snapshot produced
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at com.hazelcast.jet.core.JetTestSupport.lambda$waitForFirstSnapshot$3(JetTestSupport.java:257)
	at com.hazelcast.test.HazelcastTestSupport.assertTrueEventually(HazelcastTestSupport.java:1253)
	at com.hazelcast.test.HazelcastTestSupport.assertTrueEventually(HazelcastTestSupport.java:1270)
	at com.hazelcast.jet.core.JetTestSupport.waitForFirstSnapshot(JetTestSupport.java:254)
	at com.hazelcast.jet.pipeline.SourceBuilder_TopologyChangeTest.testTopologyChange(SourceBuilder_TopologyChangeTest.java:114)
	at com.hazelcast.jet.pipeline.SourceBuilder_TopologyChangeTest.test_restartJob_nodeShutDown(SourceBuilder_TopologyChangeTest.java:53)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at com.hazelcast.test.FailOnTimeoutStatement$CallableStatement.call(FailOnTimeoutStatement.java:114)
	at com.hazelcast.test.FailOnTimeoutStatement$CallableStatement.call(FailOnTimeoutStatement.java:106)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:834)

Standard output:

d.thread-57] [c.h.j.i.JobCoordinationService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Not starting jobs because master is not in safe state.
2020-11-06 03:24:02,237 [DEBUG] [hz.silly_cannon.cached.thread-51] [c.h.j.i.JobRepository]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Cleared snapshot data map __jet.snapshot.053b-2414-3200-0001.0
2020-11-06 03:24:02,238 [ERROR] [hz.silly_cannon.cached.thread-51] [c.h.j.i.o.SnapshotPhase2Operation]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] job 053b-2414-3200-0001, execution 053b-2414-3201-0001 not found for coordinator [127.0.0.1]:5701 for 'SnapshotPhase2Operation'
com.hazelcast.jet.core.TopologyChangedException: job 053b-2414-3200-0001, execution 053b-2414-3201-0001 not found for coordinator [127.0.0.1]:5701 for 'SnapshotPhase2Operation'
	at com.hazelcast.jet.impl.JobExecutionService.assertExecutionContext(JobExecutionService.java:326) ~[classes/:?]
	at com.hazelcast.jet.impl.operation.SnapshotPhase2Operation.doRun(SnapshotPhase2Operation.java:50) ~[classes/:?]
	at com.hazelcast.jet.impl.operation.AsyncOperation.run(AsyncOperation.java:53) ~[classes/:?]
	at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:256) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:237) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:411) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:438) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:603) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:588) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:547) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:244) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:59) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.jet.impl.MasterContext.invokeOnParticipant(MasterContext.java:261) ~[classes/:?]
	at com.hazelcast.jet.impl.MasterContext.invokeOnParticipants(MasterContext.java:245) ~[classes/:?]
	at com.hazelcast.jet.impl.MasterSnapshotContext.lambda$onSnapshotPhase1Complete$5(MasterSnapshotContext.java:269) ~[classes/:?]
	at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitToCoordinatorThread$46(JobCoordinationService.java:1028) ~[classes/:?]
	at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitToCoordinatorThread$47(JobCoordinationService.java:1049) ~[classes/:?]
	at com.hazelcast.internal.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:64) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
2020-11-06 03:24:02,238 [ WARN] [hz.silly_cannon.cached.thread-51] [c.h.j.i.MasterSnapshotContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] SnapshotPhase2Operation for snapshot 1 in job '053b-2414-3200-0001', execution 053b-2414-3201-0001 failed on member: MemberInfo{address=[127.0.0.1]:5701, uuid=f532e641-eea6-44c0-896a-1b3bf5244f37, liteMember=false, memberListJoinVersion=1}=com.hazelcast.jet.core.TopologyChangedException: job 053b-2414-3200-0001, execution 053b-2414-3201-0001 not found for coordinator [127.0.0.1]:5701 for 'SnapshotPhase2Operation'
com.hazelcast.jet.core.TopologyChangedException: job 053b-2414-3200-0001, execution 053b-2414-3201-0001 not found for coordinator [127.0.0.1]:5701 for 'SnapshotPhase2Operation'
	at com.hazelcast.jet.impl.JobExecutionService.assertExecutionContext(JobExecutionService.java:326) ~[classes/:?]
	at com.hazelcast.jet.impl.operation.SnapshotPhase2Operation.doRun(SnapshotPhase2Operation.java:50) ~[classes/:?]
	at com.hazelcast.jet.impl.operation.AsyncOperation.run(AsyncOperation.java:53) ~[classes/:?]
	at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:256) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:237) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:411) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:438) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:603) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:588) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:547) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:244) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:59) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.jet.impl.MasterContext.invokeOnParticipant(MasterContext.java:261) ~[classes/:?]
	at com.hazelcast.jet.impl.MasterContext.invokeOnParticipants(MasterContext.java:245) ~[classes/:?]
	at com.hazelcast.jet.impl.MasterSnapshotContext.lambda$onSnapshotPhase1Complete$5(MasterSnapshotContext.java:269) ~[classes/:?]
	at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitToCoordinatorThread$46(JobCoordinationService.java:1028) ~[classes/:?]
	at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitToCoordinatorThread$47(JobCoordinationService.java:1049) ~[classes/:?]
	at com.hazelcast.internal.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:64) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
2020-11-06 03:24:02,238 [ WARN] [hz.silly_cannon.cached.thread-51] [c.h.j.i.MasterSnapshotContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] SnapshotPhase2Operation for snapshot 1 in job '053b-2414-3200-0001', execution 053b-2414-3201-0001 failed on member: MemberInfo{address=[127.0.0.1]:5702, uuid=54517725-aadf-4ea1-a655-dd89f54702f8, liteMember=false, memberListJoinVersion=2}=com.hazelcast.spi.exception.TargetNotMemberException: Not Member! target: [127.0.0.1]:5702, partitionId: -1, operation: com.hazelcast.jet.impl.operation.SnapshotPhase2Operation, service: hz:impl:jetService
com.hazelcast.spi.exception.TargetNotMemberException: Not Member! target: [127.0.0.1]:5702, partitionId: -1, operation: com.hazelcast.jet.impl.operation.SnapshotPhase2Operation, service: hz:impl:jetService
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.initInvocationTarget(Invocation.java:295) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:569) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:547) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:244) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:59) ~[hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.jet.impl.MasterContext.invokeOnParticipant(MasterContext.java:261) ~[classes/:?]
	at com.hazelcast.jet.impl.MasterContext.invokeOnParticipants(MasterContext.java:245) ~[classes/:?]
	at com.hazelcast.jet.impl.MasterSnapshotContext.lambda$onSnapshotPhase1Complete$5(MasterSnapshotContext.java:269) ~[classes/:?]
	at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitToCoordinatorThread$46(JobCoordinationService.java:1028) ~[classes/:?]
	at com.hazelcast.jet.impl.JobCoordinationService.lambda$submitToCoordinatorThread$47(JobCoordinationService.java:1049) ~[classes/:?]
	at com.hazelcast.internal.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:64) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) [hazelcast-4.1.1-SNAPSHOT.jar:4.1.1-SNAPSHOT]
2020-11-06 03:24:02,239 [DEBUG] [hz.silly_cannon.cached.thread-51] [c.h.j.i.JobCoordinationService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] job '053b-2414-3200-0001', execution 053b-2414-3201-0001 snapshot is scheduled in 500ms
2020-11-06 03:24:02,239 [DEBUG] [hz.silly_cannon.cached.thread-51] [c.h.j.i.MasterSnapshotContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Snapshot 1 for job '053b-2414-3200-0001', execution 053b-2414-3201-0001 completed in 101ms, status=failure: com.hazelcast.jet.core.TopologyChangedException: job 053b-2414-3200-0001, execution 053b-2414-3201-0001 not found for coordinator [127.0.0.1]:5701 for 'SnapshotPhase1Operation'
2020-11-06 03:24:02,239 [DEBUG] [hz.silly_cannon.cached.thread-51] [c.h.j.i.MasterSnapshotContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Not beginning snapshot, job '053b-2414-3200-0001', execution 053b-2414-3201-0001 is not RUNNING, but NOT_RUNNING
2020-11-06 03:24:02,244 [ WARN] [hz.silly_cannon.migration] [c.h.i.p.i.MigrationPlanner]: Logging took 512 ms.
2020-11-06 03:24:02,245 [ INFO] [Thread-478] [c.h.i.i.NodeExtension]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Destroying node NodeExtension.
2020-11-06 03:24:02,245 [ INFO] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Hazelcast Shutdown is completed in 1802 ms.
2020-11-06 03:24:02,246 [ INFO] [Thread-478] [c.h.c.LifecycleService]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] [127.0.0.1]:5702 is SHUTDOWN
2020-11-06 03:24:02,246 [ INFO] [Thread-478] [c.h.c.LifecycleService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] [127.0.0.1]:5701 is SHUTTING_DOWN
2020-11-06 03:24:02,246 [ WARN] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Terminating forcefully...
2020-11-06 03:24:02,246 [ INFO] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Shutting down connection manager...
2020-11-06 03:24:02,259 [ INFO] [hz.silly_cannon.migration] [c.h.i.p.i.MigrationManager]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Partition balance is ok, no need to repartition.
2020-11-06 03:24:02,313 [ INFO] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Shutting down node engine...
2020-11-06 03:24:02,418 [ INFO] [Thread-478] [c.h.i.i.NodeExtension]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Destroying node NodeExtension.
2020-11-06 03:24:02,418 [ INFO] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Hazelcast Shutdown is completed in 172 ms.
2020-11-06 03:24:02,419 [ INFO] [Thread-478] [c.h.c.LifecycleService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] [127.0.0.1]:5701 is SHUTDOWN
BuildInfo right after test_restartJob_nodeShutDown(com.hazelcast.jet.pipeline.SourceBuilder_TopologyChangeTest): BuildInfo{version='4.1.1-SNAPSHOT', build='20201105', buildNumber=20201105, revision=51b6a4b, enterprise=false, serializationVersion=1, jet=JetBuildInfo{version='4.4-SNAPSHOT', build='20201106', revision='4ec1e18'}}
Hiccups measured while running test 'test_restartJob_nodeShutDown(com.hazelcast.jet.pipeline.SourceBuilder_TopologyChangeTest):'
03:23:40, accumulated pauses: 1280 ms, max pause: 264 ms, pauses over 1000 ms: 0
03:23:45, accumulated pauses: 1742 ms, max pause: 729 ms, pauses over 1000 ms: 0
03:23:50, accumulated pauses: 850 ms, max pause: 18 ms, pauses over 1000 ms: 0
03:23:55, accumulated pauses: 1786 ms, max pause: 838 ms, pauses over 1000 ms: 0
03:24:00, accumulated pauses: 994 ms, max pause: 503 ms, pauses over 1000 ms: 0
@olukas olukas added this to the 4.4 milestone Nov 6, 2020
@gurbuzali gurbuzali self-assigned this Nov 16, 2020
@gurbuzali
Copy link

The test failed on the very first assertion without even restarting any nodes. after starting the job, test waits for the first results to arrive to the list and then checks if any snapshots produced. From the logs we can say that snapshot started

SNAPSHOT] Starting snapshot 0 phase 1 for job '053b-2414-3200-0001', execution 053b-2414-3201-0001 on member
Will save 321 to snapshot
2020-11-06 03:23:48,749 [DEBUG] [hz.silly_cannon.jet.cooperative.thread-0] [c.h.j.i.u.AsyncSnapshotWriterImpl]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Stats for src: keys=1, chunks=1, bytes=170
2020-11-06 03:23:48,754 [DEBUG] [hz.silly_cannon.jet.cooperative.thread-1] [c.h.j.i.u.AsyncSnapshotWriterImpl]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Stats for sliding-window-prepare: keys=0, chunks=0, bytes=0

The snapshot interval is 500ms but there is no other snapshot related logs for 10 seconds (the timeout for waitForFirstSnapshot call) and then test fails. Looks like the first snapshot (0) stuck at pahse1.

The pipeline has peek just before sink so we should have lots of Output to ordinal 0 in the logs but we see only 2 lines of logs for the timeout duration (10 seconds).

I'm putting the relevant logs below.

2020-11-06 03:23:46,869 [ INFO] [hz.silly_cannon.cached.thread-13] [c.h.j.i.JobCoordinationService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Starting job 053b-2414-3200-0001 based on submit request
2020-11-06 03:23:46,903 [ INFO] [hz.silly_cannon.cached.thread-13] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Didn't find any snapshot to restore for job '053b-2414-3200-0001', execution 053b-2414-3201-0001
2020-11-06 03:23:46,903 [ INFO] [hz.silly_cannon.cached.thread-13] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Start executing job '053b-2414-3200-0001', execution 053b-2414-3201-0001, execution graph in DOT format:
digraph DAG {
	"src" [localParallelism=1];
	"sliding-window-prepare" [localParallelism=72];
	"sliding-window" [localParallelism=1];
	"listSink(result-1acb9824-1220-42b4-bbee-1c6755d0581c)" [localParallelism=1];
	"src" -> "sliding-window-prepare" [queueSize=1024];
	subgraph cluster_0 {
		"sliding-window-prepare" -> "sliding-window" [label="distributed-partitioned", queueSize=1024];
	}
	"sliding-window" -> "listSink(result-1acb9824-1220-42b4-bbee-1c6755d0581c)" [queueSize=1024];
}
HINT: You can use graphviz or http://viz-js.com to visualize the printed graph.
2020-11-06 03:23:46,903 [DEBUG] [hz.silly_cannon.cached.thread-13] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Building execution plan for job '053b-2414-3200-0001', execution 053b-2414-3201-0001
2020-11-06 03:23:46,914 [DEBUG] [hz.silly_cannon.cached.thread-13] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Built execution plans for job '053b-2414-3200-0001', execution 053b-2414-3201-0001
2020-11-06 03:23:46,917 [DEBUG] [hz.silly_cannon.cached.thread-13] [c.h.j.i.o.InitExecutionOperation]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Initializing execution plan for job 053b-2414-3200-0001, execution 053b-2414-3201-0001 from [127.0.0.1]:5701
2020-11-06 03:23:46,951 [DEBUG] [hz.silly_cannon.cached.thread-14] [c.h.j.i.JobRepository]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Job cleanup took 37ms
2020-11-06 03:23:46,978 [ INFO] [hz.silly_cannon.cached.thread-13] [c.h.j.i.JobExecutionService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Execution plan for jobId=053b-2414-3200-0001, jobName='053b-2414-3200-0001', executionId=053b-2414-3201-0001 initialized
2020-11-06 03:23:46,996 [DEBUG] [hz.elastic_cannon.generic-operation.thread-15] [c.h.j.i.o.InitExecutionOperation]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Initializing execution plan for job 053b-2414-3200-0001, execution 053b-2414-3201-0001 from [127.0.0.1]:5701
2020-11-06 03:23:47,154 [ INFO] [hz.elastic_cannon.generic-operation.thread-15] [c.h.j.i.JobExecutionService]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Execution plan for jobId=053b-2414-3200-0001, jobName='053b-2414-3200-0001', executionId=053b-2414-3201-0001 initialized
2020-11-06 03:23:47,155 [DEBUG] [hz.silly_cannon.cached.thread-13] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Init of job '053b-2414-3200-0001', execution 053b-2414-3201-0001 was successful
2020-11-06 03:23:47,155 [DEBUG] [hz.silly_cannon.cached.thread-13] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Executing job '053b-2414-3200-0001', execution 053b-2414-3201-0001
2020-11-06 03:23:47,155 [ INFO] [hz.silly_cannon.cached.thread-13] [c.h.j.i.JobExecutionService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Start execution of job '053b-2414-3200-0001', execution 053b-2414-3201-0001 from coordinator [127.0.0.1]:5701
2020-11-06 03:23:47,662 [DEBUG] [hz.silly_cannon.cached.thread-59] [c.h.j.i.MasterJobContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Not scaling up job '053b-2414-3200-0001', execution 053b-2414-3201-0001: not running or already running on all members
2020-11-06 03:23:47,723 [DEBUG] [hz.silly_cannon.cached.thread-13] [c.h.j.i.JobCoordinationService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] job '053b-2414-3200-0001', execution 053b-2414-3201-0001 snapshot is scheduled in 500ms
2020-11-06 03:23:47,726 [ INFO] [hz.elastic_cannon.generic-operation.thread-16] [c.h.j.i.JobExecutionService]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Start execution of job '053b-2414-3200-0001', execution 053b-2414-3201-0001 from coordinator [127.0.0.1]:5701
2020-11-06 03:23:48,276 [DEBUG] [hz.silly_cannon.cached.thread-54] [c.h.j.i.MasterSnapshotContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Starting snapshot 0 for job '053b-2414-3200-0001', execution 053b-2414-3201-0001, flags: terminal=no,export=no, writing to: null
2020-11-06 03:23:48,276 [DEBUG] [hz.silly_cannon.cached.thread-54] [c.h.j.i.e.ExecutionContext]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Starting snapshot 0 phase 1 for job '053b-2414-3200-0001', execution 053b-2414-3201-0001 on member
2020-11-06 03:23:48,278 [DEBUG] [hz.elastic_cannon.generic-operation.thread-18] [c.h.j.i.e.ExecutionContext]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Starting snapshot 0 phase 1 for job '053b-2414-3200-0001', execution 053b-2414-3201-0001 on member
Will save 321 to snapshot
2020-11-06 03:23:48,749 [DEBUG] [hz.silly_cannon.jet.cooperative.thread-0] [c.h.j.i.u.AsyncSnapshotWriterImpl]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Stats for src: keys=1, chunks=1, bytes=170
2020-11-06 03:23:48,754 [DEBUG] [hz.silly_cannon.jet.cooperative.thread-1] [c.h.j.i.u.AsyncSnapshotWriterImpl]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Stats for sliding-window-prepare: keys=0, chunks=0, bytes=0
2020-11-06 03:23:48,759 [ INFO] [hz.elastic_cannon.jet.cooperative.thread-5] [c.h.j.i.p.S.sliding-window#1]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Output to ordinal 0: Watermark{ts=00:00:00.300}
2020-11-06 03:23:48,769 [ INFO] [hz.silly_cannon.jet.cooperative.thread-4] [c.h.j.i.p.S.sliding-window#0]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Output to ordinal 0: WindowResult{start=00:00:00.000, end=00:00:00.100, value='100', isEarly=false} (eventTime=00:00:00.099)
2020-11-06 03:23:48,769 [ INFO] [hz.silly_cannon.jet.cooperative.thread-4] [c.h.j.i.p.S.sliding-window#0]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Output to ordinal 0: WindowResult{start=00:00:00.100, end=00:00:00.200, value='100', isEarly=false} (eventTime=00:00:00.199)
2020-11-06 03:23:48,770 [ INFO] [hz.silly_cannon.jet.cooperative.thread-4] [c.h.j.i.p.S.sliding-window#0]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Output to ordinal 0: WindowResult{start=00:00:00.200, end=00:00:00.300, value='100', isEarly=false} (eventTime=00:00:00.299)
2020-11-06 03:23:48,770 [ INFO] [hz.silly_cannon.jet.cooperative.thread-4] [c.h.j.i.p.S.sliding-window#0]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Output to ordinal 0: Watermark{ts=00:00:00.300}
2020-11-06 03:23:51,975 [DEBUG] [hz.silly_cannon.cached.thread-54] [c.h.j.i.JobRepository]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Job cleanup took 1ms
2020-11-06 03:23:57,039 [DEBUG] [hz.silly_cannon.cached.thread-54] [c.h.j.i.JobRepository]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Job cleanup took 36ms
2020-11-06 03:24:00,414 [ INFO] [Time-limited test] [c.h.j.c.JetTestSupport]: Terminating instanceFactory in JetTestSupport.@After
2020-11-06 03:24:00,441 [ INFO] [Thread-478] [c.h.c.LifecycleService]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] [127.0.0.1]:5702 is SHUTTING_DOWN
2020-11-06 03:24:00,443 [ WARN] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Terminating forcefully...
2020-11-06 03:24:00,444 [ INFO] [Thread-478] [c.h.i.i.Node]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Shutting down connection manager...
2020-11-06 03:24:00,476 [ INFO] [Thread-478] [c.h.t.m.MockServer]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Removed connection to endpoint: [127.0.0.1]:5702, connection: MockConnection{localEndpoint=[127.0.0.1]:5701, remoteEndpoint=[127.0.0.1]:5702, alive=false}
2020-11-06 03:24:00,476 [ INFO] [Thread-478] [c.h.t.m.MockServer]: [127.0.0.1]:5702 [jet] [4.4-SNAPSHOT] Removed connection to endpoint: [127.0.0.1]:5701, connection: MockConnection{localEndpoint=[127.0.0.1]:5702, remoteEndpoint=[127.0.0.1]:5701, alive=false}
2020-11-06 03:24:00,477 [ INFO] [Thread-478] [c.h.i.c.i.MembershipManager]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] Removing Member [127.0.0.1]:5702 - 54517725-aadf-4ea1-a655-dd89f54702f8
2020-11-06 03:24:00,480 [ INFO] [Thread-478] [c.h.i.c.ClusterService]: [127.0.0.1]:5701 [jet] [4.4-SNAPSHOT] 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants