Google Batch should comply with --Error is ignored flag #3166

zyosufzai · 2022-08-31T15:02:31Z

Bug report

Google Batch stops running even though Nextflow has reported "-- Error is ignored" for the failed task.

Expected behavior and actual behavior

Google Batch should continue running even if one of the jobs fails, provided it has been told to ignore the error. What actually happens is that the workflow terminates with an error message (see below).

Steps to reproduce the problem

Input the following on the command line:

NXF_VER="22.08.1-edge" ./nextflow run nf-core/methylseq -r 1.6.1 -c nextflow.config -profile test,gbatch

with the following config file:

gbatch {
    process.executor = 'google-batch'
    process.machineType = 'n2-standard-16'
    process.time = '2h'
    workDir = 'gs://nextflowdemobucket/zy-test/testrna_gbatch_tmp'
    google.location = 'us-central1'
    google.region = 'us-central1'
    google.project = ''
    params.outdir = 'gs://nextflowdemobucket/zy-test/testrna_gbatch'
    google.batch.bootDiskSize = 100.GB
}

Program output

[2e/9436f4] NOTE: Process preseq (SRR389222_sub1) terminated with an error exit status (139) -- Error is ignored
Error executing process > 'preseq (SRR389222_sub2)'

Caused by:
  Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7426a203[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@6126b8c[Wrapped task = TrustedListenableFutureTask@d542ca6[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@464ab2c1]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@61fde900[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]

Environment

Nextflow version: 22.08.1-edge
Java version: 17.0.3-internal
Operating system: Linux
Bash version: 5.0.3(1)-release


bentsherman · 2022-08-31T17:11:18Z

Can you provide the full log? There are two different tasks, SRR389222_sub1 and SRR389222_sub2, and the first one is ignored but the second one is triggering workflow termination. Need to see what happened with the second one.

zyosufzai · 2022-08-31T19:53:39Z

Since the logs are broken up by tag, I have the log file for the tag [2e/9436f4] and also a traceback (attached):
zy-test_testm_gbatch_tmp_2e_9436f428d4c0b830b66ed8bc37c994_.command.log
batch-trace.txt

zyosufzai · 2022-08-31T20:04:10Z

Sorry, the first log file I added was for SRR389222_sub1. This is the second log file, for SRR389222_sub2:
zy-test_testm_gbatch_tmp_bf_62b9e1e10833042281259ba9560df5_.command.log

pditommaso · 2022-08-31T20:06:38Z

It would be great if you could upload the .nextflow.log file of the failed execution.

zyosufzai · 2022-08-31T20:38:48Z

Gotcha. I couldn't find the location of the log file for that session, but I reran the pipeline and directed the log to a known location. It has the same errors with the same tasks.
nextflow.log

pditommaso · 2022-09-01T14:48:13Z

That's a weird error. It looks like it is reported by a thread pool used by the Google SDK:

Aug-31 20:35:44.041 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'preseq (SRR389222_sub2)'

Caused by:
  Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@49b3a025[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2e0ceb8c[Wrapped task = TrustedListenableFutureTask@25c1396d[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@2db57b9a]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@aa6214[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@49b3a025[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2e0ceb8c[Wrapped task = TrustedListenableFutureTask@25c1396d[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@2db57b9a]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@aa6214[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2065)
	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:833)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562)
	at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator.schedule(MoreExecutors.java:663)
	at com.google.api.gax.retrying.ScheduledRetryingExecutor.submit(ScheduledRetryingExecutor.java:116)
	at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.handle(CallbackChainRetryingFuture.java:137)
	at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.run(CallbackChainRetryingFuture.java:117)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
	at com.google.api.core.AbstractApiFuture$InternalSettableFuture.setException(AbstractApiFuture.java:94)
	at com.google.api.core.AbstractApiFuture.setException(AbstractApiFuture.java:76)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:67)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1132)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:572)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:542)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:535)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
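
For context, the trace above has the shape the JDK produces when work is scheduled on a `ScheduledThreadPoolExecutor` that has already been shut down (note the `Terminated` state in the message). A minimal sketch in plain Java, independent of the Google SDK and of Nextflow; the class and method names are hypothetical and exist only for illustration:

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdown {

    /** Returns true if scheduling on an already shut down pool is rejected. */
    public static boolean isRejectedAfterShutdown() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        // Shut the pool down before any work is submitted, so it holds no threads or tasks.
        pool.shutdown();
        Runnable noop = () -> { };
        try {
            // The default rejection policy (AbortPolicy) throws for a shut down pool.
            pool.schedule(noop, 10, TimeUnit.MILLISECONDS);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + isRejectedAfterShutdown());
    }
}
```

This only shows the mechanism behind the exception. It does not explain why the SDK's pool was already terminated when the retry was attempted.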

zyosufzai · 2022-09-01T21:45:10Z

So could it be that the preseq tasks for SRR389222_sub1 and SRR389222_sub2 failed and caused termination because the thread pool was not able to reuse the previously created threads to execute new requests? I'm wondering whether the "Error is ignored" behavior doesn't work with Google Batch because it needs its own exception rule written in the job's JSON. Looking at the documentation linked below, I wonder if there needs to be an "ignore_exit_status" setting: https://cloud.google.com/batch/docs/reference/rpc/google.cloud.batch.v1

1. Adds the projectId to BatchLogging's LoggingOptions so the logs will be fetched from the correct project. Previously I got a lot of "that resource might not exist" responses from the log read requests. I'm not sure if there's something special about my environment that causes me to need the projectId in the options, but I think it couldn't hurt to have the projectId in general.
2. Removes the old parseOutput function, which relied on the STDERR and STDOUT prefixes in the payload of the log entries. Batch no longer adds those prefixes, so that parsing doesn't work. Now we can distinguish stderr from stdout by looking at the logEntry's severity.
3. Catches any exception thrown by the log reading code. It could be a permissions issue like I was seeing, or a thread pool issue as in #3166. Either way, something going wrong while trying to read task logs should probably not stop the whole workflow.

With these changes, the nf-core/methylseq workflow can be run with the google-batch executor.

Signed-off-by: Aaron Golden <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
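
Points 2 and 3 of the commit message above can be illustrated with a minimal sketch. This is not the actual `BatchLogging.groovy` code: it is plain Java, the `Entry` class and `collect` method are hypothetical, and treating an `ERROR` severity as stderr is an assumption made for the example.

```java
import java.util.List;
import java.util.function.Supplier;

public class SafeLogFetch {

    /** Hypothetical stand-in for a log entry: a severity plus a text payload. */
    public static class Entry {
        final String severity;
        final String payload;

        public Entry(String severity, String payload) {
            this.severity = severity;
            this.payload = payload;
        }
    }

    /**
     * Collects either the stderr-like or the stdout-like lines from a log reader.
     * Any exception raised while reading is reported and swallowed, so a logging
     * failure cannot propagate and abort the caller.
     */
    public static String collect(Supplier<List<Entry>> reader, boolean wantStderr) {
        StringBuilder out = new StringBuilder();
        try {
            for (Entry e : reader.get()) {
                // Assumption: ERROR severity marks stderr, anything else is stdout.
                boolean isStderr = "ERROR".equals(e.severity);
                if (isStderr == wantStderr) {
                    out.append(e.payload).append('\n');
                }
            }
        } catch (Exception ex) {
            System.err.println("Cannot read task logs: " + ex);
        }
        return out.toString();
    }
}
```

With this shape, a reader that throws (for example a `RejectedExecutionException` from a terminated pool) yields an empty or partial log instead of an error in the caller.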

stale · 2023-03-18T10:20:34Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

pditommaso · 2023-03-19T09:29:16Z

Closing this in favour of #3772

zyosufzai changed the title ~~Google Batch doesn't comply with --Error is ignored flag~~ Google Batch should comply with --Error is ignored flag Aug 31, 2022

bentsherman added the executor/google-batch label Aug 31, 2022

hnawar mentioned this issue Nov 17, 2022: Ignore Exit status for Runnables in Google Batch #3401 (Closed)

aaronegolden mentioned this issue Nov 29, 2022: Fix a few issues in BatchLogging.groovy #3443 (Merged)

bentsherman mentioned this issue Feb 6, 2023: [Google Batch] errorStrategy 'retry' fails when multiple tasks attempt to retry #3607 (Closed)

stale bot added the stale label Mar 18, 2023

pditommaso added the pinned label Mar 18, 2023

pditommaso added this to the 23.04.0 milestone Mar 18, 2023

stale bot removed the stale label Mar 18, 2023

pditommaso closed this as completed Mar 19, 2023
