Issue with ariba and slurm (signal 18, signal 20) #263

kimleeng · 2019-04-12T12:24:14Z

Hello,

On our cluster we run ariba for some pipelines, and I noticed a pattern where early samples would not finish properly. No output folders are created, and the stderr for those early samples shows:

Stopping! Signal received: 18
Stopping! Signal received: 20

Now this is very likely due to our grid management setup with Slurm, where partitions are set up such that the partition a job is submitted to determines the job ordering (i.e. high-priority jobs take resources from low-priority jobs). When we run a batch of jobs on a per-sample basis, they launch alphabetically, which often puts higher-priority jobs later in the list. So the initial jobs are started, then suspended (in memory), then resumed once the high-priority jobs are done. This leads to the first set of non-priority samples failing with the error mentioned above (18 being SIGCONT, likely telling ariba to resume, but I'm guessing ariba handles all signals by stopping). Re-running the commands when the queue is not busy fixes the issue (though that is unfriendly for automation).
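To illustrate the guess above, here is a minimal Python sketch of a handler setup that stops only on terminating signals and leaves SIGTSTP and SIGCONT at their defaults, so a suspend/resume cycle would pass through untouched. This is not ariba's actual code: `STOP_SIGNALS`, `describe` and `install_stop_handlers` are hypothetical names, and the choice of which signals count as "stop" is an assumption.

```python
import signal

# Assumption for this sketch: only these signals should stop the program.
STOP_SIGNALS = (signal.SIGINT, signal.SIGTERM, signal.SIGHUP)


def describe(signum):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(signum).name


def install_stop_handlers(handler):
    """Register handler for the terminating signals only.

    SIGTSTP and SIGCONT are deliberately not registered, so they keep
    their default suspend/resume behaviour.
    """
    for sig in STOP_SIGNALS:
        signal.signal(sig, handler)
```

On Linux x86-64, `describe(18)` and `describe(20)` should give SIGCONT and SIGTSTP respectively; the numbers differ on some other platforms, which is why the sketch refers to signals by name.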

Hope this information makes sense; if not, let me know and I'll try to describe it better.

Kind regards,
-Kim Ng

kpepper · 2019-04-12T15:44:19Z

Hi @kimleeng. Interesting issue. I can replicate this by running:
ariba test out
and sending a SIGTSTP signal to it. I haven't got time to fix it right now, but I will look into it, as there seems to be a batch of issues around signal handling.
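For anyone wanting to script the reproduction, a rough sketch along these lines might work. The helper name `suspend_and_resume` and the pause length are assumptions made for illustration; it simply starts a command, sends SIGTSTP followed by SIGCONT, and reports the exit code.

```python
import signal
import subprocess
import time


def suspend_and_resume(cmd, pause=0.5):
    """Start cmd, send SIGTSTP then SIGCONT, and return its exit code."""
    proc = subprocess.Popen(cmd)
    time.sleep(pause)                 # give the process time to start
    proc.send_signal(signal.SIGTSTP)  # suspend request (20 on Linux x86-64)
    time.sleep(pause)
    proc.send_signal(signal.SIGCONT)  # resume request (18 on Linux x86-64)
    return proc.wait()
```

Calling `suspend_and_resume(["ariba", "test", "out"])` on an affected version would be expected to show the "Stopping! Signal received" message on stderr, while a program that leaves these signals alone should simply pause and then finish normally.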

kpepper · 2019-06-18T14:34:25Z

Fixed in Ariba release 2.14.1.

kpepper added the bug label Jun 18, 2019

kpepper closed this as completed Jun 18, 2019
