Issue with ariba and slurm (signal 18, signal 20) #263

Closed
kimleeng opened this issue Apr 12, 2019 · 2 comments

@kimleeng

Hello,

On our cluster we run ariba as part of several pipelines, and I noticed a pattern where the early samples do not finish properly. No output folders are produced, and the stderr of these early samples contains:

Stopping! Signal received: 18
Stopping! Signal received: 20

This is very likely due to our Slurm setup, where the partition a job is submitted to determines its priority (i.e. high-priority jobs take resources from low-priority jobs). When we run a batch of per-sample jobs, they launch alphabetically, so higher-priority jobs often come later in the list. The initial jobs are started, then suspended (kept in memory), then resumed once the high-priority jobs are done. As a result, the first set of low-priority samples fails with the error above (18 is likely SIGCONT telling ariba to resume, but I'm guessing ariba handles all signals by stopping). Re-running the commands when the queue is not busy fixes the issue, though that is unfriendly for automation.
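To illustrate the guess above, here is a minimal Python sketch of a handler setup that stops only on terminating signals and leaves job-control signals alone. It is not Ariba's actual code: `stop_on_signal`, `install_handlers`, `describe`, and `TERMINATING_SIGNALS` are hypothetical names chosen for this example. On most Linux architectures (including x86-64), signal 18 is SIGCONT and signal 20 is SIGTSTP, which matches a suspend/resume cycle.

```python
import signal

# Signals that should really end the run. Job-control signals such as
# SIGTSTP and SIGCONT are deliberately left out (assumption for this sketch).
TERMINATING_SIGNALS = (signal.SIGINT, signal.SIGTERM, signal.SIGHUP)


def stop_on_signal(signum, frame):
    """Hypothetical handler: report the signal and exit with a non-zero status."""
    raise SystemExit(f"Stopping! Signal received: {signum}")


def install_handlers(signals=TERMINATING_SIGNALS):
    """Register stop_on_signal for the given signals only.

    Returns the previous handlers so the caller can restore them.
    """
    previous = {}
    for sig in signals:
        previous[sig] = signal.signal(sig, stop_on_signal)
    return previous


def describe(signum):
    """Return the symbolic name of a signal number on the current platform."""
    return signal.Signals(signum).name
```

With a setup like this, a suspended job that receives SIGCONT simply continues, because the default action for SIGCONT is still in place. Calling `describe(18)` and `describe(20)` on the cluster nodes would confirm which signals the error messages refer to.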

I hope this information makes sense; if not, let me know and I'll try to describe it better.

Kind regards,
-Kim Ng

@kpepper
Member

kpepper commented Apr 12, 2019

Hi @kimleeng. Interesting issue. I can replicate this by running:
ariba test out
and sending a SIGTSTP signal to it. I haven't got time to fix it right now, but will look into it as there seems to be a batch of issues around signal handling.
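The same kind of reproduction can be sketched without a cluster. The example below does not run ariba; it starts a hypothetical stand-in child process that treats SIGTSTP and SIGCONT as fatal, sends it one signal, and reports the exit code and stderr. `CHILD_CODE` and `send_and_wait` are names made up for this sketch.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a tool that treats job-control signals as fatal.
CHILD_CODE = """
import signal, sys, time

def stop(signum, frame):
    sys.stderr.write(f"Stopping! Signal received: {signum}\\n")
    sys.exit(1)

for sig in (signal.SIGTSTP, signal.SIGCONT):
    signal.signal(sig, stop)

print("ready", flush=True)
time.sleep(30)
"""


def send_and_wait(sig, timeout=10):
    """Start the stand-in child, send it one signal, return (exit code, stderr)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    child.stdout.readline()  # block until the child has installed its handlers
    child.send_signal(sig)
    _, err = child.communicate(timeout=timeout)
    return child.returncode, err.strip()
```

Calling `send_and_wait(signal.SIGTSTP)` should show the child exiting early with a "Stopping!" message instead of being suspended, which is the behaviour described in this thread.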

kpepper added the bug label Jun 18, 2019
@kpepper
Member

kpepper commented Jun 18, 2019

Fixed in Ariba release 2.14.1.

kpepper closed this as completed Jun 18, 2019