Job shows Complete even though it failed #19
Related to #12?
In fact, there are other jobs that failed and did not produce rows in the Per_Image table, yet show as "Complete". These have Memory Errors:
The current master branch (and the way it was on 9/10/15, when you checked it out) exits with a status code of 0 even if there's an exception. Code that we're planning to check in has this facility. I don't use the "done" file, and maybe that's a mistake. I'd like to put BatchProfiler on the back burner for a couple of weeks, though, and maybe revisit it afterwards.
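A minimal sketch of the behaviour described above, assuming a job's work can be wrapped in a single Python callable: the wrapper turns any uncaught exception (including MemoryError, which is a subclass of Exception) into a nonzero exit status, and writes a "done" marker only when the job succeeds. The names `run_with_exit_status`, `example_job` and `done_path` are hypothetical and are not part of BatchProfiler.

```python
import traceback


def run_with_exit_status(job, *args, done_path=None):
    """Run job(*args) and return a process exit status.

    Returns 0 on success and 1 if the job raised any Exception
    (MemoryError and FileNotFoundError included). The optional
    "done" marker file is written only after a successful run.
    """
    try:
        job(*args)
    except Exception:
        # Print the traceback to stderr so it ends up in the job's .err log.
        traceback.print_exc()
        return 1
    if done_path is not None:
        # Hypothetical marker: its presence means the job really finished.
        with open(done_path, "w") as marker:
            marker.write("done\n")
    return 0


def example_job(path):
    """Hypothetical job body: fails if its input file is missing."""
    with open(path) as f:
        return f.read()
```

A job's entry point would then end with `sys.exit(run_with_exit_status(example_job, input_path))`, so that whatever inspects the exit status sees a failure instead of 0 when, for example, an input file is missing.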
But how can I resubmit the jobs that have failed? It seems impossible from any ViewBatch page, since all batches (mis)report Complete. Can I submit via sudo as imageweb in any way? I looked at the job_scripts, but I can't see how to do this.
Sorry, David. For this case, how about if I mark the ones that failed as failed?
(Replying to David Logan, Tue, Nov 24, 2015 at 9:25 AM.)
Sure, please mark them as failed (how can I do that myself?). I just raised memory_limit in the batchprofiler_2/batch database because, yes, there are lots of synapse objects per image. Will that still work for resubmitting, as it did in the old database scheme?
Raising the memory limit should work. To reset the status, you can run

    select task_status_id from run_job_status where batch_array_id = 213 and …

Then copy the task status IDs and run

    delete from task_status where task_status_id in (203818, 203825)

I just did this for 203818 to see whether it worked, and it did. You can do it …
(Replying to David Logan, Tue, Nov 24, 2015 at 9:39 AM.)
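The reset procedure above can be scripted. The sketch below uses Python's standard sqlite3 module so it runs anywhere; the two-table schema is a hypothetical stand-in for the real BatchProfiler tables (run_job_status is modelled as a plain table), and the extra condition truncated from the select above is not reproduced. Against MySQL you would use a DB-API driver such as MySQLdb instead, whose placeholder style is %s rather than ?. Either way, the delete is followed by commit(): PEP 249 requires DB-API connections to start with autocommit off, so an uncommitted delete may never become visible to other connections.

```python
import sqlite3


def make_demo_db():
    """In-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task_status (
            task_status_id INTEGER PRIMARY KEY,
            status TEXT
        );
        CREATE TABLE run_job_status (
            task_status_id INTEGER,
            batch_array_id INTEGER
        );
        INSERT INTO task_status VALUES
            (203818, 'Done'), (203825, 'Done'), (300001, 'Done');
        INSERT INTO run_job_status VALUES
            (203818, 213), (203825, 213), (300001, 214);
        """
    )
    return conn


def task_status_ids_for_batch_array(conn, batch_array_id):
    """Step 1: list the task_status_id values for one batch array."""
    cur = conn.execute(
        "SELECT task_status_id FROM run_job_status WHERE batch_array_id = ?",
        (batch_array_id,),
    )
    return [row[0] for row in cur.fetchall()]


def delete_task_statuses(conn, task_status_ids):
    """Step 2: delete the chosen task_status rows, then commit.

    Returns the number of rows deleted.
    """
    ids = list(task_status_ids)
    if not ids:
        return 0
    # One "?" per id; the ids themselves are passed as parameters.
    placeholders = ", ".join("?" for _ in ids)
    cur = conn.execute(
        f"DELETE FROM task_status WHERE task_status_id IN ({placeholders})",
        ids,
    )
    conn.commit()  # make the delete permanent and visible to other connections
    return cur.rowcount
```

For example, after `conn = make_demo_db()`, `task_status_ids_for_batch_array(conn, 213)` lists 203818 and 203825, and `delete_task_statuses(conn, [203825])` removes just that one row.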
There were only the two; you can delete 203825 if you want to resubmit.
Cool, I think I get it now. I will delete 203825 and resubmit, thanks!
Wait, it looks to me like run.213.21.txt is the one that is not done (also 213.19, for a different reason). Does that make sense to you, rather than 213.23?
As a test, I changed run.213.21.txt's status, but left run.213.23.txt as "Done" so you could try out the delete. I don't know what you're running for MySQL, but you may have to commit the transaction (try typing "commit").
I was just being cautious before and trying to understand the procedure. I just ran the delete successfully, and without my even hitting (re)submit, it now reports Running. Is it right that it submits automatically after the delete?
At http://imagewebrhel6/batchprofiler/cgi-bin/ViewBatch.py?batch_id=117, run.213.19.txt initially failed because a file didn't exist. (I had tried to fix a badly named file, but the file list wasn't updated.) The error said that a file did not exist, yet the Status says "Complete". I only knew to look because it finished in 24 seconds, which is too soon.
Subsequently, I tried to get ViewBatch to show a Resubmit button by clicking the Delete All button. The txt and err files were deleted, but there is still no Resubmit button: the Status still says Complete, and I don't know how to resubmit this individual job, run.213.19.