pipeline_testing - #341

Closed
Charlie-George opened this issue Jun 27, 2017 · 11 comments

Comments

@Charlie-George
Member

Charlie-George commented Jun 27, 2017

I'm testing the peakcalling pipeline with the py3 environment. Each pipeline seems to run individually now and I've pushed those changes, but I get the following error when it comes to the checksums. Has anyone else come across it before? @sebastian-luna-valero @AndreasHeger
```
2017-06-27 19:14:02,518 INFO running statement:
cat test_peakcallingSEbroad.stats | cgat csv2db --retry --database-backend=sqlite --database-name=csvdb --database-host= --database-user= --database-password= --database-port=3306 --add-index=file --table=test_peakcallingSEbroad_results > test_peakcallingSEbroad_results.load
2017-06-27 19:14:11,261 ERROR 1 tasks with errors, please see summary below:
2017-06-27 19:14:11,261 WARNING could not get task information for compareCheckSums, no message sent
2017-06-27 19:14:11,262 ERROR 0: Task=compareCheckSums Error=io.UnsupportedOperation Job=[[test_peakcallingPEnarrow.stats,test_peakcallingPEnarrowIDR.stats,test_peakcallingPEnarrowIDRoracle.stats,test_peakcallingSEIDR.stats,test_peakcallingSEbroad.stats]->md5_compare.tsv]: (can't do nonzero end-relative seeks)
2017-06-27 19:14:11,262 ERROR full traceback is in pipeline.log

Traceback (most recent call last):
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/Pipeline/Control.py", line 943, in main
    checksum_level=options.ruffus_checksums_level,
  File "/ifs/devel/charlotteg/py35-v1/conda/lib/python3.5/site-packages/ruffus/task.py", line 5938, in pipeline_run
    raise job_errors
ruffus.ruffus_exceptions.RethrownJobError:

Original exception:

Exception #1
  'io.UnsupportedOperation(can't do nonzero end-relative seeks)' raised in ...
   Task = def compareCheckSums(...):
   Job  = [[test_peakcallingPEnarrow.stats, test_peakcallingPEnarrowIDR.stats, test_peakcallingPEnarrowIDRoracle.stats, test_peakcallingSEIDR.stats, test_peakcallingSEbroad.stats] -> md5_compare.tsv]

Traceback (most recent call last):
  File "/ifs/devel/charlotteg/py35-v1/conda/lib/python3.5/site-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/ifs/devel/charlotteg/py35-v1/conda/lib/python3.5/site-packages/ruffus/task.py", line 567, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/pipeline_testing.py", line 467, in compareCheckSums
    is_complete = IOTools.isComplete(logfile)
  File "/ifs/devel/charlotteg/py35-v1/cgat/CGAT/IOTools.py", line 181, in isComplete
    lastline = getLastLine(filename)
  File "/ifs/devel/charlotteg/py35-v1/cgat/CGAT/IOTools.py", line 103, in getLastLine
    f.seek(-1 * offset, 2)
io.UnsupportedOperation: can't do nonzero end-relative seeks

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/pipeline_testing.py", line 656, in <module>
    sys.exit(P.main(sys.argv))
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/Pipeline/Control.py", line 1028, in main
    "pipeline failed with %i errors" % len(value.args))
ValueError: pipeline failed with 1 errors
```

@AndreasHeger
Member

This is a py3 issue, I have a fix for this that I need to push.
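
For reference, the error can be reproduced outside the pipeline. In Python 3, a file opened in text mode rejects any nonzero seek relative to the end of the file, while a file opened in binary mode allows it. The sketch below is not the fix that was pushed. `get_last_line` and `text_mode_seek_fails` are hypothetical names, and it assumes the log is UTF-8 with `\n` line endings.

```python
import io
import os


def text_mode_seek_fails(path):
    """Return True if an end-relative seek fails on a text-mode handle."""
    with open(path, "r") as f:
        try:
            f.seek(-1, os.SEEK_END)
        except io.UnsupportedOperation:
            return True
    return False


def get_last_line(path, read_size=1024):
    """Return the last line of a file, read from the end in binary mode.

    Hypothetical helper; assumes UTF-8 content and '\n' line endings.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return ""
        # Never seek back further than the start of the file.
        offset = min(size, read_size)
        while True:
            f.seek(-offset, os.SEEK_END)
            chunk = f.read(offset)
            lines = chunk.rstrip(b"\n").split(b"\n")
            # The last line is complete once a newline precedes it,
            # or once the whole file has been read.
            if len(lines) > 1 or offset == size:
                return lines[-1].decode("utf-8")
            offset = min(size, offset * 2)
```

Opening in binary mode and decoding afterwards keeps the end-relative seek while avoiding the text-mode restriction.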

@AndreasHeger
Member

... actually already pushed, could you please git pull --rebase?
Hopefully this will be fixed.

@Charlie-George
Member Author

Hmm, I've done that but it says I'm up to date.
I guess there has been some confusion when we merged with master?
Should I roll back? If so, to which commit? I'm a bit confused by the history and at what point the fixes disappeared. Thanks

@sebastian-luna-valero
Member

Hi Charlie,

I agree, I could not see Andreas' fixes in the Py3-migration branches:
https://github.com/CGATOxford/cgat/commits/Py3-migration
https://github.com/CGATOxford/CGATPipelines/commits/Py3-migration

I found the same problem with Jenkins. I think the issue is with pipeline_testing.py trying to access a file (test_name.log) while the pipeline itself is writing to it, and therefore you get an IO error.

However, I might be wrong and Andreas can explain better.

Best regards,
Sebastian

@AndreasHeger
Member

AndreasHeger commented Jun 28, 2017 via email

@sebastian-luna-valero
Member

Thanks, Andreas.

I think there is an additional issue. The isComplete function will check whether the last line of both test_name.log and test_name/test_name.log starts with # job finished. However, in the case of test_name.log that will never be the case in the compareCheckSums task of pipeline_testing.py since the (meta-)pipeline has not finished yet. Instead, you should be checking the test_name/test_name.log file only, which is the log file for the pipeline being tested.

Best regards,
Sebastian
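
The check described above can be sketched as follows. This is an illustration only, not the code in CGAT's IOTools: `is_complete` and its `marker` parameter are hypothetical, and the file is read front to back for simplicity.

```python
def is_complete(logfile, marker="# job finished"):
    """Return True if the last non-empty line of logfile starts with marker.

    Hypothetical stand-in for the check discussed in this thread.
    """
    last = ""
    with open(logfile, "r") as f:
        for line in f:
            if line.strip():
                last = line
    return last.startswith(marker)
```

A log that is still being written has no such final line yet, so the check returns False for it.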

@AndreasHeger
Member

Hi @sebastian-luna-valero, might be a bug, but note that I want to test ./test_name_.log instead of test_name/pipeline.log as the latter will also contain the log of the report building.

There is also the issue of testing several logs when a pipeline has multiple targets to test; see for example pipeline_annotations.
Hopefully I pushed this correctly. I have the following snippet in my repository:

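    # collect every log file whose name starts with the track name
    logfiles = glob.glob(track + "*.log")
    # the job counts as finished only if all of its logs are complete
    job_finished = True
    for logfile in logfiles:
        is_complete = IOTools.isComplete(logfile)
        E.debug("logcheck: {} = {}".format(logfile, is_complete))
        job_finished = job_finished and is_complete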

@sebastian-luna-valero
Member

Hi @AndreasHeger

Strange, I don't see new commits in the Py3-migration branches yet.

The statement logfiles = glob.glob(track + "*.log") will return ['test_annotations.log', 'test_annotations.tgz.log'], so you're right that it won't pick up test_annotations.dir/pipeline.log. I find it necessary to check that file as well, since pipeline_testing.py may finish silently while the pipeline under test fails, leaving exceptions in test_annotations.dir/pipeline.log.

Moreover, I don't think you can expect to find # job finished while the compareCheckSums task of pipeline_testing.py is still running.
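
One possible way to cover both sets of logs is sketched below, purely as an illustration. `collect_logfiles` and `all_logs_finished` are hypothetical names, the `<track>.dir/pipeline.log` layout is taken from the example above, and `is_complete` stands for whatever completeness check is used. Unlike a plain loop starting from `True`, this version treats "no logs found" as not finished, which is an assumption rather than existing behaviour.

```python
import glob
import os


def collect_logfiles(track):
    """Return the top-level logs for a track plus its nested pipeline.log.

    Hypothetical helper; assumes the nested log lives in <track>.dir/.
    """
    logfiles = sorted(glob.glob(track + "*.log"))
    nested = os.path.join(track + ".dir", "pipeline.log")
    if os.path.exists(nested):
        logfiles.append(nested)
    return logfiles


def all_logs_finished(track, is_complete):
    """Return True only if at least one log exists and all are complete."""
    logfiles = collect_logfiles(track)
    return bool(logfiles) and all(is_complete(f) for f in logfiles)
```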

@AndreasHeger
Member

AndreasHeger commented Jun 30, 2017 via email

@AndreasHeger
Member

AndreasHeger commented Jul 3, 2017 via email

@sebastian-luna-valero
Member

Thanks for fixing @AndreasHeger
