clinseq update-analysis missing data #13
This issue seems similar to issue #15: some metadata needed to run the TST1 analyses once the GMS starts up is perhaps not being captured when the metadata object is dumped. Perhaps this requires a modification to this code: "/lib/perl/Genome/Model/Command/Export/Metadata.pm"
As discussed orally: these Genome::Db entities exist (they can be found by get() or a lister) when their files are in the correct place on the filesystem. To put them in place, they need to be checked out from GitHub into the $GENOME_SW directory. There is a tool that will do that for cosmic, i.e.: When done, this will work: As will this: Or: So that install command needs to:
The raw/ugly thing to get it to run might be:
Something nicer might be:
I will do this unless someone else claims it.
Now when "genome model import metadata" runs, we receive the following warnings:
WARNING: EXTERNAL DATABASE NOT INSTALLED. DO: genome db cosmic install cosmic/61.1
The next step is to test these commands and see whether we can successfully install these databases inside the standalone GMS.
These genome db install commands do seem to work, but they give the following warnings:
warning: Remote branch cosmic/61.1 not found in upstream origin, using HEAD instead
The next step is to try running genome model clin-seq update-analysis after these have been installed.
Something doesn't seem right with this. For one thing, after running the commands above, none of the file-based databases are registered in Postgres: 'genome db list' does not show anything. Each installer above inherits from Genome::Db::Command::InstallFromGitRepo. This class lives here: This code downloads the git repos for the file-based databases into the directory specified in $ENV{GENOME_DB}. If I go there (e.g. /opt/gms/4K8W670/db/), I see directories that contain only a README.md. The expected files are not present. There is a .git directory with various contents, and each download did take a while, but something is not quite right here.
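A quick way to spot the symptom described above is to list the checkouts whose working tree holds nothing except README.md. This is a minimal shell sketch, not part of the GMS tooling; the helper name find_readme_only_checkouts is hypothetical, and it assumes each database checkout is a direct subdirectory of $GENOME_DB.

```shell
# Hypothetical diagnostic helper (not part of GMS): print every checkout
# directory under the given root (default: $GENOME_DB) whose top level
# contains nothing except .git and README.md.
# Assumption: each database checkout is a direct subdirectory of the root.
find_readme_only_checkouts() {
  db_root="${1:-$GENOME_DB}"
  for repo in "$db_root"/*/; do
    [ -d "$repo" ] || continue
    # List top-level entries, dropping .git and README.md;
    # "|| true" keeps grep's no-match exit status from aborting under set -e.
    others=$(ls -A "$repo" | grep -v -x -e '.git' -e 'README.md' || true)
    if [ -z "$others" ]; then
      echo "${repo%/}"
    fi
  done
}
```

Running find_readme_only_checkouts with no argument inspects $GENOME_DB; any directory it prints was cloned without its data files.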
Sorry, I hit close by mistake.
The cosmic db that we are trying to clone from https://github.com/genome-vendor/genome-db-cosmic-data does not seem to have the branch we are looking for, cosmic/61.1. All the branches seem to contain just the README except for http://github.com/genome-vendor/genome-db-cosmic-data/tree/65_v2. Should we clone this branch instead?
Newer versions of git give a fatal error instead of a warning for the same 'clone' command.
I suspect several problems may simply be due to the half-baked state of Genome/Db/Command/InstallFromGitRepo.pm. First, apparently the genome db object requires a 'latest' symlink to be placed in a particular location in each git repo after installation. This is not being done correctly. Second, the git clone command being run does not seem valid: the branch is not specified correctly, so it clones the default branch. To test fixes for these, I will destroy the git repos in a test installation, fix the code, add more verbose warnings so that we can better understand what is happening, and re-run the db installation commands. Then I will make sure the databases are correctly listed by 'genome db list' and check whether the various issues related to the missing file-based databases are resolved.
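For the second problem, the clone can name the branch explicitly and confirm that it exists first, so that a missing branch such as cosmic/61.1 fails the same way on old and new versions of git instead of falling back to HEAD. This is a minimal shell sketch, not the actual code in InstallFromGitRepo.pm; the helper name clone_db_branch and its arguments are hypothetical, and it does not try to create the 'latest' symlink, whose required location is not given here.

```shell
# Hypothetical helper (not the GMS implementation): clone exactly one branch
# of a database repo, failing loudly if that branch does not exist remotely.
# Usage: clone_db_branch <repo-url> <branch> <destination-dir>
clone_db_branch() {
  url="$1"; branch="$2"; dest="$3"
  # Ask the remote for refs/heads/<branch>; a failed ls-remote (bad URL,
  # no network) aborts, and empty output means the branch is absent.
  refs=$(git ls-remote "$url" "refs/heads/$branch") || return 1
  if [ -z "$refs" ]; then
    echo "ERROR: branch '$branch' not found in $url" >&2
    return 1
  fi
  # --branch selects the branch to check out; --single-branch skips the rest.
  git clone --quiet --branch "$branch" --single-branch "$url" "$dest"
}
```

Pointed at https://github.com/genome-vendor/genome-db-cosmic-data with cosmic/61.1, this would stop with an explicit error while that branch is still missing, rather than producing a checkout that contains only the README.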
We still have an issue: clin-seq update-analysis expects to be able to query the cosmic, misc, and cancer annotation databases. It gets default versions of these from ClinSeq.pm and then fails to find any that have not been installed in the standalone GMS. I have updated the code to allow us to override the defaults with the specific versions actually needed and installed for the demonstration analysis. Relevant commits to master and gms-pub are here: https://github.com/genome/gms-core/commit/268d26c46e66f822ec6795177d06328b91414da0
I have changed the default version of cosmic to 61.1 in the gms-pub branch of gms-core for convenience when running clin-seq update-analysis in the test environment with the TST1 (HCC1395) demonstration analysis. See this commit for details: https://github.com/genome/gms-core/commit/bd198ab0950b20d5d583dcdf020b56f1d6bcd2a8
The following commands are now working as expected without warnings or errors in the sGMS:
However there are still some errors with this command:
In particular, this error: Our rna-seq models do not have the cancer annotation DB as an input, but clin-seq expects them to. For backwards compatibility, that expectation is probably unwise. For convenience, maybe we should add this input to the HCC1395 demonstration RNA-seq models and export a new metadata dump...
This error is related to the recent addition of the cancer-annotation db as an optional input to rna-seq models. We have not yet merged this change into the gms-pub branch. It is only used for chimera-scan runs, which we are currently not running in the sGMS anyway. For now, I have reverted update-analysis to not require cancer annotations. Once we do a merge, there will be no harm in having cancer-annotation as an input even if it is not used. Related commit is here:
The following command is now working:
One remaining issue: when the user runs this command, it fails to recognize valid ref-align models with 0 builds and asks the user to create a new ref-align model. The user should simply be warned that the model needs a build in these cases. Once there is at least one build with any status, the behavior of the tool starts to make more sense...
It looks like when we merged clin-seq recently we reintroduced this problem? The RNA-seq models for HCC1395 on the standalone GMS do not have cancer_annotation_db as an input, but clin-seq update-analysis seems to be expecting it again.
I fixed this. Commit in gms-core gms-pub branch is here: https://github.com/genome/gms-core/commit/a97635988c47e13fe1e1fd563dccffbb2e8e69d7 This was also pushed to gms-core master branch here: https://github.com/genome/gms-core/commit/b070eea3ab348a8f9f57c20259b2fdb15f719aaa |
You can run clin-seq update-analysis now, but it wants to create rna-seq models with 'cancer_annotation_db' as an input. This is a new feature of the rna-seq class that has not been merged into gms-pub yet, so we can't create models that have this input, and update-analysis will not recognize models missing this input as meeting all desired criteria. We could create a workaround, but the next merge should solve this problem, so I think we should move towards actually doing that merge.
I believe this is all resolved now with the latest commit here: |
This issue is resolved. Everything seems to be in order with
If I try running clinseq update-analysis on TST1:
ogriffit@GMS-Griffith ~> genome model clin-seq update-analysis --individual='H_NJ-HCC1395'
ERROR: Can't call method "name" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 495.
It seems to be missing something for the differential expression processing profile ($self->differential_expression_pp->name). If I comment out that line, the next errors I get concern $cancer_annotation_db->data_directory, then $misc_annotation_db->data_directory, then $cosmic_annotation_db->data_directory:
ERROR: Can't call method "data_directory" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 512.
ERROR: Can't call method "data_directory" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 516.
ERROR: Can't call method "data_directory" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 520.
If I comment them all out, update-analysis proceeds and does something sensible. Currently it shows me that refaligns are running and suggests a command for starting rnaseq.