This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

clinseq update-analysis missing data #13

Closed
obigriffith opened this issue Oct 2, 2013 · 19 comments

@obigriffith
Collaborator

If I try running clinseq update-analysis on TST1:
ogriffit@GMS-Griffith ~> genome model clin-seq update-analysis --individual='H_NJ-HCC1395'

ERROR: Can't call method "name" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 495.

Seems to be missing something for the differential expression processing profile ($self->differential_expression_pp->name). If I comment out that line, the next error I get is something about $cancer_annotation_db->data_directory, then $misc_annotation_db->data_directory, then $cosmic_annotation_db->data_directory.

ERROR: Can't call method "data_directory" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 512.

ERROR: Can't call method "data_directory" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 516.

ERROR: Can't call method "data_directory" on an undefined value at /data/opt-gms/X0KZ365/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 520.

If I comment them all out, update-analysis proceeds and does something sensible. Currently it shows me that refaligns are running and suggests a command for starting rnaseq.

@malachig
Collaborator

This seems similar to issue #15: some metadata needed to run the TST1 analyses once the GMS fires up is perhaps not being captured when the metadata object is dumped?

Perhaps this requires a modification to this code:

"/lib/perl/Genome/Model/Command/Export/Metadata.pm"

@sakoht
Contributor

sakoht commented Oct 19, 2013

As discussed orally:

These Genome::Db entities exist (can be found by get() or a lister) when the files are in the correct place on the filesystem. To put these in place, they need to be checked out from GitHub into the $GENOME_SW directory.

There is a tool that will do that for cosmic, i.e.:
genome db install cosmic v65.1

When done, this will work:
$db = Genome::Db->get("cosmic/v65.1")

As will this:
genome db list id='cosmic/v65.1'

Or:
genome db list "source_name='cosmic' and external_version = 65 and import_iteration = 1"

So that install command needs to:

  1. be present for each of the 3 data sources
  2. be run after import

The raw/ugly thing to get it to run might be:

  • go through each Genome::Model::Input we are importing
  • see if the value_class_name "isa" Genome::Db (it will eq something like Genome::Db::Blah)
  • if so, try $value_class_name->get($value_id)
  • if that fails, execute the command $value_class_name . "::Command::Import" with a parameter of $value_id to import it
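
The raw approach above could be sketched roughly as follows in shell. This is only an illustration, not what the GMS code does: `ensure_installed` and the check and install commands passed to it are hypothetical names, standing in for whatever wraps `$value_class_name->get($value_id)` and the matching import command. It also assumes the ids handed in have already been filtered down to the Genome::Db inputs.

```shell
# Sketch only: "check" and "install" are caller-supplied commands (hypothetical),
# e.g. thin wrappers around a Genome::Db lookup and the corresponding import.
# Usage: ensure_installed CHECK_CMD INSTALL_CMD ID...
ensure_installed() {
  check=$1
  install=$2
  shift 2
  for id in "$@"; do
    if "$check" "$id"; then
      # The lookup succeeded, so nothing needs to be imported.
      echo "present: $id"
    else
      # The lookup failed: try to import, and stop at the first failed import.
      "$install" "$id" || return 1
    fi
  done
}
```

Passing the two commands in as arguments keeps the loop independent of the exact `genome db` syntax, which would need to be confirmed per data source.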

Something nicer might be:

  • update "genome model export metadata" to dump Genome::Db data (it currently is hard-coded to not, since those don't get "created")
  • update "genome model import metadata" to react to those specially, rather than create the object normally
  • generate a fresh dump of the test model with this new metadata

I will do this unless someone else claims it.

@malachig
Collaborator

Now when "genome model import metadata" runs, we receive the following warnings:

WARNING: EXTERNAL DATABASE NOT INSTALLED. DO: genome db cosmic install cosmic/61.1
WARNING: EXTERNAL DATABASE NOT INSTALLED. DO: genome db tgi install tgi/cancer-annotation/human/build37-20130401.1
WARNING: EXTERNAL DATABASE NOT INSTALLED. DO: genome db tgi install tgi/misc-annotation/human/build37-20130113.1

Next step is to test these commands and see if we can successfully install these databases inside the standalone GMS

@malachig
Collaborator

These genome db install commands do seem to work but they give the following warnings:

warning: Remote branch cosmic/61.1 not found in upstream origin, using HEAD instead

The next step is to try running genome model clin-seq update-analysis after these have been installed.

@malachig
Collaborator

Something doesn't seem right with this. For one thing, after running the commands above, none of the file based databases are registered in Postgres. 'genome db list' does not show anything.

Each installer above inherits from: Genome::Db::Command::InstallFromGitRepo.

This class lives here:
Genome/Db/Command/InstallFromGitRepo.pm

This code performs the task of downloading git repos for the file based databases to the directory specified in $ENV{GENOME_DB}. If I go there (e.g. /opt/gms/4K8W670/db/) I see empty directories that contain only a README.md. The expected files are not present. There is a .git dir with various stuff in it, and it did take a while to download each one, but something is not quite right here.
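
One way to check what a clone like this actually contains is to ask git which branch is checked out and how many files it tracks. A minimal sketch, assuming a git recent enough to support `-C`; `describe_clone` is a hypothetical helper name, not part of the GMS:

```shell
# Sketch only: report the checked-out branch and tracked-file count of a clone,
# then list the remote-tracking branches that git knows about locally.
describe_clone() {
  dir=$1
  branch=$(git -C "$dir" rev-parse --abbrev-ref HEAD) || return 1
  count=$(git -C "$dir" ls-files | wc -l | tr -d ' ')
  echo "branch=$branch tracked_files=$count"
  git -C "$dir" branch -r
}
```

A clone that reports a single tracked file on the default branch would be consistent with the install having fallen back to HEAD rather than the requested branch.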

@gatoravi reopened this Nov 25, 2013
@gatoravi
Contributor

Sorry, I hit close by mistake.

@gatoravi
Contributor

The cosmic db repository that we are trying to clone, https://github.com/genome-vendor/genome-db-cosmic-data, does not seem to have the branch that we are looking for (cosmic/61.1). All the branches seem to contain just the README, except for http://github.com/genome-vendor/genome-db-cosmic-data/tree/65_v2.

Should we clone this branch instead?

@gatoravi
Contributor

Newer versions of git give a fatal error instead of a warning for the same 'clone' command.

@malachig
Collaborator

I suspect several problems may simply be due to the half-baked state of:

Genome/Db/Command/InstallFromGitRepo.pm

First, apparently the genome db object requires a 'latest' symlink to be placed in a particular location in each git repo after installation. This is not being done correctly.

Second, the git clone command that is being run does not seem valid. The branch is not being indicated correctly and it therefore clones the default branch.

To test fixes for these, I will destroy the git repos in a test installation, fix the code, add more verbose warnings so that we can understand better what is happening, and re-run the db installation commands.

Then make sure they can be correctly listed with 'genome db list'.

Then see if various issues related to the missing file based databases are resolved.
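
For reference, a minimal sketch of what cloning a named branch and adding a 'latest' symlink could look like in plain shell. This is not the code in InstallFromGitRepo.pm: `clone_branch` and `link_latest` are hypothetical helpers, and since the location the 'latest' symlink belongs in is not spelled out here, the link path is left to the caller.

```shell
# Sketch only: clone exactly one branch, failing early if it is not available.
# Usage: clone_branch URL BRANCH DEST
clone_branch() {
  url=$1 branch=$2 dest=$3
  # --exit-code makes ls-remote return non-zero when no matching ref exists.
  if ! git ls-remote --exit-code "$url" "refs/heads/$branch" >/dev/null 2>&1; then
    echo "branch '$branch' not found (or remote unreachable): $url" >&2
    return 1
  fi
  git clone --quiet --branch "$branch" --single-branch "$url" "$dest"
}

# Sketch only: point a 'latest' symlink at an installed directory.
# Where the link should live is an assumption left to the caller.
# Usage: link_latest TARGET LINK
link_latest() {
  ln -sfn "$1" "$2"
}
```

Checking for the branch first turns a missing branch into an explicit error, rather than relying on the warn-or-fatal behavior that differs between git versions.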

@malachig
Collaborator

We still have an issue that clin-seq update-analysis expects to be able to query cosmic, misc, and cancer annotations. It gets default values of these from ClinSeq.pm and then fails to find those that may not have been installed in the standalone. I have updated the code to allow us to override the defaults with specific versions actually needed and installed for the demonstration analysis. Relevant commits to master and gms-pub here:

https://github.com/genome/gms-core/commit/268d26c46e66f822ec6795177d06328b91414da0
https://github.com/genome/gms-core/commit/f3d32199fa8aaa5281298729d47133298ba665fc

@malachig
Collaborator

I have changed the default version of cosmic to 61.1 in the gms-pub branch of gms-core, for convenience when running clin-seq update-analysis in the test environment with the TST1 (HCC1395) demonstration analysis. See this commit for details:

https://github.com/genome/gms-core/commit/bd198ab0950b20d5d583dcdf020b56f1d6bcd2a8

@malachig
Collaborator

The following commands are now working as expected without warnings or errors in the sGMS:

genome model clin-seq update-analysis --display-defaults
genome model clin-seq update-analysis --individual=H_NJ-HCC1395

However there are still some errors with this command:

genome model clin-seq update-analysis --individual=H_NJ-HCC1395 --samples='id in [2889981254,2889981253,2889953341,2889953342]'

In particular this error:
ERROR: Can't locate object method "cancer_annotation_db" via package "Genome::Model::RnaSeq" (perhaps you forgot to load "Genome::Model::RnaSeq"?) at /opt/gms/PEL8970/sw/genome/lib/perl/Genome/Model/ClinSeq/Command/UpdateAnalysis.pm line 1224

Our rna-seq models do not have the cancer annotation DB as an input, but clin-seq expects them to. For backwards compatibility, requiring it is probably unwise. For convenience maybe we should add these inputs to the HCC1395 demonstration RNA-seq models and export a new metadata dump...

@malachig
Collaborator

This error is related to the recent addition of cancer-annotation db as an optional input to rna-seq models. We have not yet merged this change into the gms-pub branch. It is only used for chimera-scan runs, which we are currently not running in the sGMS anyway. For now I have reverted update-analysis to not require cancer-annotations. Once we do a merge there will be no harm in having cancer-annotation as an input even if it is not used. Related commit is here:
https://github.com/genome/gms-core/commit/997c4439f86a3eea3b8e0d188dc3a31da5a25a69

@malachig
Collaborator

The following command is now working:

genome model clin-seq update-analysis --individual=H_NJ-HCC1395 --samples='id in [2889981254,2889981253,2889953341,2889953342]'

One remaining issue that I see: when the user runs this command, it fails to recognize valid ref-align models with 0 builds and asks the user to create a new ref-align model. The user should simply be warned that the model needs a build in these cases. Once there is at least one build with any status, the behavior of the tool starts to make more sense...

@obigriffith
Collaborator Author

It looks like when we merged clin-seq recently we re-introduced this problem? The RNAseq models for HCC1395 on stand-alone do not have cancer_annotation_db as input but clin-seq update-analysis seems to be expecting it again.

@malachig
Collaborator

I fixed this. Commit in gms-core gms-pub branch is here: https://github.com/genome/gms-core/commit/a97635988c47e13fe1e1fd563dccffbb2e8e69d7

This was also pushed to gms-core master branch here: https://github.com/genome/gms-core/commit/b070eea3ab348a8f9f57c20259b2fdb15f719aaa

@malachig
Collaborator

You can run clin-seq update-analysis now, but it wants to create rna-seq models with 'cancer_annotation_db' as an input. This is a new feature of the rna-seq class that has not been merged into gms-pub yet, so we can't create models that have this input, and update-analysis will not recognize models missing this input as meeting all desired criteria. We could create a work-around, but the next merge should solve this problem, so I think we should move towards actually doing that merge.

@malachig
Collaborator

I believe this is all resolved now with the latest commit here:
https://github.com/genome/gms-core/commit/b8588b369082ffb504f7d97f5dd2cf1345f2975d

@malachig
Collaborator

This issue is resolved. Everything seems to be in order with genome model clin-seq update-analysis. Closing this issue.


4 participants