Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the conda environment and build. #4749

Merged
merged 1 commit into from
Jun 1, 2018
Merged

Conversation

cmnbroad
Copy link
Collaborator

@cmnbroad cmnbroad commented May 8, 2018

Several changes:

  • Fix Conda env create fails with FileNotFoundError #4741, where newer versions of conda appear to treat relative references in the environment yml as being relative to the yml file instead of relative to the cwd (based on observation).
  • Add a second conda yml file (gatkcondaenv.intel.yml) for environments that use Intel hardware acceleration and Intel Tensorflow package (based on Improvements to the prep of the input tensors to CNNScoreVariants (2D) #4735).
  • Add a gradle task (condaEnvironmentDefinition) to generate the conda yml files from a single template to ensure that all the environment definitions remain in sync. This task also generates the Python package archive.
  • Add a gradle task (localDevCondaEnv) to create or update a local (non-Intel) conda environment. This is a shortcut for use during development when you're iteratively changing/testing Python code and want to update the conda env.
  • Opportunistically removed the prefix verb "create" from the name of the createPythonPackageArchive task, which is now called pythonPackageArchive.

README.md Outdated
additional information about using and managing Conda environments.
pre-configured. In order to establish an environment suitable to run these tools outside of the Docker image,
do the following:
* Make sure [Conda or MiniConda](https://conda.io/docs/index.html) is installed. Miniconda is sufficient.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split out a separate section on conda in the README, and link to it from this Requirements bullet, instead of having it all inline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to mention the Intel environment. @erniebrau we want to start shipping a conda environment yml for that right ?

@droazen
Copy link
Contributor

droazen commented May 8, 2018

@mwalker174 Can you test this branch out?

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented May 8, 2018

@erniebrau @samuelklee @mbabadi @lucidtronix @TedBrookings @mwalker174 With theses changes, the conda yml definition is now generated rather than directly checked in, so after pulling you'll have to re-generate them using the above tasks. Hopefully this workflow is useable, feel free to comment.

@droazen droazen requested a review from jamesemery May 9, 2018 19:24
Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the readme

README.md Outdated
* Alternatively, the ```./gradlew condaEnvironmentDefinition``` task can be used to only generate the
archive and yml definition files used to create the conda environment. The conda environment can then
be manually created from within the build directory by executing the command:
```conda env create -n gatk -f gatkcondaenv.yml``` and `activated as described above.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra tick mark

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

README.md Outdated
* Run ```./gradlew localDevCondaEnv```. This will autogenerate the Python package archive and conda
yml dependency file(s) in the build directory, and create the local conda environment named "gatk".
* Activate the GATK conda environment by running the command as described above.
* Alternatively, the ```./gradlew condaEnvironmentDefinition``` task can be used to only generate the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to only generate the -> to generate just the

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, done.

README.md Outdated
pre-configured. In order to establish an environment suitable to run these tools outside of the Docker image,
do the following:
* Make sure [Conda or MiniConda](https://conda.io/docs/index.html) is installed. Miniconda is sufficient.
* If running from a zip or tar distribution:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having three sets of instructions, perhaps separate these out into a section titled something like "first generate the conda environment files" with the three subsections, then another bullet with source activate gatk, as its a little confusing to have to refer above when following the instructions, where above do you mean? etc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, its getting to be super confusing, especially since I had to add something that mentions the intel yml environment. Done.

README.md Outdated
* If running from a zip or tar distribution:
* Run the command ```conda env create -f gatkcondaenv.yml``` to create the conda environment
required by GATK.
* The conda environment must be activated from within the shell from which GATK is run whenever tools
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to its own bullet point and clarify this sentence as its a little confusing to read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mwalker174
Copy link
Contributor

@droazen @cmnbroad Thank you these new gradle tasks are very useful! The yml issue is resolved now using conda 4.5.3.

@@ -44,8 +44,8 @@ dependencies:
- scipy==1.0.0
- six==1.11.0
- tensorflow==1.4.0
- tensorflow-tensorboard==0.4.0rc3
- $tensorFlowDependency
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm - this needs to be fixed - this line should replace the tensorflow dependency, not the the tensorboard dependency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad a few comments, nothing very serious, seems fine. 👍 when you've addressed the various comments, particularly your own comment about the $tensorFlowDependency

build.gradle Outdated
createFileFromTemplate(gatkCondaTemplate, standardCondaBindings, new File("$buildDir/$gatkCondaYML"))

def intelCondaBindings = [
"condaEnvName":"gatkintel",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want a dash in here? gatk-intel not sure if that makes sense in the context or not, but it's more readable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it reads a little better with the "-". Done.

build.gradle Outdated

doLast() {
def standardCondaBindings = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, map literals, that's nice

build.gradle Outdated
"condaEnvName":"gatkintel",
"condaEnvDescription" : "Conda environment for GATK Python Tools running with Intel hardware acceleration",
"tensorFlowDependency" :
"https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp36-cp36m-linux_x86_64.whl"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we pull out a tensor flow version constant? it's specified in at a least 3 places

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Done.

build.gradle Outdated
}

// Populate a file based on a template, substituting placeholders with values from a bindings map.
def createFileFromTemplate(sourceTemplateFile, variableBindings, targetFile) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some baked in stuff in the copy action that can do substitutions from a template (ex: http://mrhaki.blogspot.com/2010/10/gradle-goodness-parse-files-with.html).

That might be more canonical for gradle, but I the way you're doing it is probably more understandable. I just want to mention it incase you didn't know about it.

Copy link
Collaborator Author

@cmnbroad cmnbroad May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you could do that - I think I prefer declarative tasks over functions wherever possible, so I switched to using two copy-with-expand tasks, with one target task with the other two as dependencies. Got rid of this function altogether.

build.gradle Outdated
task condaEnvironmentDefinition() {
dependsOn 'pythonPackageArchive'
inputs.file gatkCondaTemplate
outputs.dir buildDir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it weird to have buildDir as the output? Shouldn't it just be the two output files?

Copy link
Collaborator Author

@cmnbroad cmnbroad May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I removed these input/output declarations altogether since this task is now just a dependency declaration.

@codecov-io
Copy link

codecov-io commented May 22, 2018

Codecov Report

Merging #4749 into master will increase coverage by 0.203%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #4749       +/-   ##
===============================================
+ Coverage     80.369%   80.572%   +0.203%     
- Complexity     17675     18502      +827     
===============================================
  Files           1088      1090        +2     
  Lines          64006     66472     +2466     
  Branches       10312     10995      +683     
===============================================
+ Hits           51441     53558     +2117     
- Misses          8557      8798      +241     
- Partials        4008      4116      +108
Impacted Files Coverage Δ Complexity Δ
...kers/variantutils/PosteriorProbabilitiesUtils.java 77.966% <0%> (-4.085%) 43% <0%> (-15%)
...otator/dataSources/gencode/GencodeFuncotation.java 53.317% <0%> (-2.545%) 192% <0%> (-2%)
...forms/markduplicates/MarkDuplicatesSparkUtils.java 87.64% <0%> (-1.857%) 69% <0%> (+7%)
...org/broadinstitute/hellbender/utils/MathUtils.java 76.854% <0%> (-1.138%) 192% <0%> (-10%)
...iscoverFromLocalAssemblyContigAlignmentsSpark.java 77.019% <0%> (-0.963%) 3% <0%> (+1%)
...kers/variantutils/CalculateGenotypePosteriors.java 92.308% <0%> (-0.549%) 14% <0%> (-3%)
...tools/walkers/genotyper/AlleleSubsettingUtils.java 91.228% <0%> (-0.507%) 45% <0%> (ø)
...der/tools/walkers/mutect/M2ArgumentCollection.java 93.939% <0%> (-0.505%) 5% <0%> (+2%)
...kers/haplotypecaller/ReferenceConfidenceModel.java 93.194% <0%> (-0.339%) 61% <0%> (-7%)
...hellbender/utils/test/VariantContextTestUtils.java 83.262% <0%> (-0.283%) 64% <0%> (-2%)
... and 49 more

@cmnbroad cmnbroad force-pushed the cn_fix_pip_install branch from 22609eb to f9d2139 Compare May 30, 2018 12:31
Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is much clearer than it was before

README.md Outdated
package archive and conda yml dependency file(s) in the build directory, and also creates (or updates)
the local ```gatk``` conda environment. (To create the ```gatk-intel``` conda environment once the files
have been generated, run the command ```conda env create -f gatkcondaenv.intel.yml```).
* To "activate" the conda environment (this must be done from within the shell from which GATK is run whenever
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence this must be done from within... could be rewritten, it is a little confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thx. I tried to clean it up a little - hopefully its ok now. Squashing and rebase for merge...

@cmnbroad cmnbroad force-pushed the cn_fix_pip_install branch from f9d2139 to 5ddf158 Compare May 31, 2018 21:08
@cmnbroad cmnbroad merged commit 7628cc9 into master Jun 1, 2018
@cmnbroad cmnbroad deleted the cn_fix_pip_install branch June 1, 2018 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conda env create fails with FileNotFoundError
6 participants