-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the conda environment and build. #4749
Conversation
README.md
Outdated
additional information about using and managing Conda environments. | ||
pre-configured. In order to establish an environment suitable to run these tools outside of the Docker image, | ||
do the following: | ||
* Make sure [Conda or MiniConda](https://conda.io/docs/index.html) is installed. Miniconda is sufficient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out a separate section on conda
in the README, and link to it from this Requirements bullet, instead of having it all inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to mention the Intel environment. @erniebrau we want to start shipping a conda environment yml for that right ?
@mwalker174 Can you test this branch out? |
@erniebrau @samuelklee @mbabadi @lucidtronix @TedBrookings @mwalker174 With theses changes, the conda yml definition is now generated rather than directly checked in, so after pulling you'll have to re-generate them using the above tasks. Hopefully this workflow is useable, feel free to comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the readme
README.md
Outdated
* Alternatively, the ```./gradlew condaEnvironmentDefinition``` task can be used to only generate the | ||
archive and yml definition files used to create the conda environment. The conda environment can then | ||
be manually created from within the build directory by executing the command: | ||
```conda env create -n gatk -f gatkcondaenv.yml``` and `activated as described above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra tick mark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
README.md
Outdated
* Run ```./gradlew localDevCondaEnv```. This will autogenerate the Python package archive and conda | ||
yml dependency file(s) in the build directory, and create the local conda environment named "gatk". | ||
* Activate the GATK conda environment by running the command as described above. | ||
* Alternatively, the ```./gradlew condaEnvironmentDefinition``` task can be used to only generate the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to only generate the -> to generate just the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, done.
README.md
Outdated
pre-configured. In order to establish an environment suitable to run these tools outside of the Docker image, | ||
do the following: | ||
* Make sure [Conda or MiniConda](https://conda.io/docs/index.html) is installed. Miniconda is sufficient. | ||
* If running from a zip or tar distribution: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having three sets of instructions, perhaps separate these out into a section titled something like "first generate the conda environment files" with the three subsections, then another bullet with source activate gatk
, as its a little confusing to have to refer above when following the instructions, where above do you mean? etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its getting to be super confusing, especially since I had to add something that mentions the intel yml environment. Done.
README.md
Outdated
* If running from a zip or tar distribution: | ||
* Run the command ```conda env create -f gatkcondaenv.yml``` to create the conda environment | ||
required by GATK. | ||
* The conda environment must be activated from within the shell from which GATK is run whenever tools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to its own bullet point and clarify this sentence as its a little confusing to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
scripts/gatkcondaenv.yml.template
Outdated
@@ -44,8 +44,8 @@ dependencies: | |||
- scipy==1.0.0 | |||
- six==1.11.0 | |||
- tensorflow==1.4.0 | |||
- tensorflow-tensorboard==0.4.0rc3 | |||
- $tensorFlowDependency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm - this needs to be fixed - this line should replace the tensorflow dependency, not the the tensorboard dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad a few comments, nothing very serious, seems fine. 👍 when you've addressed the various comments, particularly your own comment about the $tensorFlowDependency
build.gradle
Outdated
createFileFromTemplate(gatkCondaTemplate, standardCondaBindings, new File("$buildDir/$gatkCondaYML")) | ||
|
||
def intelCondaBindings = [ | ||
"condaEnvName":"gatkintel", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want a dash in here? gatk-intel
not sure if that makes sense in the context or not, but it's more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it reads a little better with the "-". Done.
build.gradle
Outdated
|
||
doLast() { | ||
def standardCondaBindings = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, map literals, that's nice
build.gradle
Outdated
"condaEnvName":"gatkintel", | ||
"condaEnvDescription" : "Conda environment for GATK Python Tools running with Intel hardware acceleration", | ||
"tensorFlowDependency" : | ||
"https://anaconda.org/intel/tensorflow/1.4.0/download/tensorflow-1.4.0-cp36-cp36m-linux_x86_64.whl"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we pull out a tensor flow version constant? it's specified in at a least 3 places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Done.
build.gradle
Outdated
} | ||
|
||
// Populate a file based on a template, substituting placeholders with values from a bindings map. | ||
def createFileFromTemplate(sourceTemplateFile, variableBindings, targetFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some baked in stuff in the copy action that can do substitutions from a template (ex: http://mrhaki.blogspot.com/2010/10/gradle-goodness-parse-files-with.html).
That might be more canonical for gradle, but I the way you're doing it is probably more understandable. I just want to mention it incase you didn't know about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know you could do that - I think I prefer declarative tasks over functions wherever possible, so I switched to using two copy-with-expand
tasks, with one target task with the other two as dependencies. Got rid of this function altogether.
build.gradle
Outdated
task condaEnvironmentDefinition() { | ||
dependsOn 'pythonPackageArchive' | ||
inputs.file gatkCondaTemplate | ||
outputs.dir buildDir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it weird to have buildDir as the output? Shouldn't it just be the two output files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I removed these input/output declarations altogether since this task is now just a dependency declaration.
00e9111
to
22609eb
Compare
Codecov Report
@@ Coverage Diff @@
## master #4749 +/- ##
===============================================
+ Coverage 80.369% 80.572% +0.203%
- Complexity 17675 18502 +827
===============================================
Files 1088 1090 +2
Lines 64006 66472 +2466
Branches 10312 10995 +683
===============================================
+ Hits 51441 53558 +2117
- Misses 8557 8798 +241
- Partials 4008 4116 +108
|
22609eb
to
f9d2139
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is much clearer than it was before
README.md
Outdated
package archive and conda yml dependency file(s) in the build directory, and also creates (or updates) | ||
the local ```gatk``` conda environment. (To create the ```gatk-intel``` conda environment once the files | ||
have been generated, run the command ```conda env create -f gatkcondaenv.intel.yml```). | ||
* To "activate" the conda environment (this must be done from within the shell from which GATK is run whenever |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence this must be done from within...
could be rewritten, it is a little confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thx. I tried to clean it up a little - hopefully its ok now. Squashing and rebase for merge...
f9d2139
to
5ddf158
Compare
Several changes:
gatkcondaenv.intel.yml
) for environments that use Intel hardware acceleration and Intel Tensorflow package (based on Improvements to the prep of the input tensors to CNNScoreVariants (2D) #4735).condaEnvironmentDefinition
) to generate the conda yml files from a single template to ensure that all the environment definitions remain in sync. This task also generates the Python package archive.localDevCondaEnv
) to create or update a local (non-Intel) conda environment. This is a shortcut for use during development when you're iteratively changing/testing Python code and want to update the conda env.createPythonPackageArchive
task, which is now calledpythonPackageArchive
.