Update: sepp (pplacer binary) #52398

Open: wants to merge 15 commits into master

@sjanssen2 commented Nov 27, 2024

Work in progress, don't merge!

SEPP depends on pplacer and guppy, which are maintained at https://github.com/matsen/pplacer and written in OCaml. There does not seem to be a mechanism in place to compile them from source. However, matsen provides pre-compiled binaries for Linux and OSX, such as version 1.1.alpha17: https://github.com/matsen/pplacer/releases/tag/v1.1.alpha17 (Sep 23, 2015).

Currently, SEPP bundles older pre-compiled binaries of pplacer and guppy (v1.1.alpha13-0-g1ec7786, 2012-05-21). On a current Debian system, these binaries crash with segmentation faults, even when run without any input. I am therefore trying to make pplacer available through Bioconda (currently without an OSX package) in #52395.

Once that PR is merged, I will change the SEPP recipe so that it no longer uses the bundled pplacer and guppy binaries but those available as a conda dependency.
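A quick way to reproduce the crash described above is to run a binary without any input and check whether it was killed by SIGSEGV. The sketch below is a hypothetical helper (`segfaults_without_input` is not part of SEPP or pplacer) and assumes a POSIX system, where Python's `subprocess` reports a child killed by a signal as a negative return code.

```python
from __future__ import annotations

import signal
import subprocess
import sys


def segfaults_without_input(cmd: list[str], timeout: float = 30.0) -> bool:
    """Run `cmd` with no input and report whether it died from SIGSEGV.

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    Raises subprocess.TimeoutExpired if the command runs longer than `timeout`.
    """
    proc = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # no input at all, as in the report above
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode == -signal.SIGSEGV


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_segfault.py /path/to/bundled/pplacer  (placeholder path)
    print(segfaults_without_input(sys.argv[1:]))
```

Comparing the bundled binary and the conda-provided one this way gives a simple before/after check; the path in the usage comment is only a placeholder for wherever the bundled binary lives.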

Will this change the results? Probably yes; the question is to what degree. Here is a comparison for a set of 2479 16S sequences inserted into the GreenGenes 13.8 reference tree:

  1. All sequences are inserted by both the current and the updated version.
  2. Assessing differences in the placement positions is not trivial because a) not all nodes in the reference tree are named, i.e. no one-to-one mapping is possible, and b) multiple sequences can be inserted into the same clade, which changes its topology. I therefore use two metrics:
    1. "tip length": the distance from the direct parent node to the inserted sequence
    2. "reference distance": first, traverse the ancestors towards the root until a node with a numeric name (these are the GG reference tips) is found, then measure the distance between the inserted sequence and this reference node
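The two metrics can be sketched on a minimal in-memory tree. This is an illustration under assumptions, not the script used for the comparison: `Node` is a hypothetical stand-in for a parsed tree, reference tips are assumed to be leaves whose names are purely digits, and because those reference tips are leaves, "reference distance" is read here as: walk up the ancestors and, at the first ancestor whose subtree contains a reference tip, take the path length to the nearest such tip.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Minimal rooted tree node; `length` is the branch length to the parent."""
    name: str = ""
    length: float = 0.0
    parent: Optional[Node] = field(default=None, repr=False, compare=False)
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def is_reference(node: Node) -> bool:
    # Assumption: GG reference tips are leaves with purely numeric names.
    return not node.children and node.name.isdigit()


def tip_length(leaf: Node) -> float:
    """Distance from the direct parent node to the inserted sequence."""
    return leaf.length


def _nearest_reference_below(start: Node, skip: Node) -> Optional[float]:
    """Shortest path from `start` down to a reference tip, ignoring `skip`'s subtree."""
    best: Optional[float] = None
    stack = [(start, 0.0)]
    while stack:
        node, dist = stack.pop()
        if node is skip:
            continue
        if is_reference(node) and (best is None or dist < best):
            best = dist
        stack.extend((child, dist + child.length) for child in node.children)
    return best


def reference_distance(leaf: Node) -> Optional[float]:
    """Walk ancestors towards the root; at the first ancestor whose subtree
    holds a reference tip, return the path length from `leaf` to the nearest one."""
    up = 0.0
    child, ancestor = leaf, leaf.parent
    while ancestor is not None:
        up += child.length
        down = _nearest_reference_below(ancestor, skip=child)
        if down is not None:
            return up + down
        child, ancestor = ancestor, ancestor.parent
    return None  # no reference tip anywhere in the tree
```

Parsing the actual placement tree into `Node` objects is left out; the subtree already searched on the way up is skipped, since it cannot contain a reference tip once the walk has moved past it.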

Of the 2479 sequences:

  1. 2256 (=91%) get placed at identical positions.
  2. the maximal "tip length" difference is 0.0645883 and the maximal "reference distance" difference is 0.04480039523. For reference, this is about the expected distance between taxa of the same species in the GG tree.
  3. only 32 (=1%) of the sequences get placed at positions that differ by more than one tenth of the species radius (~0.05), i.e. by more than 0.005 in the smaller of the two metrics above.
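The counting behind these numbers can be sketched as follows, assuming both metrics have already been computed for every sequence in both runs and turned into absolute differences. `PlacementDiff` and `summarize` are hypothetical names, and the threshold rule follows one reading of the text: a sequence counts as shifted only when even the smaller of its two differences exceeds one tenth of the ~0.05 species radius.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlacementDiff:
    """Absolute differences for one sequence between old and new pplacer runs."""
    seq_id: str
    same_position: bool             # placed at the identical position in both runs
    tip_length_diff: float          # |tip length (old) - tip length (new)|
    reference_distance_diff: float  # |reference distance (old) - (new)|


def summarize(diffs: list[PlacementDiff], species_radius: float = 0.05) -> dict:
    """Aggregate per-sequence differences (expects a non-empty list)."""
    threshold = species_radius / 10  # one tenth of the species radius
    return {
        "total": len(diffs),
        "same_position": sum(d.same_position for d in diffs),
        "max_tip_length_diff": max(d.tip_length_diff for d in diffs),
        "max_reference_distance_diff": max(d.reference_distance_diff for d in diffs),
        # Shifted only if even the smaller of the two differences is large.
        "shifted": [
            d.seq_id
            for d in diffs
            if min(d.tip_length_diff, d.reference_distance_diff) > threshold
        ],
    }
```

Taking the minimum makes the criterion conservative: a sequence whose tip length changed a lot but whose distance to the nearest reference barely moved is not counted as shifted.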

Please read the guidelines for Bioconda recipes before opening a pull request (PR).

General instructions

  • If this PR adds or updates a recipe, use "Add" or "Update" appropriately as the first word in its title.
  • New recipes not directly relevant to the biological sciences need to be submitted to the conda-forge channel instead of Bioconda.
  • PRs require reviews prior to being merged. Once your PR is passing tests and ready to be merged, please issue the @BiocondaBot please add label command.
  • Please post questions on Gitter or ping @bioconda/core in a comment.

Instructions for avoiding API, ABI, and CLI breakage issues

Conda is able to record and lock (a.k.a. pin) dependency versions used at build time of other recipes.
This way, later changes in a recipe cannot violate the API, ABI, or CLI expectations of downstream recipes.
If not already present in the meta.yaml, make sure to specify run_exports (see the Bioconda documentation on run_exports for the rationale and a comprehensive explanation).
Add a run_exports section like this:

build:
  run_exports:
    - ...

with ... being one of:

Case | run_exports statement
semantic versioning | {{ pin_subpackage("myrecipe", max_pin="x") }}
semantic versioning (0.x.x) | {{ pin_subpackage("myrecipe", max_pin="x.x") }}
known breakage in minor versions | {{ pin_subpackage("myrecipe", max_pin="x.x") }} (in such a case, please add a note that briefly states your evidence)
known breakage in patch versions | {{ pin_subpackage("myrecipe", max_pin="x.x.x") }} (in such a case, please add a note that briefly states your evidence)
calendar versioning | {{ pin_subpackage("myrecipe", max_pin=None) }}

replacing "myrecipe" either with name (if a name|lower variable is defined in your recipe) or with the lowercase name of the package in quotes.
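To make the max_pin column concrete, the sketch below computes, under simplifying assumptions, the version range a max_pin expression is meant to allow: the number of x fields says how many leading version components must stay fixed, and the upper bound is exclusive. `pin_bounds` is a hypothetical helper, not conda-build's implementation; it only handles purely numeric versions, and the pin strings conda-build actually writes differ in formatting.

```python
from __future__ import annotations

from typing import Optional


def pin_bounds(version: str, max_pin: Optional[str]) -> tuple[str, Optional[str]]:
    """Return (inclusive lower bound, exclusive upper bound or None).

    Keeps as many leading components as max_pin has 'x' fields and bumps the
    last kept one: 1.2.3 with "x.x" -> ("1.2.3", "1.3").
    max_pin=None leaves the range open-ended.
    """
    if max_pin is None:
        return version, None
    parts = [int(p) for p in version.split(".")]  # numeric versions only
    n = min(len(max_pin.split(".")), len(parts))
    upper = parts[:n]
    upper[-1] += 1
    return version, ".".join(str(p) for p in upper)
```

For example, a 0.x.x package at version 0.4.1 pinned with max_pin="x.x" would accept 0.4.1 up to, but excluding, 0.5, while max_pin=None leaves the range open-ended, which suits calendar versioning.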

Bot commands for PR management

Please use the following BiocondaBot commands:

Everyone has access to the following BiocondaBot commands, which can be given in a comment:

Command | Description
@BiocondaBot please update | Merge the master branch into a PR.
@BiocondaBot please add label | Add the please review & merge label.
@BiocondaBot please fetch artifacts | Post links to CI-built packages/containers. You can use this to test packages locally.

Note that the @BiocondaBot please merge command is now deprecated. Please just squash and merge instead.

Also, the bot watches for comments from non-members that include @bioconda/<team> and will automatically re-post them to notify the addressed <team>.

@sjanssen2 closed this Nov 28, 2024
@sjanssen2 reopened this Nov 28, 2024