-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faith pd #65
Faith pd #65
Conversation
Did a quick pass -- this looks pretty sweet, thank you! Will do a more detailed review tomorrow. No worries about the whitespace. I apologize if I missed it, but I don't think this is hooked up to the command line interface, right? It would be nice to have, but could be deferred if warranted |
Yeah I have not done that yet. I could rig up something similar to |
If you think it would be complex, then it would be appropriate to defer as a later issue. The primary use of the library (to the best of my knowledge) is via the Cython API |
I'm having trouble figuring out why the CI is failing.. The test that failed this time passed the first time and I did not make any changes that should affect how that code runs to the best of my knowledge. I am also unable to recreate the behavior locally. Mind if I add a line |
Hey @gwarmstrong, an empty table is attempted to be created here, however the construction of the table here ensures that there is always a single sample ID. This is most likely driving the traceback in the travis output. |
@wasade The error message we are getting from Travis is here, which is triggered by this test. I do not see a reason that an empty table should trigger this warning. |
What version of |
Thanks for checking. I have tried with 2.1.7 and 2.1.5 both with no issue. |
I am mystified. Discussing with @ElDeveloper. See below, I have it working, and not working in two ways, within the same version of python and version of biom: (qiime2-2018.11) 11:14:37 (dtmcdonald@here):~$ python
Python 3.5.5 | packaged by conda-forge | (default, Apr 6 2018, 13:43:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biom
>>> t = biom.Table([], [], ['foo', 'bar'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/table.py", line 508, in __init__
errcheck(self)
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/err.py", line 472, in errcheck
raise ret
biom.exception.TableException: Duplicate sample IDs!
>>> t = biom.Table([], [], ['foo'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/table.py", line 508, in __init__
errcheck(self)
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/err.py", line 472, in errcheck
raise ret
biom.exception.TableException: Duplicate sample IDs!
>>>
(qiime2-2018.11) 11:14:54 (dtmcdonald@here):~$ python
Python 3.5.5 | packaged by conda-forge | (default, Apr 6 2018, 13:43:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biom
^[[At = biom.Table([], [], ['u'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/table.py", line 508, in __init__
errcheck(self)
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/err.py", line 472, in errcheck
raise ret
biom.exception.TableException: Number of sample IDs differs from matrix size!
>>>
(qiime2-2018.11) 11:15:08 (dtmcdonald@here):~$ python
Python 3.5.5 | packaged by conda-forge | (default, Apr 6 2018, 13:43:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biom
>>> t = biom.Table([], [], ['u'])
>>> Smells like a bug in |
This is a fun one, good spot. Making a fix to biom right now. I advise not specifying axis identifiers or metadata when the table is empty, which makes sense to do too as 0x0 matrix shouldn't have identifiers on either axis :) |
biocore/biom-format#814 is in to resolve this bug. |
Sweet. I also rewrote the test to not specify sample identifiers for the 0x0 matrix. |
@gwarmstrong, would it be possible to create an issue for adding in a CLI interface for this method? And, separately, to create an issue for the quantitative variant? The former is a "nice to have". The latter would trigger a release, and could be awesome to have. |
🚀 |
* added osx build and update requirments * updated compute status messages to reflect changes made in #65 * fixed array size for compute messages * faith pd cli builds, needs test * test file for faith pd cli * precision modification to truth table * added faith pd cli tests * removed unnecessary arguments from signature * rollback travis.yml to master plus new tests * qiime 2019.4 requirements * fixed abberant spacing * seperate wget environemnt and build from file * remove debug info * add header to cli vec * remove header from output to check with test values * addressing comments from @ElDeveloper in #70 * re-removed open_error from write_mat
Feature addition: Faith’s PD (WIP)
NOTE: Use this link to view diff, helpful to avoid viewing the extra whitespace:
https://github.com/biocore/unifrac/compare/master...gwarmstrong:faith_pd?w=1
Or click "hide whitespace changes" on the diff settings
My IDE made some automatic whitespace/newline changes, so there is a lot of extra junk in the diff. If you particularly care that the whitespace was removed, let me know and we can fix it.
I have added functions needed to compute Faith’s PD in C++, as well as exposed an API to python. This implementation improves upon the time and memory overhead for skbio’s faith PD computation on large-scaled datasets. See below:
Usage, if you have a BIOM table saved in
table_file.biom
and a newick formatted tree saved intree_file.nwk
, the following python code will return a pandas.Series of the values for Faith’s PD for each sample in the BIOM table: