Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faith pd #65

Merged
merged 15 commits into from
May 11, 2019
Merged

Faith pd #65

merged 15 commits into from
May 11, 2019

Conversation

gwarmstrong
Copy link
Member

@gwarmstrong gwarmstrong commented May 6, 2019

Feature addition: Faith’s PD (WIP)

NOTE: Use this link to view diff, helpful to avoid viewing the extra whitespace:
https://github.com/biocore/unifrac/compare/master...gwarmstrong:faith_pd?w=1
Or click "hide whitespace changes" on the diff settings
My IDE made some automatic whitespace/newline changes, so there is a lot of extra junk in the diff. If you particularly care that the whitespace was removed, let me know and we can fix it.

I have added functions needed to compute Faith’s PD in C++, as well as exposed an API to python. This implementation improves upon the time and memory overhead for skbio’s faith PD computation on large-scaled datasets. See below:
unknown
unknown

Usage, if you have a BIOM table saved in table_file.biom and a newick formatted tree saved in tree_file.nwk, the following python code will return a pandas.Series of the values for Faith’s PD for each sample in the BIOM table:

from unifrac import stacked_faith
stacked_faith(‘table_file.biom’, ‘tree_file.nwk’)

@wasade
Copy link
Member

wasade commented May 6, 2019

Did a quick pass -- this looks pretty sweet, thank you! Will do a more detailed review tomorrow. No worries about the whitespace. I apologize if I missed it, but I don't think this is hooked up to the command line interface, right? It would be nice to have, but could be deferred if warranted

@gwarmstrong
Copy link
Member Author

Yeah I have not done that yet. I could rig up something similar to sucpp/su.cpp for Faith's.

@wasade
Copy link
Member

wasade commented May 6, 2019

If you think it would be complex, then it would be appropriate to defer as a later issue. The primary use of the library (to the best of my knowledge) is via the Cython API

@gwarmstrong
Copy link
Member Author

I'm having trouble figuring out why the CI is failing.. The test that failed this time passed the first time and I did not make any changes that should affect how that code runs to the best of my knowledge. I am also unable to recreate the behavior locally. Mind if I add a line conda list to the travis.yml to get a better picture of what conda is doing?

@wasade
Copy link
Member

wasade commented May 7, 2019

Hey @gwarmstrong, an empty table is attempted to be created here, however the construction of the table here ensures that there is always a single sample ID. This is most likely driving the traceback in the travis output.

@gwarmstrong
Copy link
Member Author

@wasade The error message we are getting from Travis is here, which is triggered by this test. I do not see a reason that an empty table should trigger this warning.
Additionally, if I try to make a Table locally from empty numpy arrays, it works:
Snip20190507_2
Maybe this could be a matter of how the intermediate testing files are being saved?

@wasade
Copy link
Member

wasade commented May 7, 2019

What version of biom do you have (biom.__version__)? Just checked with 2.1.6 and 2.1.7, and get duplicate sample ID errors with both...

@gwarmstrong
Copy link
Member Author

Thanks for checking. I have tried with 2.1.7 and 2.1.5 both with no issue.

@wasade
Copy link
Member

wasade commented May 7, 2019

I am mystified. Discussing with @ElDeveloper. See below, I have it working, and not working in two ways, within the same version of python and version of biom:

(qiime2-2018.11) 11:14:37 (dtmcdonald@here):~$ python
Python 3.5.5 | packaged by conda-forge | (default, Apr  6 2018, 13:43:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biom
>>> t = biom.Table([], [], ['foo', 'bar'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/table.py", line 508, in __init__
    errcheck(self)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/err.py", line 472, in errcheck
    raise ret
biom.exception.TableException: Duplicate sample IDs!
>>> t = biom.Table([], [], ['foo'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/table.py", line 508, in __init__
    errcheck(self)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/err.py", line 472, in errcheck
    raise ret
biom.exception.TableException: Duplicate sample IDs!
>>>
(qiime2-2018.11) 11:14:54 (dtmcdonald@here):~$ python
Python 3.5.5 | packaged by conda-forge | (default, Apr  6 2018, 13:43:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biom
^[[At = biom.Table([], [], ['u'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/table.py", line 508, in __init__
    errcheck(self)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/biom/err.py", line 472, in errcheck
    raise ret
biom.exception.TableException: Number of sample IDs differs from matrix size!
>>>
(qiime2-2018.11) 11:15:08 (dtmcdonald@here):~$ python
Python 3.5.5 | packaged by conda-forge | (default, Apr  6 2018, 13:43:56)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biom
>>> t = biom.Table([], [], ['u'])
>>>

Smells like a bug in biom. More soon.

@wasade
Copy link
Member

wasade commented May 7, 2019

This is a fun one, good spot. Making a fix to biom right now. I advise not specifying axis identifiers or metadata when the table is empty, which makes sense to do too as 0x0 matrix shouldn't have identifiers on either axis :)

@wasade
Copy link
Member

wasade commented May 7, 2019

biocore/biom-format#814 is in to resolve this bug.

@gwarmstrong
Copy link
Member Author

Sweet. I also rewrote the test to not specify sample identifiers for the 0x0 matrix.

@wasade
Copy link
Member

wasade commented May 11, 2019

@gwarmstrong, would it be possible to create an issue for adding in a CLI interface for this method? And, separately, to create an issue for the quantitative variant? The former is a "nice to have". The latter would trigger a release, and could be awesome to have.

@wasade wasade merged commit 3045e10 into biocore:master May 11, 2019
@ElDeveloper
Copy link
Member

:shipit: 🚀

gwarmstrong added a commit to gwarmstrong/unifrac that referenced this pull request May 15, 2019
ElDeveloper pushed a commit that referenced this pull request Jun 19, 2019
* added osx build and update requirments

* updated compute status messages to reflect changes made in #65

* fixed array size for compute messages

* faith pd cli builds, needs test

* test file for faith pd cli

* precision modification to truth table

* added faith pd cli tests

* removed unnecessary arguments from signature

* rollback travis.yml to master plus new tests

* qiime 2019.4 requirements

* fixed abberant spacing

* seperate wget environemnt and build from file

* remove debug info

* add header to cli vec

* remove header from output to check with test values

* addressing comments from @ElDeveloper in #70

* re-removed open_error from write_mat
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants