-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix critical issue with mergeFeatures #478
Conversation
up |
up - @Daenarys8 |
This does not yet take into account the row order. In mergeRows, the order of data is in alphabetical order. In agglomerateByRanks the order follows the order of rowData. As discussed, the order based on rowData might be the best, but the implementation is harder without big advantages. So I suggest that we modify agglomerateByRank so that it outputs the data in alphabetical order. I think that is the most straightforward and requires only one additional line in the end of the function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
by single line, are we referring to? |
d205d74
to
297aa81
Compare
To recap: 1 Behavior of functions that are using agglomerateByRank internally is unexpected. Why? Because if user does not provide rank, the data is agglomerated to the highest rank by default. However, that should not be the case. For example, agglomerateByPrevalence should not do rank-agglomeration by default. I think our approach is now incorrect. Instead of changing the default value of rank in agglomerateByRank to NULL, we should modify agglomerateByPrevalence. --> In agglomererateByPrevalence, catch rank parameter from mergeRows and mergeCols drop off those instances that have NA values in grouping variable. That is because grouping vector is converted to factor. To include instances with NA values as "NA group", NA values could be converted to "NA". --> this behavior should be controlled somehow, The default value of onRankOnly in agglomerateByRank should be TRUE by default. In the end of the agglomerateByRank function, order the data alphabetically based on rownames. |
Sorry for hassle. I think most of the points are not yet implemented. |
789082f
to
c3f267c
Compare
c9cc064
to
faa51b2
Compare
Can you run examples of Leo that initiated this #419 and print the output to here? |
Also bump the versions and add NEWS |
To confirm fix, we will recall the example @antagomir used to raise the issue #419 .
Here we merge features by prevalence, and return the result at the family level.
Here we merge features first by Family level grouping, then by prevalence.
Same happens when we treat Family as a group, rather thank rank:
Finally, checking the prevalences manually yields the same numbers.
|
996dbc7
to
0493a7f
Compare
Thanks! The last two examples yield a different number (20) - shouldn't it be the same? Can we have these in unit tests, to make sure it will remain stable also in future releases? |
I believe mergeFeaturesByPrevalence() creates a group called others. "others" have all the features under the thresholds. This might explain the behavior. Can you @Daenarys8 check and create unit tests. They should explain this behavior. For instance if you compare these, you could add comment "other group was removed. It is added by agglomerateByPrevalence function to collect features under threshold." or something like that so that we know in the future what is happening and that it is desired behavior |
The code should also show how altExp(tse, "Family") was created. There are different options (onRankOnly etc). |
Other group not present
In the following, other group is added by agglomerateByPrevalence function to collect features under threshold
actual2 create groups based on the full taxonomic hierarchy up to family level while the previous creates the factor group with groups at the family level.
|
e5b3461
to
abc285c
Compare
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Ping #419 @antagomir @TuomasBorman
TODO