Fix critical issue with mergeFeatures #478

Daenarys8 · 2023-11-30T15:27:11Z

Ping #419 @antagomir @TuomasBorman

The default value of rank in mergeFeaturesByRank should be NULL
Add na.rm parameter to mergeFeatures
Change the default to onRankOnly=TRUE in mergeFeaturesByRank
Respect the rowData order

TODO

Check whether other functions have similar sorting issues
Address review comments from this PR

antagomir · 2024-03-07T11:44:22Z

up

antagomir · 2024-04-02T16:07:57Z

up - @Daenarys8

R/agglomerate.R

R/merge.R

TuomasBorman · 2024-04-10T11:58:34Z

This does not yet take the row order into account. In mergeRows, the data is ordered alphabetically. In agglomerateByRank, the order follows that of rowData. As discussed, ordering based on rowData might be best, but it is harder to implement without big advantages.

So I suggest that we modify agglomerateByRank so that it outputs the data in alphabetical order. I think that is the most straightforward option, and it requires only one additional line at the end of the function.

TuomasBorman

See comments

Daenarys8 · 2024-05-06T13:36:43Z

Regarding "alphabetical order": by a single line, are we referring to this?

x <- sort(x)
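For comparison, here is a minimal base R sketch of ordering rows alphabetically by row name. It uses a toy matrix, not the actual object from the package, so whether sort(x) behaves the same way on that object is not established here. On a plain matrix, sort(x) would sort the values and drop the dimensions, which is why the sketch indexes with order() instead.

```r
# Toy matrix standing in for the agglomerated data (assumption: rows are
# features and row names are the labels to sort by).
x <- matrix(1:6, nrow = 3,
            dimnames = list(c("b", "c", "a"), c("s1", "s2")))

# Reorder the rows by row name; drop = FALSE keeps the matrix shape even
# if only one row is present.
x_sorted <- x[order(rownames(x)), , drop = FALSE]

print(rownames(x_sorted))
```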

R/agglomerate.R

TuomasBorman · 2024-05-06T18:18:39Z

To recap:

1. The behavior of functions that use agglomerateByRank internally is unexpected. Why? Because if the user does not provide rank, the data is agglomerated to the highest rank by default. That should not be the case. For example, agglomerateByPrevalence should not do rank agglomeration by default.

I think our current approach is incorrect. Instead of changing the default value of rank in agglomerateByRank to NULL, we should modify agglomerateByPrevalence.

--> In agglomerateByPrevalence, catch the rank parameter from ... and, if rank is specified, call agglomerateByRank.

2. mergeRows and mergeCols drop those instances that have NA values in the grouping variable. That is because the grouping vector is converted to a factor. To include instances with NA values as an "NA group", NA values could be converted to "NA".

--> This behavior should be controlled somehow. An na.rm parameter could be perfect for that, but we should check whether na.rm is already used somewhere else in the function (passed with ... to some other function).

3. The default value of onRankOnly in agglomerateByRank should be TRUE.

4. At the end of the agglomerateByRank function, order the data alphabetically based on rownames.
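The NA point in the recap can be illustrated with a small base R sketch. The helper sum_by_group and its na.rm argument are hypothetical names made up for this illustration, and the code is not how mergeRows or mergeCols are implemented. It only shows that converting the grouping vector to a factor drops NA entries, and that recoding NA to the string "NA" keeps them as a group of their own.

```r
# Hypothetical helper, not part of mia: sum values per group, with na.rm
# deciding whether entries whose group is NA are dropped or kept.
sum_by_group <- function(values, group, na.rm = TRUE) {
  if (!na.rm) {
    # Turn missing group labels into the literal string "NA" so that they
    # survive the conversion to factor and form their own group.
    group[is.na(group)] <- "NA"
  }
  # factor() excludes NA from its levels, so tapply() skips those entries.
  tapply(values, factor(group), sum)
}

values <- c(1, 2, 3, 4, 5)
group  <- c("a", NA, "b", "a", NA)

dropped <- sum_by_group(values, group, na.rm = TRUE)
kept    <- sum_by_group(values, group, na.rm = FALSE)

print(dropped)  # groups "a" and "b" only
print(kept)     # additionally contains an "NA" group
```

One caveat of this recoding is that a genuine label spelled "NA" would be merged with the missing entries, so a real implementation might prefer a different placeholder.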

TuomasBorman · 2024-05-06T18:19:02Z

Sorry for the hassle. I think most of the points are not yet implemented.

R/agglomerate.R

R/merge.R

R/getPrevalence.R

R/estimateDiversity.R

R/merge.R

TuomasBorman · 2024-05-21T05:35:18Z

Can you run Leo's examples that initiated #419 and print the output here?

TuomasBorman · 2024-05-21T05:35:56Z

Also bump the versions and add NEWS

Daenarys8 · 2024-05-21T06:49:49Z

To confirm the fix, we recall the example @antagomir used to raise issue #419.
Let's first prepare the example data.

library(mia)
data(GlobalPatterns, package="mia")
tse <- GlobalPatterns
tse <- transformAssay(tse, assay.type="counts", method="relabundance")

Here we merge features by prevalence, and return the result at the family level.

nrow(mergeFeaturesByPrevalence(tse, rank="Family", assay.type="relabundance", detection  = 0.5/100, prevalence  = 20/100))
[1] 21

Here we merge features first by Family level grouping, then by prevalence.

altExp(tse, "Family") <- mergeFeaturesByRank(tse, rank="Family")
nrow(mergeFeaturesByPrevalence(altExp(tse, "Family"), assay.type="relabundance", detection  = 0.5/100, prevalence  = 20/100))
[1] 21

The same happens when we treat Family as a group rather than a rank:

altExp(tse, "Family2") <- mergeFeatures(tse, f="Family")
nrow(mergeFeaturesByPrevalence(altExp(tse, "Family2"), assay.type="relabundance", detection  = 0.5/100, prevalence  = 20/100))
[1] 21

Finally, checking the prevalences manually yields the same numbers.

sum(rowMeans(assay(altExp(tse, "Family"), "relabundance") > 0.5/100) > 0.2)
[1] 20

sum(rowMeans(assay(altExp(tse, "Family2"), "relabundance") > 0.5/100) > 0.2)
[1] 20

antagomir · 2024-05-21T07:15:31Z

Thanks!

The last two examples yield a different number (20) - shouldn't it be the same?

Can we have these in unit tests, to make sure it will remain stable also in future releases?

TuomasBorman · 2024-05-21T10:27:51Z

I believe mergeFeaturesByPrevalence() creates a group called others, which holds all the features under the thresholds. This might explain the behavior.

Can you, @Daenarys8, check this and create unit tests? They should explain this behavior. For instance, if you compare these, you could add a comment such as "other group was removed. It is added by agglomerateByPrevalence function to collect features under threshold." so that we know in the future what is happening and that it is the desired behavior.
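The difference between 21 rows and 20 prevalent features reported above would be consistent with one extra catch-all row. A minimal base R sketch of that idea follows. The helper collapse_rare, the row name "Other", and the toy matrix are assumptions made for illustration only; this is not the package's implementation, and it uses the same strict comparisons as the manual check in the earlier comment.

```r
# Toy relative-abundance matrix: 4 taxa (rows) x 5 samples (columns).
mat <- rbind(
  A = c(0.500, 0.400, 0.600, 0.500, 0.300),
  B = c(0.300, 0.300, 0.200, 0.300, 0.400),
  C = c(0.001, 0.002, 0.000, 0.001, 0.004),
  D = c(0.000, 0.000, 0.001, 0.000, 0.000)
)

# Hypothetical helper: keep rows whose prevalence exceeds the threshold
# and sum all remaining rows into a single "Other" row. The "Other" row
# is added even when no rows fall below the thresholds.
collapse_rare <- function(mat, detection, prevalence) {
  keep <- rowMeans(mat > detection) > prevalence
  other <- colSums(mat[!keep, , drop = FALSE])
  rbind(mat[keep, , drop = FALSE], Other = other)
}

detection  <- 0.5 / 100
prevalence <- 20 / 100

# Manual count of prevalent taxa, as in the check quoted in the thread.
n_prevalent <- sum(rowMeans(mat > detection) > prevalence)
collapsed   <- collapse_rare(mat, detection, prevalence)

print(n_prevalent)
print(rownames(collapsed))
```

A unit test along these lines could compare the row count after dropping the catch-all row with the manual count, with a comment stating why the row is removed.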

antagomir · 2024-05-21T10:55:06Z

The code should also show how altExp(tse, "Family") was created. There are different options (onRankOnly etc).

Daenarys8 · 2024-05-22T09:12:28Z

    data(GlobalPatterns, package="mia")
    tse <- GlobalPatterns
    tse <- transformAssay(tse, assay.type="counts", method="relabundance")

Other group not present

    altExp(tse, "Family") <- agglomerateByRank(tse, rank="Family")
    altExp(tse, "Family1") <- agglomerateByRank(tse, rank="Family", onRankOnly = TRUE)
    altExp(tse, "Family2") <- agglomerateByRank(tse, rank="Family", onRankOnly = FALSE)
    altExp(tse, "Family3") <- agglomerateByRank(tse, rank="Family", onRankOnly = TRUE, na.rm = TRUE)
    altExp(tse, "Family4") <- agglomerateByRank(tse, rank="Family", onRankOnly = TRUE, na.rm = FALSE)
    altExp(tse, "Family5") <- agglomerateByVariable(tse, f="Family", MARGIN = 'row')

In the following, the "other" group is added by the agglomerateByPrevalence function to collect features under the threshold.

    actual <- agglomerateByPrevalence(tse, rank="Family", assay.type="relabundance", 
                                         detection  = 0.5/100, prevalence  = 20/100)
    actual0 <- agglomerateByPrevalence(altExp(tse, "Family"), assay.type="relabundance", 
                                       detection  = 0.5/100, prevalence  = 20/100)
    actual1 <- agglomerateByPrevalence(altExp(tse, "Family1"), assay.type="relabundance", 
                                       detection  = 0.5/100, prevalence  = 20/100)
    actual2 <- agglomerateByPrevalence(altExp(tse, "Family2"), assay.type="relabundance", 
                                       detection  = 0.5/100, prevalence  = 20/100)
    actual3 <- agglomerateByPrevalence(altExp(tse, "Family3"), assay.type="relabundance", 
                                       detection  = 0.5/100, prevalence  = 20/100)
    actual4 <- agglomerateByPrevalence(altExp(tse, "Family4"), assay.type="relabundance", 
                                       detection  = 0.5/100, prevalence  = 20/100)
    actual5 <- agglomerateByPrevalence(altExp(tse, "Family5"), assay.type="relabundance", 
                                       detection  = 0.5/100, prevalence  = 20/100)

> nrow(actual)
[1] 21
> nrow(actual1)
[1] 21
> nrow(actual2)
[1] 27

actual2 creates groups based on the full taxonomic hierarchy up to the family level, while the previous ones create the grouping factor from the family level only.
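One way such a difference in group counts can arise is sketched below in base R with a made-up taxonomy table. The column values and the paste-based grouping are assumptions for illustration and not the package's code. Grouping on the Family column alone puts all rows with a missing family into one group, whereas grouping on the full path keeps them apart when their higher ranks differ.

```r
# Made-up taxonomy: two rows share a family, two rows have no family
# assigned but belong to different orders.
tax <- data.frame(
  Order  = c("O1", "O1", "O2", "O3"),
  Family = c("F1", "F1", NA, NA),
  stringsAsFactors = FALSE
)

# Grouping on the family column only.
family_only <- tax$Family

# Grouping on the full path down to family; paste() turns NA into "NA",
# so rows with a missing family stay separated by their order.
full_path <- paste(tax$Order, tax$Family, sep = "_")

n_family_only <- length(unique(family_only))
n_full_path   <- length(unique(full_path))

print(n_family_only)
print(n_full_path)
```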

> nrow(actual3)
[1] 20
> nrow(actual4)
[1] 21
> nrow(actual5)
[1] 21
>

> sum(rowMeans(assay(altExp(tse, "Family"), "relabundance") > 0.5/100) > 0.2)
[1] 20
> sum(rowMeans(assay(altExp(tse, "Family1"), "relabundance") > 0.5/100) > 0.2)
[1] 20
> sum(rowMeans(assay(altExp(tse, "Family2"), "relabundance") > 0.5/100) > 0.2)
[1] 26
> sum(rowMeans(assay(altExp(tse, "Family3"), "relabundance") > 0.5/100) > 0.2)
[1] 19
> sum(rowMeans(assay(altExp(tse, "Family4"), "relabundance") > 0.5/100) > 0.2)
[1] 20
> sum(rowMeans(assay(altExp(tse, "Family5"), "relabundance") > 0.5/100) > 0.2)
[1] 20
>

tests/testthat/test-3agglomerate.R

Signed-off-by: Daena Rys

TuomasBorman reviewed Apr 10, 2024: R/agglomerate.R (outdated)

TuomasBorman reviewed Apr 10, 2024: R/merge.R (outdated)

TuomasBorman requested changes Apr 10, 2024

Daenarys8 force-pushed the criticalmerge branch from ddefba0 to b410d0a, May 6, 2024 11:50

Daenarys8 force-pushed the criticalmerge branch 6 times, most recently from d205d74 to 297aa81, May 6, 2024 16:20

TuomasBorman reviewed May 6, 2024: R/agglomerate.R (outdated)

TuomasBorman reviewed May 8, 2024: R/agglomerate.R

TuomasBorman reviewed May 8, 2024: R/merge.R (outdated, 2 comments)

TuomasBorman reviewed May 8, 2024: R/getPrevalence.R

Daenarys8 force-pushed the criticalmerge branch 3 times, most recently from 789082f to c3f267c, May 8, 2024 16:57

Daenarys8 requested a review from TuomasBorman May 8, 2024 17:34

TuomasBorman reviewed May 10, 2024: R/estimateDiversity.R (outdated)

TuomasBorman reviewed May 10, 2024: R/merge.R (outdated, 6 comments)

TuomasBorman reviewed May 13, 2024: R/merge.R (outdated, 2 comments)

Daenarys8 force-pushed the criticalmerge branch 2 times, most recently from c9cc064 to faa51b2, May 21, 2024 02:53

TuomasBorman approved these changes May 21, 2024

Daenarys8 force-pushed the criticalmerge branch 2 times, most recently from 996dbc7 to 0493a7f, May 21, 2024 06:57

Daenarys8 force-pushed the criticalmerge branch 2 times, most recently from e5b3461 to abc285c, May 22, 2024 10:16

TuomasBorman reviewed May 22, 2024: tests/testthat/test-3agglomerate.R (outdated)

TuomasBorman reviewed May 22, 2024: tests/testthat/test-3agglomerate.R

Daenarys8 added 5 commits May 23, 2024 06:39 (each Signed-off-by: Daena Rys)

7fcbc7c Fix critical issue with mergeFeatures
2ea0594 Up
d1fe5b0 Up
970d1d3 Up
7327fb4 Up

Daenarys8 force-pushed the criticalmerge branch from abc285c to 82e2fa1, May 23, 2024 03:41

d153712 Up (Signed-off-by: Daena Rys)

Daenarys8 force-pushed the criticalmerge branch from 82e2fa1 to d153712, May 23, 2024 03:42

TuomasBorman approved these changes May 23, 2024

TuomasBorman merged commit b627edc into microbiome:devel May 23, 2024
2 of 3 checks passed
