Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wrapper for agglomerateByRank/mergeRows #389

Merged
merged 6 commits into from
Jul 25, 2023

Conversation

Daenarys8
Copy link
Collaborator

There are 2 different methods doing similar (merging/grouping rows/features);

  • mergeRows and agglomerateByRank
    This wrapper method that combines these two methods.
  • It would be used internally by other functions so it would not be available directly for user.

Copy link
Contributor

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

Add couple of checks: .merge_features vs agglomerateByRank and .merge_features vs mergeRows

R/utils.R Outdated Show resolved Hide resolved
Signed-off-by: Daenarys8 <[email protected]>
Copy link
Contributor

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small thhings, looks good!

R/utils.R Outdated Show resolved Hide resolved
R/utils.R Outdated Show resolved Hide resolved
R/utils.R Outdated Show resolved Hide resolved
tests/testthat/test-2merge.R Outdated Show resolved Hide resolved
tests/testthat/test-3agglomerate.R Show resolved Hide resolved
Signed-off-by: Daenarys8 <[email protected]>
Signed-off-by: Daenarys8 <[email protected]>
Copy link
Contributor

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice

@antagomir
Copy link
Member

Status of this?

@Daenarys8
Copy link
Collaborator Author

Status of this?

Complete for now.

@antagomir
Copy link
Member

antagomir commented Jul 19, 2023

Great!

We had elsewhere discussion about function naming.

I would suggest to update as follows:

Downside of "merge" term is that it is also used in another meaning, to combine full SE objects, data.frames, matrices etc (instead of within object features as here). Another option would be: agglomerateFeatures and agglomerateFeaturesByRank but that is slower to write.. or combineFeature / combineFeaturesByRank (but combine is also used in the same way than merge, between distinct DataFrames)

Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?

@TuomasBorman
Copy link
Contributor

Great!

We had elsewhere discussion about function naming.

I would suggest to update as follows:

* mergeRows -> mergeFeatures (see issue [Deprecate mergeRows/mergeCols #392](https://github.com/microbiome/mia/issues/392))

* agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)

Downside of "merge" term is that it is also used in another meaning, to combine full SE objects, data.frames, matrices etc (instead of within object features as here). Another option would be: agglomerateFeatures and agglomerateFeaturesByRank but that is slower to write.. or combineFeature / combineFeaturesByRank (but combine is also used in the same way than merge, between distinct DataFrames)

Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?

I also think mergeFeatures and mergeFeaturesByRank are better than agglomerate*, and it is good to have same naming convention.

We discussed also about enabling rowData variables in mergeRows. Currently, the groups must be fed as a vector (f parameter) but I think it would be nice to be able to specify the grouping variable directly from rowData as a column name. (Same for mergeCols)

There must be some reason why we didn't implement this before??? Do you remember @antagomir?

I think this PR can be merged, and these new modifications can be done in different PR. @Daenarys8 can you implement these?

@antagomir
Copy link
Member

I agree. I don't think there is a specific reason, the rowData variables could be called by name as well.

If @Daenarys8 can open a new issue and PR about that it would be great.

You can close this issue when you have checked that it is ready.

@Daenarys8
Copy link
Collaborator Author

Great!
We had elsewhere discussion about function naming.
I would suggest to update as follows:

* mergeRows -> mergeFeatures (see issue [Deprecate mergeRows/mergeCols #392](https://github.com/microbiome/mia/issues/392))

* agglomerateByRank -> mergeFeaturesByRank (logical to use similar naming in both functions)

Downside of "merge" term is that it is also used in another meaning, to combine full SE objects, data.frames, matrices etc (instead of within object features as here). Another option would be: agglomerateFeatures and agglomerateFeaturesByRank but that is slower to write.. or combineFeature / combineFeaturesByRank (but combine is also used in the same way than merge, between distinct DataFrames)
Shall we deal with these issues here in this closely related PR, or should we open a new separate issue & PR?

I also think mergeFeatures and mergeFeaturesByRank are better than agglomerate*, and it is good to have same naming convention.

We discussed also about enabling rowData variables in mergeRows. Currently, the groups must be fed as a vector (f parameter) but I think it would be nice to be able to specify the grouping variable directly from rowData as a column name. (Same for mergeCols)

There must be some reason why we didn't implement this before??? Do you remember @antagomir?

I think this PR can be merged, and these new modifications can be done in different PR. @Daenarys8 can you implement these?

Sure, will give it a shot

@antagomir
Copy link
Member

Can someone close this PR when it is confirmed to be ready.

@Daenarys8 Daenarys8 closed this Jul 24, 2023
@antagomir
Copy link
Member

Was this ever merged? If I check correctly from above, it was just closed without merging ? The idea was to merge I guess?

@antagomir antagomir reopened this Jul 24, 2023
@antagomir
Copy link
Member

antagomir commented Jul 24, 2023

Hmm - - ok now I noticed that the renaming scheme discussed above, and agreed, has not made it to this PR yet.

Can we add it (@Daenarys8) ?

More specifically, the idea was to rename "Rows" to "Features" and "Cols" to "Samples".

In addition, to harmonize terminology (use "merge" instead of "agglomerate").

However, one last thing to discuss first: the meaning of "to agglomerate" is somewhat better fit with our case (in terms of language & meaning), in particular when the phylogenetic tree is involved in the process. The phyloseq equivalent to TreeSE agglomerateByRank is tax_glom (e.g. https://mikemc.github.io/speedyseq/reference/tax_glom.html).

We could use "glom" instead of "merge". That would be as fast to write, and the meaning would be more specific (merge is easier to confuse with merging of matrices). However, "glom" might be a bit weird term to introduce right now, and it is possible to re-evaluate that later as well.

Hence, in summary, I suggest to rename as:

  • mergeCols -> mergeSamples (or glomSamples... perhaps not?)
  • mergeRows -> mergeFeatures
  • agglomerateByRank -> mergeFeaturesByRank
  • agglomerateByPrevalence -> mergeFeaturesByPrevalence

@Daenarys8 could you add that to this PR as discussed above, OR open a new issue proposing this change, then merging and closing the current PR. Also confirm that the other points discussed / agreed above have now been addessred.

@antagomir
Copy link
Member

Ok it is more clear to do the other discussed changes in a separate PR. I am merging and closing this one.

@antagomir antagomir merged commit 8685bda into microbiome:master Jul 25, 2023
This was referenced Jul 25, 2023
@antagomir
Copy link
Member

Ok - now as this is merged:

the original motivation was ANCOMBC issue #174 where we would like to let users specify taxonomic rank or also other rowData variable to merge rows before running ANCOMBC: FrederickHuangLin/ANCOMBC#174

Now, ideally this new wrapper will help there; it would group by taxonomic rank if this is available in rowData, otherwise it uses the more general grouping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants