Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rworkflows #48

Closed
wants to merge 16 commits into from
Closed

Add rworkflows #48

wants to merge 16 commits into from

Conversation

bschilder
Copy link

I just added rworkflows to Rsamtools to have some immediate checks on your code (across 3 platforms) via GitHub Actions

There's still a couple of remaining Warnings and Notes from the previous version that i wasn't sure how to resolve. But since these were occurring in Rsamtools before I made my changes, I think it's safe to merge the PR and then sort out the source of these issues.

@vobencha @hpages @mtmorgan

checking compiled code ... WARNING
  FileRsamtools/libs/Rsamtools.so:
    Found___stderrp’, possibly fromstderr’ (C)
      Objects:bam_sort.o’, ‘bamfile.o’, ‘bedidx.o’, ‘sam_opts.o’,
        ‘sam_utils.o’, ‘samtools_patch.oFound___stdoutp’, possibly fromstdout’ (C)
      Objects:bam_sort.o’, ‘sam_utils.oFound_abort’, possibly fromabort’ (C)
      Object:bam_sort.oFileRsamtools/libs/Rsamtools.so:
    Found no call to:R_useDynamicSymbolsCompiled code should not call entry points which might terminate R nor
  write to stdout/stderr instead of to the console, nor use Fortran I/O
  nor system RNGs.
  It is good practice to register native routines and to disable symbol
  search.
  
  SeeWriting portable packagesin theWriting R Extensionsmanual.checking installed package size ... NOTE
    installed size is  8.2Mb
    sub-directories of 1Mb or more:
      R         3.0Mb
      extdata   2.7Mb
      libs      1.3Mbchecking dependencies in R code ... NOTE
  Unexported objects imported by ':::' calls:S4Vectors:::explodeIntBits’ ‘S4Vectors:::implodeIntBits’
    ‘S4Vectors:::makePowersOfTwo’ ‘S4Vectors:::quick_unlist’
    ‘S4Vectors:::selectSomeSee the note in ?`:::` about the use of this operator.checking foreign function calls ... NOTE
  Registration problems:
    symbolfuncnot in namespace:
     .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, 
         mapqFilter, ...)
    symbolfuncnot in namespace:
     .Call(func, .extptr(file), regions, tmpl)
    symbolfuncnot in namespace:
     .Call(func, file, dest)
  See chapterSystem and foreign language interfacesin theWriting R
  Extensionsmanual.checking for GNU extensions in Makefiles ... NOTE
  GNU make is a SystemRequirements.

Thanks, and let me know if you have any questions!

Best,
Brian

@mtmorgan
Copy link
Collaborator

mtmorgan commented Dec 23, 2022

Hi @bschilder thanks for your work on this.

I don't think Rproj files belong in git -- these are user preferences rather than a part of the package; after all I don't include .emacs ... Also for instance if I read this correctly something like NumSpacesForTab: 2 is not consistent with the 4-space indentation in the package (and more generally bioconductor core?) programming style, the package doesn't use devtools paradigms, etc. So I'm not sure that this is adding value.

I'm concerned about the support costs of rworkflows, which I am not familiar with and which would fall on maintainer shoulders if your interests were to migrate elsewhere (the first commit to Rsamtools in 2009...).

I'm also not confident that the workflow is 'doing the right thing' and will continue to do the right thing, with respect to versioning. For instance the Linux distribution appears to use R-devel and bioconductor/bioconductor_docker:devel, which is correct for the current and next Bioconductor release cycle, but will be incorrect for the cycle after that (bioconductor has a twice yearly release cycle whereas R has an annual cycle; bioconductor devel anticipates the user environment when it becomes release, so bioc-devel only builds on R-devel 1/2 the time). Likewise macOS and Windows seem to be using R 'latest' but bioc-devel, which is not how Bioconductor distributes packages (or perhaps 'latest' is a synonym for R-devel, but then this just highlights the implicit support costs...).

I see also 'as_cran' but Bioconductor does not check packages that way, using instead the R environment defined at http://bioconductor.org/checkResults/release/bioc-LATEST/Renviron.bioc.

The issue

❯ checking foreign function calls ... NOTE
  Registration problems:
    symbol ‘func’ not in namespace:
     .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, 
         mapqFilter, ...)

seems to be incorrect -- the pattern is foo <- function(func, ...) ... .Call(func, ...). The other issues were already known and / or rationalized from the Bioconductor build report.

@bschilder
Copy link
Author

bschilder commented Dec 23, 2022

Thanks for all the thoughtful comments @mtmorgan !

Hi @bschilder thanks for your work on this.

I don't think Rproj files belong in git -- these are user preferences rather than a part of the package; after all I don't include .emacs ... Also for instance if I read this correctly something like NumSpacesForTab: 2 is not consistent with the 4-space indentation in the package (and more generally bioconductor core?) programming style, the package doesn't use devtools paradigms, etc. So I'm not sure that this is adding value.

I agree, this was added by mistake since I assumed you already had those file types added to your .gitignore. I can fix this.

I'm concerned about the support costs of rworkflows, which I am not familiar with and which would fall on maintainer shoulders if your interests were to migrate elsewhere (the first commit to Rsamtools in 2009...).

True, but that's kind of the point of rworkflows. Rather than having a static workflow that each user must maintain, it calls a centralized action that our Neurogenomics Lab maintains. To me, this is even less risky than having a Suggest dependency (with a lot of benefits), as it does not affect whether your package gets distributed to Bioc, it is simply an intermediate check for developers to get immediate feedback on whether they're code is working, rather than waiting several days to see whether things have worked on Bioc (and at that point, any bugs are now being distributed to Bioc users).

I do understand the fear, but the same is true for any package or software that is maintained on a volunteer basis, including Rsamtools itself!

If it's any consolation, I currently maintain >25 R packages that all depend on rworkflows. So I have pretty good incentive to keep maintaining it ;) Plus, the goal is to grow the popularity of rworkflows such that it benefits from a lot of community involvement.

I'm also not confident that the workflow is 'doing the right thing' and will continue to do the right thing, with respect to versioning. For instance the Linux distribution appears to use R-devel and bioconductor/bioconductor_docker:devel, which is correct for the current and next Bioconductor release cycle, but will be incorrect for the cycle after that (bioconductor has a twice yearly release cycle whereas R has an annual cycle; bioconductor devel anticipates the user environment when it becomes release, so bioc-devel only builds on R-devel 1/2 the time). Likewise macOS and Windows seem to be using R 'latest' but bioc-devel, which is not how Bioconductor distributes packages.

I'm open to suggestions regarding the 'right thing' to do!

As a developer of a handful of Bioc packages myself, I'm very much interested in making rworkflows fully compatible with the Bioc schedule.

My logic with having Linux use the devel version was that it would allow developers to be ahead of the curve, whereas Mac/Windows reflects the latest release.

These are of course all things you have full control over within your yaml file. I've done this step for you, but the R function rworkflows::use_workflow() has a bunch of arguments you can adjust to generate the workflow script with whatever your preferences are. That said, I'd like to programmatically extract the 'correct' versions for R and Bioc such that they match the Bioc release schedule, so that users never have to worry about that. I've added it as an Issue here:
neurogenomics/rworkflows#33

I see also 'as_cran' but Bioconductor does not check packages that way, using instead the R environment defined at http://bioconductor.org/checkResults/release/bioc-LATEST/Renviron.bioc.

You're welcome to change any of the rworkflows parameters as you see fit. To change thing, simply set as_cran: {{ false }} in the yaml file. I'd also be happy to change this for you.

The issue

❯ checking foreign function calls ... NOTE
  Registration problems:
    symbol ‘func’ not in namespace:
     .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, 
         mapqFilter, ...)

seems to be incorrect -- the pattern is foo <- function(func, ...) ... .Call(func, ...). The other issues were already known and / or rationalized from the Bioconductor build report.

I see, perhaps there's a setting in rcmdcheck I can adjust. Still, it's just a Note so it shouldn't affect the ability to rworkflows to pass.

My larger concern were the Warning messages. I'm not familiar with C code myself, but perhaps you'll have some insights into this.

checking compiled code ... WARNING
  FileRsamtools/libs/Rsamtools.so:
    Found___stderrp’, possibly fromstderr’ (C)
      Objects:bam_sort.o’, ‘bamfile.o’, ‘bedidx.o’, ‘sam_opts.o’,
        ‘sam_utils.o’, ‘samtools_patch.oFound___stdoutp’, possibly fromstdout’ (C)
      Objects:bam_sort.o’, ‘sam_utils.oFound_abort’, possibly fromabort’ (C)
      Object:bam_sort.oFileRsamtools/libs/Rsamtools.so:
    Found no call to:R_useDynamicSymbols

Thanks again,
Brian

@bschilder
Copy link
Author

Hi @mtmorgan, I've gone through and implemented many of your helpful suggestions.

  1. Added "*.Rproj" to the .gitignore (note the wildcard).
  2. Synchronise Bioc / R versions: Synchronise R/Bioc versions neurogenomics/rworkflows#33
  3. Set as_cran to false in the rworkflows workflow file within GenomicRanges.

I hope this helps alleviate at least some of the main concerns! Let me know if there's anything else I can do to help.

@bschilder
Copy link
Author

bschilder commented Jan 5, 2023

rcmdcheck

Regarding these warnings, trying to find which parameters i need to change so that these are ignored (if you can confirm they are spurious warnings):

checking compiled code ... WARNING
  FileRsamtools/libs/Rsamtools.so:
    Found___stderrp’, possibly fromstderr’ (C)
      Objects:bam_sort.o’, ‘bamfile.o’, ‘bedidx.o’, ‘sam_opts.o’,
        ‘sam_utils.o’, ‘samtools_patch.oFound___stdoutp’, possibly fromstdout’ (C)
      Objects:bam_sort.o’, ‘sam_utils.oFound_abort’, possibly fromabort’ (C)
      Object:bam_sort.oFileRsamtools/libs/Rsamtools.so:
    Found no call to:R_useDynamicSymbolsCompiled code should not call entry points which might terminate R nor
   write to stdout/stderr instead of to the console, nor use Fortran I/O
   nor system RNGs.
   It is good practice to register native routines and to disable symbol
   search.
   
   SeeWriting portable packagesin theWriting R Extensionsmanual.

Potentially related Issues:
airoldilab/sgd#44
ropensci/git2r#83
microsoft/LightGBM#1909

I tried these suggested fixes but tools::package_native_routine_registration_skeleton(".") simply yielded a message "no native symbols were extracted". And it looks like the NAMESPACE file already has useDynLib(Rsamtools, .registration=TRUE) as suggested in the post.
https://stackoverflow.com/questions/42313373/r-cmd-check-note-found-no-calls-to-r-registerroutines-r-usedynamicsymbols

One way to get around this is simply skipping the rcmdcheck step, and instead rely only on BiocCheck to catch problems.

BiocCheck

BiocCheck issues several errors and warnings .

> BiocCheck::BiocCheck()

─ BiocCheckVersion: 1.34.2BiocVersion: 3.16Package: RsamtoolsPackageVersion: 2.16.1sourceDir: /Users/schilder/Desktop/forks/RsamtoolsinstallDir: /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmplVBUQa/file1794b3d48fbd8BiocCheckDir: /Users/schilder/Desktop/forks/Rsamtools.BiocCheckplatform: unixisTarBall: FALSE

* Installing package...
* Checking for deprecated package usage...
* Checking for remote package usage...
* Checking for 'LazyData: true' usage...
* Checking version number...
* Checking version number validity...
* Checking R version dependency...
    * NOTE: Update R version dependency from 3.5.0 to 4.2.0.
* Checking package size...
      Skipped... only checked on source tarball
* Checking individual file sizes...
    * WARNING: The following files are over 5MB in size:
      '.git/objects/pack/pack-6e2cbf0bd241b8744672adb07f33834ac8daa212.pack'
* Checking biocViews...
* Checking that biocViews are present...
* Checking package type based on biocViews...
    Software
* Checking for non-trivial biocViews...
* Checking that biocViews come from the same category...
* Checking biocViews validity...
* Checking for recommended biocViews...
    * NOTE: Consider adding these automatically suggested biocViews: ATACSeq,
      SequenceMatching, VariantAnnotation, MultipleSequenceAlignment
Search 'biocViews' at https://contributions.bioconductor.org
* Checking build system compatibility...
* Checking for blank lines in DESCRIPTION...
* Checking if DESCRIPTION is well formatted...
* Checking for proper Description: field...
    * NOTE: The Description field in the DESCRIPTION is made up by less than 3
      sentences. Please consider expanding this field, and structure it as a
      full paragraph
* Checking for whitespace in DESCRIPTION field names...
* Checking that Package field matches directory/tarball name...
* Checking for Version field...
* Checking for valid maintainer...
    * ERROR: Do not use Author/Maintainer fields. Use Authors@R.
    * ERROR: Remove Maintainer field. Use Authors@R [cre] designation.
* Checking License: for restrictive use...
* Checking for pinned package versions...
* Checking DESCRIPTION/NAMESPACE consistency...
* Checking .Rbuildignore...
* Checking for stray BiocCheck output folders...
* Checking vignette directory...
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
* Checking package installation calls in R code...
* Checking for library/require of Rsamtools...
* Checking coding practice...
    * NOTE: Avoid sapply(); use vapply()
    * NOTE: Avoid the use of 'paste' in condition signals
* Checking parsed R code in R directory, examples, vignettes...
    * NOTE: Avoid '<<-' if possible (found 2 times)
* Checking function lengths...
    * NOTE: The recommended function length is 50 lines or less. There are 411
      functions greater than 50 lines.
* Checking man page documentation...
    * WARNING: Add non-empty \value sections to the following man pages:
    * ERROR: At least 80% of man pages documenting exported objects must have
      runnable examples.
    * NOTE: Usage of dontrun{} / donttest{} tags found in man page examples. 11%
      of man pages use at least one of these tags.
    * NOTE: Use donttest{} instead of dontrun{}.
* Checking package NEWS...
* Checking unit tests...
* Checking skip_on_bioc() in tests...
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and
  vignette source...
    * NOTE: Consider shorter lines; 21 lines (0%) are > 80 characters long.
    * NOTE: Consider 4 spaces instead of tabs; 70 lines (1%) contain tabs.
    * NOTE: Consider multiples of 4 spaces for line indents; 1633 lines (18%)
      are not.
    See https://contributions.bioconductor.org/r-code.html
    See styler package: https://cran.r-project.org/package=styler as described
      in the BiocCheck vignette.
* Checking if package already exists in CRAN...
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for
details

replacement repositories:
    CRAN: https://cran.rstudio.com/

* Checking for bioc-devel mailing list subscription...
    * NOTE: Cannot determine whether maintainer is subscribed to the Bioc-Devel
      mailing list (requires admin credentials). Subscribe here:
      https://stat.ethz.ch/mailman/listinfo/bioc-devel
* Checking for support site registration...
    Maintainer email is ok.BiocCheck results ──
1 ERRORS | 2 WARNINGS | 14 NOTES

See the Rsamtools.BiocCheck folder and run
    browseVignettes(package = 'BiocCheck')
for details.

Author and Maintainer fields

I've gone ahead and reformatted these so that they're compliant with the latest release of BiocCheck, by collapsing them into one "Authors@R" field.

"At least 80% of man pages documenting exported objects must have runnable examples."

I'll leave this one to you @mtmorgan so you can decide on what the examples should be, and for which functions they're included in.

"Add non-empty \value sections to the following man pages:"

man/ApplyPileupsParam-class.Rd, man/BamFile-class.Rd, man/BamViews-class.Rd, man/BcfFile-class.Rd, man/FaFile-class.Rd, man/PileupFiles-class.Rd, man/RsamtoolsFile-class.Rd, man/RsamtoolsFileList-class.Rd, man/ScanBamParam-class.Rd, man/ScanBcfParam-class.Rd, man/TabixFile-class.Rd

I'll also leave this to you as you know these functions best.

Hope this helps!,
Brian

@bschilder bschilder reopened this Sep 1, 2023
@bschilder
Copy link
Author

Hi @mtmorgan, just wanted to check in and see whether you might want to merge this PR.

To recap:

  • I've turned off all CRAN checks (since you disagreed with some of their messages).
  • BiocCheck has indicated there is 1 Error and 2 Warnings. These can be addressed after merge. Or the setting in BiocCheck can be changed if you disagree with the error/warnings.
  • RE: concerns about maintenance. The idea is that you won't have to maintain rworkflows, that's done entirely on the rworkflows developer's side (i.e. me and my lab).
  • Like I said, we're very much open to suggestions. But either way, I do think having some GitHub CI is better than none. Ensures there's another layer of checks in between developers and when the package gets distributed to users on Bioc.

Hope that helps clear things up, happy to answer any questions!

@mtmorgan
Copy link
Collaborator

mtmorgan commented Sep 1, 2023

I'll leave this for @hpages or @vjcitn to resolve.

@vjcitn
Copy link

vjcitn commented Sep 1, 2023

We will look at it.

@hpages
Copy link
Contributor

hpages commented Sep 1, 2023

Hi @bschilder,

A few minor things that caught my attention:

  • It looks like the PR re-introduces the Sweave vignette (Rsamtools-Overview.Rnw) which got removed from the package in April 2023.
  • The new entry in NEWS says CHANGES IN VERSION 2.16.1 but I don't see a version bump, and the version of the package is 2.17.0 in devel.
  • Please don't add https://github.com/Bioconductor/Rsamtools to the URL field in the DESCRIPTION file. The BugReports field already points people to the issue tracker on GitHub. Bioconductor prefers to make the package landing page (https://bioconductor.org/packages/Rsamtools) the primary URL, and not advertize the URL of the Github repo so much because this encourages people to install the package from there (which is bad) or to use the issue tracker to ask questions about how to use the software (which is also bad).

Thanks!

@hpages
Copy link
Contributor

hpages commented Sep 1, 2023

Also how is the addition of a .gitignore file related to the "rworkflows" functionality? This is distracting. If you want to make the case for adding this file please open an issue and/or submit a separate PR. We already discussed this here 😉

In the same spirit, the new .Rbuildignore file should only have the ^.github$ line. AFAIK the other lines are not related to the "rworkflows" functionality either.

Again, PRs should stick to the matter and not try to squeeze random/unrelated changes. Thanks!

@hpages
Copy link
Contributor

hpages commented Sep 1, 2023

Finally I'm not sure I understand the following lines:

run_bioccheck: ${{ true }}
run_rcmdcheck: ${{ false }}

Does it mean that the workflow will run BiocCheck but not R CMD check?

Same with the following lines:

has_testthat: ${{ true }}
has_runit: ${{ false }}

Shouldn't this be the otherway around?

@bschilder
Copy link
Author

  • It looks like the PR re-introduces the Sweave vignette (Rsamtools-Overview.Rnw) which got removed from the package in April 2023.

Likely due to the lag between when I first made the PR (Dec 23, 2022) and now. I merged the latest version of Rsamtools into my fork but I guess some files were kept by accident.
Easy fix, just removed it.

  • The new entry in NEWS says CHANGES IN VERSION 2.16.1 but I don't see a version bump,

and the version of the package is 2.17.0 in devel.

Again due to the lag, and apparently the fact that the NEWS file isn't being regularly updated?
In any case, just updated the version in both files.

  • Please don't add https://github.com/Bioconductor/Rsamtools to the URL field in the DESCRIPTION file. The BugReports field already points people to the issue tracker on GitHub. Bioconductor prefers to make the package landing page (https://bioconductor.org/packages/Rsamtools) the primary URL, and not advertize the URL of the Github repo so much because this encourages people to install the package from there (which is bad) or to use the issue tracker to ask questions about how to use the software (which is also bad).

Removed.

Also how is the addition of a .gitignore file related to the "rworkflows" functionality? This is distracting. If you want to make the case for adding this file please open an issue and/or submit a separate PR. We already discussed this here 😉

Simply wanted to make sure I wasn't pushing any extra files that I didn't intend to (e.g macs are constantly generating .DS_Store files). But happy to remove this and remove any other residual files manually.

In the same spirit, the new .Rbuildignore file should only have the ^.github$ line. AFAIK the other lines are not related to the "rworkflows" functionality either.

Not quite. These are directly related to rworkflows, as the files I listed there can be generated in the course of GHA checks on certain platforms (namely Windows). I figured I'd add them to save you from scratching your head when things fail later.

@bschilder
Copy link
Author

Finally I'm not sure I understand the following lines:

run_bioccheck: ${{ true }}
run_rcmdcheck: ${{ false }}

Does it mean that the workflow will run BiocCheck but not R CMD check?

Yup, but you can change it to whatever configuration your prefer! Just let me know which you would like and I'd be happy to do it for you.

Same with the following lines:

has_testthat: ${{ true }}
has_runit: ${{ false }}

Shouldn't this be the otherway around?

In your case yes, thanks for the catch! These are simply the defaults. Just fixed this.

That said, outside of this PR I'd highly encourage you to consider adding some more targeted unit tests, rather than relying solely on the generic BiocGenerics:::testPackage function which doesn't cover most scenarios from my understanding. But perhaps something to tackle another day.

If you're unsure about any of the parameters, you can always read about each of them in more detail here, or use:
?rworkflows::use_workflow

@hpages
Copy link
Contributor

hpages commented Sep 1, 2023

I'm not convinced that these lines in .Rbuildignore are related to the "rworkflows" functionality:

^.*\.Rproj$
^\.Rproj\.user$

^migration_notes.md$

so maybe at least those can be removed?

Not sure about the other ones:

node_modules$
package-lock\.json$
package\.json$

but I don't see them here so I'm confused!

I would suggest that you set run_rcmdcheck: to ${{ true }} in the PR. Isn't running R CMD check the main purpose of "rworkflows"? Even better, the factory settings for "rworkflows" should focus on trying to reproduce the build/check results produced by the official daily builds, to avoid confusion for developers and users of the package. For this purpose I would also suggest that you set run_bioccheck: to ${{ false }} in the PR.

The potential for confusion is real and should not be underestimated. Just earlier this morning @vjcitn merged your other "rworkflows" PR for VariantAnnotation (Bioconductor/VariantAnnotation#73), but then a couple hours later Vince had to revert the merge because check was failing, probably because the yaml file also had run_bioccheck: set to ${{ true }}.

As mentioned already here, BiocCheck has a tendency to produce some false positives which is why we don't run it as part of our daily builds yet.

About the NEWS file. Some Bioconductor packages adopt the "one big section per Bioconductor release" approach. That means that all the changes that happened in the package between BioC 3.17 and BioC 3.18 are documented in a big section that starts with:

CHANGES IN VERSION X.Y
----------------------

where X.Y is the major version that the package will have at the time of the release (2.18 in the case of Rsamtools). That big section is itself organized in subsections, typically NEW FEATURES, SIGNIFICANT USER-VISIBLE CHANGE, DEPRECATED AND DEFUNCT, and BUG FIXES.

That said, outside of this PR I'd highly encourage you to consider adding some more targeted unit tests, rather than relying solely on the generic BiocGenerics:::testPackage function which doesn't cover most scenarios from my understanding.

Not sure what you mean exactly. Calling BiocGenerics:::testPackage("MyPackage") is how we run the unit tests in the RUnit world. So this would be like saying that test_check("MyPackage") doesn't cover most scenarios in the testthat world. 🤔

Why add an empty line far down in the NEWS file? (just above the CHANGES IN VERSION 2.10 section). Again let's please avoid polluting the PR with random/unrelated changes. It's easy to spot this kind of pollution by checking the diff here: https://github.com/Bioconductor/Rsamtools/pull/48/files

Thanks

@bschilder
Copy link
Author

I'm not convinced that these lines in .Rbuildignore are related to the "rworkflows" functionality:

^.*\.Rproj$
^\.Rproj\.user$

^migration_notes.md$

so maybe at least those can be removed?

Sure, removed.

Not sure about the other ones:

node_modules$
package-lock\.json$
package\.json$

but I don't see them here so I'm confused!

The issue is intermittent, haven't figured out the logic to it yet (partly due to how GHA sets up their VMs, I suspect). Can remove if you like, I'll leave it to you to fix if and and when the errors return.

I would suggest that you set run_rcmdcheck: to ${{ true }} in the PR. Isn't running R CMD check the main purpose of "rworkflows"? Even better, the factory settings for "rworkflows" should focus on trying to reproduce the build/check results produced by the official daily builds, to avoid confusion for developers and users of the package. For this purpose I would also suggest that you set run_bioccheck: to ${{ false }} in the PR.

Done.

The potential for confusion is real and should not be underestimated. Just earlier this morning @vjcitn merged your other "rworkflows" PR for VariantAnnotation (Bioconductor/VariantAnnotation#73), but then a couple hours later Vince had to revert the merge because check was failing, probably because the yaml file also had run_bioccheck: set to ${{ true }}.
As mentioned already here, BiocCheck has a tendency to produce some false positives which is why we don't run it as part of our daily builds yet.

I don't have any issues with BiocCheck atm. But if you do, I recommend take it up with the BiocCheck team:
https://github.com/Bioconductor/BiocCheck

About the NEWS file. Some Bioconductor packages adopt the "one big section per Bioconductor release" approach. That means that all the changes that happened in the package between BioC 3.17 and BioC 3.18 are documented in a big section that starts with:

CHANGES IN VERSION X.Y
----------------------

where X.Y is the major version that the package will have at the time of the release (2.18 in the case of Rsamtools). That big section is itself organized in subsections, typically NEW FEATURES, SIGNIFICANT USER-VISIBLE CHANGE, DEPRECATED AND DEFUNCT, and BUG FIXES.

Up to you how you want to document things, just pointing out the reason why it occurred.

That said, outside of this PR I'd highly encourage you to consider adding some more targeted unit tests, rather than relying solely on the generic BiocGenerics:::testPackage function which doesn't cover most scenarios from my understanding.

Not sure what you mean exactly. Calling BiocGenerics:::testPackage("MyPackage") is how we run the unit tests in the RUnit world. So this would be like saying that test_check("MyPackage") doesn't cover most scenarios in the testthat world. 🤔

We can discuss this in more detail elsewhere.

Why add an empty line far down in the NEWS file? (just above the CHANGES IN VERSION 2.10 section). Again let's please avoid polluting the PR with random/unrelated changes. It's easy to spot this kind of pollution by checking the diff here: https://github.com/Bioconductor/Rsamtools/pull/48/files

Done.

@bschilder
Copy link
Author

bschilder commented Sep 2, 2023

@hpages if you're interested in learning about the purpose of rworkflows, I'd recommend you take a look at the documentation:
https://github.com/neurogenomics/rworkflows/issues

As well as the associated preprint:
https://www.researchsquare.com/article/rs-2399015/v1

@hpages
Copy link
Contributor

hpages commented Sep 3, 2023

Thanks for all the changes and the links you provided above.

Why remove the .Rbuildignore file? Don't we need to prevent the .github folder to end up in the source tarball?

Up to you how you want to document things etc...

I don't maintain Rsamtools and didn't choose the style/format of its NEWS file (even though I adopted that style in the packages I maintain). But it's usually on the authors of a PR to do their best effort to follow the style used in the piece of software they are contributing to. Hence the reason I took the time to give you feedback on how NEWS should be formatted.

@hpages
Copy link
Contributor

hpages commented Sep 3, 2023

BTW Bioconductor has moved to using devel instead of master but I just spotted that the yaml file sill uses the latter. The other PRs you've submitted for other Bioconductor packages seem to have the same problem. Thx

@bschilder
Copy link
Author

Thanks for all the changes and the links you provided above.

Why remove the .Rbuildignore file? Don't we need to prevent the .github folder to end up in the source tarball?

Sorry, I interpreted your previous messages as wanting to remove the .Rbuildignore file. I'll add it back.

BTW Bioconductor has moved to using devel instead of master but I just spotted that the yaml file sill uses the latter. The other PRs you've submitted for other Bioconductor packages seem to have the same problem. Thx

Thanks for noticing this, thought I had added "devel" but i guess not. I've adjusted this one, and will do so for the others.

PS - It looks like when I forked the repo GitHub automatically must have renamed the branch "master". I'll keep an eye out for this in the future:
https://github.com/neurogenomics/Rsamtools/tree/master

@hpages
Copy link
Contributor

hpages commented Sep 5, 2023

It looks like when I forked the repo GitHub automatically must have renamed the branch "master"

Or maybe you forked before we renamed the branch? The master -> devel renaming happened this year (see the various announcements on the bioc-devel mailing lists about that). However it seems that you forked before that.

Any chance you can amend your addition to the NEWS file? All you need to do is replace

CHANGES IN VERSION 2.17.1

with

CHANGES IN VERSION 2.18

Finally can you please explain how the yaml file makes sure that the right version of Bioconductor is used for building and checking the package? I see that the use of the bioconductor/bioconductor_docker:devel container is hardcoded in the yaml file. Note that this would only work for the devel branch of the package, but not for the RELEASE_* branches. What am I missing?

Thanks again,
H.

@LiNk-NY
Copy link

LiNk-NY commented Sep 5, 2023

@hpages
Can you elaborate on what false positives you are seeing in BiocCheck? (ideally with reproducible examples)
Thanks!

@hpages
Copy link
Contributor

hpages commented Sep 5, 2023

@LiNk-NY Not really the place here to discuss this but it's actually a mix of:

  • false positives (e.g. detection of examples, vignettes, or unit tests, that trigger installation of packages),
  • checks that are too stringent (e.g. ERROR instead of WARNING on use of Sys.setenv),
  • and checks that lack consensus (e.g. enforcing use of Authors@).

@LiNk-NY
Copy link

LiNk-NY commented Sep 5, 2023

Thanks, I've moved the discussion to Bioconductor/BiocCheck#197

@vjcitn
Copy link

vjcitn commented Sep 12, 2023

@bschilder I wanted to mention in connection with this PR (and others) that our team has been discussing the proposal to adopt rworkflows for this and other packages. The current thinking is that an intensive multiparty discussion about specific GHA-oriented solutions for Bioconductor packages is needed. We will be in touch as this discussion is scheduled.

@bschilder
Copy link
Author

bschilder commented Sep 12, 2023

@bschilder I wanted to mention in connection with this PR (and others) that our team has been discussing the proposal to adopt rworkflows for this and other packages. The current thinking is that an intensive multiparty discussion about specific GHA-oriented solutions for Bioconductor packages is needed. We will be in touch as this discussion is scheduled.

@vjcitn I've actually been working on adding some new features to rworkflows to make it even more Bioc friendly. will share v soon! and thanks, looking forward to the discussion

@bschilder
Copy link
Author

bschilder commented Sep 12, 2023

It looks like when I forked the repo GitHub automatically must have renamed the branch "master"

Or maybe you forked before we renamed the branch? The master -> devel renaming happened this year (see the various announcements on the bioc-devel mailing lists about that). However it seems that you forked before that.

Ah you're right, that makes more sense actually.

Any chance you can amend your addition to the NEWS file? All you need to do is replace

CHANGES IN VERSION 2.17.1

with

CHANGES IN VERSION 2.18

Sure, I'm assuming you want me to bump the version in DESCRIPTION as well?

Finally can you please explain how the yaml file makes sure that the right version of Bioconductor is used for building and checking the package? I see that the use of the bioconductor/bioconductor_docker:devel container is hardcoded in the yaml file. Note that this would only work for the devel branch of the package, but not for the RELEASE_* branches. What am I missing?

I'm not sure I would call thebioconductor/bioconductor_docker:devel parameter "hard coded", in the same way I wouldn't call any of the other parameters in the workflow file hard coded (e.g. run_biocheck: {{ false }}). These are all parameters which the user can change to fit their needs and preferences. So it's up the user which is the "right" version of bioconductor they wish to run CI on for each VM.

Coordinating within VMs

Let's take a closer look:

matrix:
        config:
        - os: ubuntu-latest
          r: devel
          bioc: devel
          cont: bioconductor/bioconductor_docker:devel
          rspm: https://packagemanager.rstudio.com/cran/__linux__/focal/release
        - os: macOS-latest
          r: latest
          bioc: release
        - os: windows-latest
          r: latest
          bioc: release

You'll notice that on the ubuntu-latest VM the versions of r / bioc / cont are all coordinated (ie. they use devel). Similarly, on the VMs macOS-latest and windows-latest the versions of r / bioc / cont are all coordinated (ie. they use latest/release).

Users can indeed change these parameters to suit their needs, but I provide a bit of extra help within the rworkflows::use_workflow function via the rworkflows::construct_runners helper subfunction. The assists users when selecting the VM runner options that make sense together, which is reflected in the example above.

Here is the relevant documentation:
https://neurogenomics.github.io/rworkflows/reference/use_workflow.html
https://neurogenomics.github.io/rworkflows/reference/construct_runners.html

Specifying runners in the workflows vs. the action

Now, you might ask "Why not just pass a simplified parameter to specify the runners, instead of including the matrix:config explicitly within the workflow?". Well there's a couple reason for this, the main one being that GHA only lets you specify the runner within workflow files, not action scripts. For this reason, I can't simply pass in a parameter named something like: use_runner: ubuntu-latest.

Coordinating across branches

If I understand correctly, there is a second point that you've made regarding the coordination between branches and VM runners.

So for example, if you make pushes to the RELEASE_3.17 branch, you may only want to run VMs using the release version of Bioc. And when making pushes to the 'devel' branch to may only want to run VMs using the devel version of Bioc. In this case, you would simply generate two separate workflow files like so:

(note i'm using the dev version of rworkflows here bc i just edited some of the args to make this even easier.)

remotes::install_github("neurogenomics/rworkflows@dev")

#### Write devel branch workflow ####
v <- "devel"
f1 <- use_workflow(name = paste("rworkflows",v,sep="."), 
                   branches = v, 
                   runners = construct_runners(bioc = v), 
                   preview = TRUE,
                   force_new = TRUE,
                   save_dir = tempdir() # For demo only, use default in practice
                   ) 

#### Write RELEASE_3_17 branch workflow ####
v <- "RELEASE_3_17"
f2 <- use_workflow(name = paste("rworkflows",v,sep="."), 
                   branches = v, 
                   runners = construct_runners(bioc = v), 
                   preview = TRUE,
                   force_new = TRUE,
                   save_dir = tempdir() # For demo only, use default in practice
                   ) 

The above code will generate two workflow files; one for the devel branch and one for the RELEASE_3_17 branch. They will each use the respectively appropriate version of Bioc.

@LiNk-NY
Copy link

LiNk-NY commented Sep 12, 2023

Quick question: How do you learn that the RSPM / PPM now uses jammy instead of focal? Is this done manually?

Currently, the Bioconductor devel container uses codename jammy and the config should look like:

cont: bioconductor/bioconductor_docker:devel
rspm: https://packagemanager.rstudio.com/cran/__linux__/jammy/release

verified with:

docker run -it bioconductor/bioconductor_docker:devel bash
root@:/# lsb_release -a
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

@bschilder
Copy link
Author

bschilder commented Sep 12, 2023

Quick question: How do you learn that the RSPM / PPM now uses jammy instead of focal? Is this done manually?

Currently, the Bioconductor devel container uses codename jammy and the config should look like:

cont: bioconductor/bioconductor_docker:devel
rspm: https://packagemanager.rstudio.com/cran/__linux__/jammy/release

verified with:

docker run -it bioconductor/bioconductor_docker:devel bash
root@:/# lsb_release -a
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

Oh interesting! I didn't realise this was an element that was liable to change. I can add a function to use_workflows that infers the correct codename.

Made a dedicated Issue for it here:
neurogenomics/rworkflows#73

@hpages
Copy link
Contributor

hpages commented Sep 12, 2023

Sure, I'm assuming you want me to bump the version in DESCRIPTION as well?

Nope. You've already bumped it to 2.17.1, which is how it should be. (Remember that the core team will bump the package version to 2.18 at release time.)

So for example, if you make pushes to the RELEASE_3.17 branch, you may only want to run VMs using the release version of Bioc. And when making pushes to the 'devel' branch to may only want to run VMs using the devel version of Bioc.

Exactly. Using the right VM for the right branch is very important. It's important to realize that trying to build/check a specific branch (e.g. RELEASE_3_17) of a Bioconductor package on the wrong container (e.g. bioconductor/bioconductor_docker:devel) produces meaningless results. I was hoping that this would be somehow achievable via a single static workflow file, but, based on your explanations, it seems that one would need to edit the workflow file in the new branch after each release. This is a concern that I wish could be addressed.

In the meantime, since your PR is against the devel branch, and the workflow file in this PR uses bioconductor/bioconductor_docker:devel, I still don't understand why RELEASE_** is listed under branches:

  push:
    branches:
    - devel
    - RELEASE_**
  pull_request:
    branches:
    - devel
    - RELEASE_**

What am I missing?

Thanks,
H.

@bschilder
Copy link
Author

Sure, I'm assuming you want me to bump the version in DESCRIPTION as well?

Nope. You've already bumped it to 2.17.1, which is how it should be. (Remember that the core team will bump the package version to 2.18 at release time.)

Ok, thanks for clarifying .

So for example, if you make pushes to the RELEASE_3.17 branch, you may only want to run VMs using the release version of Bioc. And when making pushes to the 'devel' branch to may only want to run VMs using the devel version of Bioc.

Exactly. Using the right VM for the right branch is very important. It's important to realize that trying to build/check a specific branch (e.g. RELEASE_3_17) of a Bioconductor package on the wrong container (e.g. bioconductor/bioconductor_docker:devel) produces meaningless results. I was hoping that this would be somehow achievable via a single static workflow file, but, based on your explanations, it seems that one would need to edit the workflow file in the new branch after each release. This is a concern that I wish could be addressed.

In the meantime, since your PR is against the devel branch, and the workflow file in this PR uses bioconductor/bioconductor_docker:devel, I still don't understand why RELEASE_** is listed under branches:

I haven't made the changes yet. Was waiting to hear whether you'd approve of the multi-workflow approach. Will modify now.

@bschilder
Copy link
Author

I haven't made the changes yet. Was waiting to hear whether you'd approve of the multi-workflow approach. Will modify now.

Done!

@bschilder
Copy link
Author

Replying to @hpages comments here as requested:

As @vjcitn mentioned here, adoption of rworkflows for core Bioconductor packages, as well as its official endorsement for Bioconductor packages in general, would still require a multiparty discussion that goes beyond this particular PR.

Sure! Though that was proposed back in September so I think it would be good to figure out a concrete time to have that discussion. Otherwise, I think it's liable to fall off the radar indefinitely (understandably, we're all quite busy).

Perhaps the next TAB meeting would be an appropriate time? @vjcitn I realise there's a lot to discuss in these meetings, so if it's too much to pack into TAB perhaps a separate meeting could be scheduled?

I'd also argue using a tool in a couple of packages doesn't necessarily amount to "official Bioconductor-wide endorsement" of that tool, whether the tool is rworkflows or otherwise. That, to my mind, is a separate issue from this PR.

Personally, I still share the same concern as @mtmorgan about the workflow "doing the right thing", and continuing to do the right thing all year round, with respect to versioning. See Martin's comment here for the details.

Fair enough, but if you'd like some input to the direction of rworkflows, you're always more than welcome to join the monthly Cloud Methods Working Group meetings. Just sent you an invite to the dedicated Slack channel in case you'd like to join the conversation.

You can also always propose any points of improvement as an Issue or a PR on the rworkflows repo.

Conclusions

Regardless, as I've mentioned before I do think there's a strong argument to have some kind of GitHub-based CI vs. none. I think rworkflows via this PR is the easiest avenue to that, and if you change your mind later it's trivial to remove it (just delete the workflow file).

So even if you're not a fan of rworkflows, I highly recommend you implement something sooner than later (e.g. biocthis, custom workflow scripts, etc.). As we've seen here, GH-based CI can identify issues before they're distributed to Bioc.

Best,
Brian

@hpages
Copy link
Contributor

hpages commented Dec 22, 2023

Personally, I still share the same concern as @mtmorgan about the workflow "doing the right thing", and continuing to do the right thing all year round, with respect to versioning. See Martin's comment here for the details.

Fair enough, but if you'd like some input to the direction of rworkflows, you're always more than welcome to join the monthly Cloud Methods Working Group meetings. Just sent you an invite to the dedicated Slack channel in case you'd like to join the conversation.

I see that you added me to this channel. I didn't receive the invite.

You can also always propose any points of improvement as an Issue or a PR on the rworkflows repo.

All I want to know is if this is something you're planning to fix. This is a major blocking issue IMO and it's been mentioned and discussed many times.

@bschilder
Copy link
Author

All I want to know is if this is something you're planning to fix. This is a major blocking issue IMO and it's been mentioned and discussed many times.

@hpages To my knowledge, all changes you requested were completed a while ago. Is there anything else you'd like me to address?

@hpages
Copy link
Contributor

hpages commented Dec 28, 2023

@hpages To my knowledge, all changes you requested were completed a while ago. Is there anything else you'd like me to address?

Now I'm confused. How did you address the versioning concern? You still have:

          r: devel
          bioc: devel
          cont: bioconductor/bioconductor_docker:devel

in the config file, which means that R devel is going to be installed and used on top of the bioconductor_docker:devel container. As explained on several occasions in many different places, this will only be good half of the year, but not all year round. Also why would you do that if the container already has the right version of R?

What's even more confusing is that, one week ago, when I reminded you about this (see above), you seemed to aknowledge that this was still an issue (your answer was "Fair enough etc.."), and you even invited me to discuss it in the monthly Cloud Methods Working Group meetings. But now you say that all the changes I requested were completed... 🤔 ?!!

Maybe we should just close this for now and wait until things have been discussed and sorted out by the Cloud Methods Working Group. It was probably premature for you to submit a PR for something like this in the first place given that there are several alternatives around that should be put on the table by the working group. Once the working group makes a decision, they can always make an announcement on the bioc-devel mailing list and/or the community-bioc Slack.

@hpages hpages closed this Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants