cancerprof: API Client for extracting data from State Cancer Profiles #637

Open
13 of 29 tasks
realbp opened this issue Apr 3, 2024 · 42 comments

Comments

@realbp

realbp commented Apr 3, 2024

Submitting Author Name: Brian Park
Submitting Author Github Handle: @realbp
Repository: https://github.com/getwilds/cancerprof
Version submitted: 0.1.0
Submission type: Standard
Editor: @ldecicco-USGS
Reviewers: @jromanowska, @ginberg

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: cancerprof
Title: API Client for State Cancer Profiles
Version: 0.1.0
Authors@R: 
    person("Brian", "Park", , "[email protected]", role = c("aut", "cre"),
           comment = c(ORCID = "0009-0008-8274-3057"))
Description: An interface for retrieving data from the NIH NCI State Cancer Profiles API <https://statecancerprofiles.cancer.gov/>. State Cancer Profiles provides information about data topics including demographics, screening and risk factors, cancer incidence, and mortality for US states, counties, and health service areas.
License: MIT + file LICENSE
URL: https://github.com/getwilds/cancerprof, https://getwilds.org/cancerprof/
BugReports: https://github.com/getwilds/cancerprof/issues
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Imports: 
    cdlTools,
    cli,
    dplyr,
    httr2,
    magrittr,
    rlang,
    stringr,
    utils
Suggests: 
    knitr,
    rmarkdown,
    testthat
Config/testthat/edition: 3
VignetteBuilder: knitr
Depends: 
    R (>= 2.10)

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

cancerprof allows users to retrieve data from State Cancer Profiles for programmatic analysis. cancerprof makes accessing the undocumented State Cancer Profiles API intuitive and easy.

  • Who is the target audience and what are scientific applications of this package?

The target audience for cancerprof is anyone who wants to access data from State Cancer Profiles for programmatic analysis without having to navigate its complex GUI. Specifically, cancer researchers could use cancerprof to conduct reproducible analyses of cancer data cross-referenced with the variety of topics covered by State Cancer Profiles.

Currently, there is no other software or package that extracts the publicly available data from State Cancer Profiles.

Cancerprof does not breach any data privacy laws and complies with the ethics policies of ropensci.

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#635

  • Explain reasons for any pkgcheck items which your package is unable to pass.

Cancerprof passes all pkgcheck items

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@ropensci-review-bot
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for cancerprof (v0.1.0)

git hash: 36706151

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 97.9%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 232 |
| internal | cancerprof | 66 |
| internal | stats | 28 |
| internal | graphics | 8 |
| imports | magrittr | 122 |
| imports | dplyr | 64 |
| imports | cli | 55 |
| imports | rlang | 25 |
| imports | httr2 | 8 |
| imports | utils | 2 |
| imports | cdlTools | 1 |
| imports | stringr | 1 |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (76), c (56), structure (28), body (20), class (12), url (9), options (8), paste0 (8), new.env (5), as.raw (4), date (4), which (2)

magrittr

%>% (122)

cancerprof

create_request (20), demo_crowding (1), demo_education (1), demo_food (1), demo_income (1), demo_insurance (1), demo_language (1), demo_mobility (1), demo_population (1), demo_poverty (1), demo_svi (1), demo_workforce (1), dput_resp_demo (1), dput_resp_incd (1), dput_resp_mortality (1), dput_resp_risk (1), fips_scp (1), get_area (1), handle_age (1), handle_alcohol (1), handle_cancer (1), handle_crowding (1), handle_datatype (1), handle_diet_exercise (1), handle_education (1), handle_food (1), handle_income (1), handle_insurance (1), handle_mobility (1), handle_non_english (1), handle_population (1), handle_poverty (1), handle_race (1), handle_screening (1), handle_sex (1), handle_smoking (1), handle_stage (1), handle_svi (1), handle_vaccine (1), handle_women_health (1), handle_workforce (1), handle_year (1), incidence_cancer (1), mortality_cancer (1), process_resp (1), risk_alcohol (1), risk_colorectal_screening (1)

dplyr

across (26), mutate (26), mutate_all (4), all_of (3), na_if (3), filter (2)

cli

cli_abort (55)

stats

setNames (28)

rlang

is_na (23), sym (2)

graphics

frame (8)

httr2

request (8)

utils

data (1), read.csv (1)

cdlTools

fips (1)

stringr

str_pad (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 54 files) and
  • 1 authors
  • 4 vignettes
  • no internal data file
  • 8 imported packages
  • 19 exported functions (median 26 lines of code)
  • 83 non-exported functions in R (median 22 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 54 | 96.4 | |
| files_vignettes | 4 | 95.3 | |
| files_tests | 45 | 99.0 | |
| loc_R | 2611 | 88.6 | |
| loc_vignettes | 492 | 77.7 | |
| loc_tests | 1533 | 91.7 | |
| num_vignettes | 4 | 96.6 | TRUE |
| n_fns_r | 102 | 77.0 | |
| n_fns_r_exported | 19 | 65.9 | |
| n_fns_r_not_exported | 83 | 80.0 | |
| n_fns_per_file_r | 1 | 0.2 | TRUE |
| num_params_per_fn | 4 | 54.6 | |
| loc_per_fn_r | 22 | 65.5 | |
| loc_per_fn_r_exp | 26 | 57.4 | |
| loc_per_fn_r_not_exp | 22 | 66.9 | |
| rel_whitespace_R | 9 | 75.4 | |
| rel_whitespace_vignettes | 39 | 83.5 | |
| rel_whitespace_tests | 12 | 84.2 | |
| doclines_per_fn_exp | 56 | 69.3 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 124 | 82.6 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
8546148980 pages build and deployment success 13c6cf 6 2024-04-03
8546127063 pkgdown success 367061 17 2024-04-03
8546127064 R-CMD-check success 367061 181 2024-04-03
8546127060 test-coverage success 367061 19 2024-04-03

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. cyclocomp

Test coverage with covr

Package coverage: 97.93

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
risk_smoking 106
demo_population 70
incidence_cancer 34
mortality_cancer 29
demo_insurance 27
demo_poverty 27
risk_colorectal_screening 21
risk_women_health 19
demo_education 18

Static code analyses with lintr

lintr found the following 102 potential issues:

message number of times
Avoid using sapply, consider vapply instead, that's type safe 24
Lines should not be more than 80 characters. 78
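
The `sapply` message above concerns return-type stability. A minimal sketch in base R, using made-up inputs that are not taken from cancerprof, shows the difference that `vapply` makes when the input happens to be empty:

```r
# Hypothetical inputs, chosen only to illustrate the lintr message.
inputs <- list("a", "bb", "ccc")
empty <- list()

# sapply() infers its return type from the data, so an empty input
# comes back as an empty list rather than an empty integer vector.
sapply_empty <- sapply(empty, nchar)

# vapply() is told the type and length of each result up front and
# returns that type even when there is nothing to iterate over.
vapply_nonempty <- vapply(inputs, nchar, integer(1))
vapply_empty <- vapply(empty, nchar, integer(1))

is.list(sapply_empty)
is.integer(vapply_empty)
```

Code that later does arithmetic or comparisons on the result is less likely to fail on edge cases with the `vapply` form.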


Package Versions

package version
pkgstats 0.1.3.11
pkgcheck 0.1.2.21


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@ldecicco-USGS

@ropensci-review-bot assign @ldecicco-USGS as editor

@ropensci-review-bot
Collaborator

Assigned! @ldecicco-USGS is now the editor

@ldecicco-USGS

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • Is the case for the package well made?
    • Is the reference index page clear (grouped by topic if necessary)?
    • Are vignettes readable, sufficiently detailed and not just perfunctory?
  • Fit: The package meets criteria for fit and overlap.
  • Installation instructions: Are installation instructions clear enough for human users?
  • Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • License: The package has a CRAN or OSI accepted license.
  • Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

There could be more information added to the README, although the bare minimum to meet our criteria is there.

In the examples I tried, my first thought was it might be nice to convert some of the text. For instance:

x <- demo_income(
  area = "usa",
  areatype = "state",
  income = "median family income",
  race = "all races (includes hispanic)"
)
head(x$Rank)
[1] "52 of 52" "51 of 52" "50 of 52" "49 of 52" "48 of 52" "47 of 52"

Seems like c(52, 51, 50, etc) would be a more useful output to an R user. You'd probably want/need another column or something to give the user the " of 52". Not mandatory, could be handy though (maybe a simple function to offer users outside of the function? or a simple example within the examples for how to extract the rank number).
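
One possible way to do the extraction, sketched in base R. The input values are copied from the output above; the column names `rank` and `out_of` are only suggestions and are not part of cancerprof:

```r
# Example values copied from the head(x$Rank) output shown above.
rank_text <- c("52 of 52", "51 of 52", "50 of 52")

# Split each "<rank> of <total>" string into its two parts.
parts <- strsplit(rank_text, " of ", fixed = TRUE)

# Keep the rank and the total as separate integer columns so that
# no information from the original string is lost.
ranks <- data.frame(
  rank = as.integer(vapply(parts, `[`, character(1), 1)),
  out_of = as.integer(vapply(parts, `[`, character(1), 2))
)

ranks$rank
```

This assumes every value follows the "<rank> of <total>" pattern; other values (for example missing ranks) would need separate handling.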


@ldecicco-USGS

@ropensci-review-bot seeking reviewers

@ropensci-review-bot
Collaborator

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/637_status.svg)](https://github.com/ropensci/software-review/issues/637)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@realbp
Author

realbp commented Apr 17, 2024

Thank you for the feedback! I will make those changes in the upcoming version of cancerprof. I have added the ropensci badge and created a NEWS.md file.

What are the next steps in the review process?

@ldecicco-USGS

I'm asking around to find 2 reviewers. Hopefully that shouldn't take too long!

@realbp
Author

realbp commented Apr 17, 2024

Great, thank you for a speedy response!

@ldecicco-USGS

@ropensci-review-bot assign @jromanowska as reviewer

@ropensci-review-bot
Collaborator

@jromanowska added to the reviewers list. Review due date is 2024-05-20. Thanks @jromanowska for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Collaborator

@jromanowska: If you haven't done so, please fill this form for us to update our reviewers records.

@jromanowska

Hi! Just for your information: I'll start with the review soon. There are many free days in May here, in Norway, but I hope I will not need any extension of the review deadline. 🤞

@jromanowska

@ldecicco-USGS, just to let you know: the {goodpractice} package, which is a dependency of {pkgcheck}, has been archived by CRAN (https://cran.r-project.org/web//packages/goodpractice/index.html), so I couldn't install {pkgcheck} and had to install the GitHub version of {goodpractice} by hand.

@jromanowska

Hi, I'm having problems installing the package:

pak::pak("getwilds/cancerprof")
#> Error: ! error in pak subprocess
#> Caused by error: 
#> ! Could not solve package dependencies:
#> * getwilds/cancerprof: ! pkgdepends resolution error for getwilds/cancerprof.
#> Caused by error: 
#> ! Bad GitHub credentials, make sure that your GitHub token is valid.
#> Caused by error in `stop(http_error(resp))`:
#> ! Unauthorized (HTTP 401).
devtools::install_github("getwilds/cancerprof")
#> Downloading GitHub repo getwilds/cancerprof@HEAD
#> Installing 3 packages: terra, raster, cdlTools
#> Installing packages into ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3’
#> (as ‘lib’ is unspecified)
#> trying URL 'https://cloud.r-project.org/src/contrib/terra_1.7-71.tar.gz'
#> Content type 'application/x-gzip' length 836573 bytes (816 KB)
#> ==================================================
#> downloaded 816 KB
#> 
#> trying URL 'https://cloud.r-project.org/src/contrib/raster_3.6-26.tar.gz'
#> Content type 'application/x-gzip' length 576421 bytes (562 KB)
#> ==================================================
#> downloaded 562 KB
#> 
#> trying URL 'https://cloud.r-project.org/src/contrib/cdlTools_1.13.tar.gz'
#> Content type 'application/x-gzip' length 43089 bytes (42 KB)
#> ==================================================
#> downloaded 42 KB
#> 
#> * installing *source* package ‘terra’ ...
#> ** package ‘terra’ successfully unpacked and MD5 sums checked
#> ** using staged installation
#> configure: CC: gcc
#> configure: CXX: g++ -std=gnu++17
#> checking for gdal-config... no
#> no
#> configure: error: gdal-config not found or not executable.
#> ERROR: configuration failed for package ‘terra’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/terra’
#> ERROR: dependency ‘terra’ is not available for package ‘raster’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/raster’
#> ERROR: dependencies ‘raster’, ‘terra’ are not available for package ‘cdlTools’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/cdlTools’
#> 
#> The downloaded source packages are in
#> 	‘/tmp/RtmpyrNDzF/downloaded_packages’
#> ── R CMD build ─────────────────────────────────────────────────────────
#> ✔  checking for file ‘/tmp/RtmpyrNDzF/remotes24d8468ec6bf/getwilds-cancerprof-23dbd98/DESCRIPTION’ ...
#> ─  preparing ‘cancerprof’:
#> ✔  checking DESCRIPTION meta-information
#> ─  checking for LF line-endings in source and make files and shell scripts
#> ─  checking for empty or unneeded directories
#> ─  looking to see if a ‘data/datalist’ file should be added
#> ─  building ‘cancerprof_0.1.0.tar.gz’
#>    Warning: invalid uid value replaced by that for user 'nobody'
#>    
#> Installing package into ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3’
#> (as ‘lib’ is unspecified)
#> ERROR: dependency ‘cdlTools’ is not available for package ‘cancerprof’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/cancerprof’
#> Warning messages:
#> 1: In i.p(...) : installation of package ‘terra’ had non-zero exit status
#> 2: In i.p(...) : installation of package ‘raster’ had non-zero exit status
#> 3: In i.p(...) :
#>   installation of package ‘cdlTools’ had non-zero exit status
#> 4: In i.p(...) :
#>   installation of package ‘/tmp/RtmpyrNDzF/file24d821d2b458/cancerprof_0.1.0.tar.gz’ had non-zero exit status
#> 
#> 

Created on 2024-04-30 with reprex v2.1.0

@ldecicco-USGS

Do you get the same error if you install terra and raster independently?

install.packages(c("terra", "raster"))

@jromanowska

Today I've tried on another computer (also Linux) and the pak command worked - it actually showed me which system libraries were missing 🤔
After installing those, I could re-run the pak without problems. I will write some comments about this installation process and issues in my review, so that other users may be aware.

@ropensci-review-bot
Collaborator

📆 @jromanowska you have 2 days left before the due date for your review (2024-05-20).

@ldecicco-USGS

Hi @realbp! I'm sorry I haven't found a 2nd reviewer yet. I sent a few inquiries out and didn't get responses back. I should have sent a 2nd batch out but that task got buried in my to-do list. Apologies for the delay! I've sent a few more requests out and hopefully we can get this review process (re)kicked off!

@jromanowska

Hi, I'm sorry - I need another week to complete the review. May is horrible, with lots of free days here in Norway and fantastic summer weather this year, leaving too little time for work! 😅

@ldecicco-USGS

@ropensci-review-bot assign @ginberg as reviewer

@ropensci-review-bot
Collaborator

@ginberg added to the reviewers list. Review due date is 2024-06-12. Thanks @ginberg for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Collaborator

@ginberg: If you haven't done so, please fill this form for us to update our reviewers records.

@jromanowska

Thanks for the patience! Great work with {cancerprof}! 🙌

I'm done with reviewing, please check my comments below.

Package Review

  • Briefly describe any working relationship you have (had) with the package authors.
    No working relationship.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README

    • not really - the authors could expand the description by adding 1-2 sentences;
    • there is no order in vignettes/articles, so it's difficult to navigate the
      help page
  • Installation instructions: for the development version of package and any non-standard dependencies in README

    • the required system libraries could be written down (e.g., on Linux, I had
      to install some separately before installing the package);
    • while pak is a great package, some people might not use it and try installing
      cancerprof via devtools which might fail not giving useful information about
      source of error - the authors could add why they recommend installing via pak
  • Vignette(s): demonstrating major functionality that runs successfully locally

    • I could not launch vignettes locally - it says no vignettes found when
      installed
    • I tried reading the vignettes on the webpage - I have some comments to those,
      please check below
  • Function Documentation: for all exported functions

  • Examples: (that run successfully locally) for all exported functions

    • almost all examples are not run - it would be more useful if each example
      showed actually that the function works; this would also make the documentation
      better
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
    • README could be more detailed, as mentioned above
    • webpage is nice (although the vignettes should be changed, see the comments below)
    • when checking with devtools::check() I get a warning:
      Warning: program compiled against libxml 210 using older 209 - I'm not sure
      where it comes from
    • tests were nicely written, using testthat, but maybe the test-dput*R
      files that are in the R directory should be in the tests?

Estimated hours spent reviewing: 16

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overall, the package is really useful for batch retrieval of data.
The functions are well documented and designed, and the tests are well written.
I have some comments that I hope will improve the usability of both the
package and the documentation.

Are there improvements that could be made to the code style?

  1. Avoid long lines, e.g., lines 59-63 in demo-education.R could be split
    after each logical operator. This will also improve readability of the if
    clauses.

    • run goodpractice::gp() to get more tips
  2. Running the styler package showed that only 4 files could be improved -
    great work! You don't need to, but you can try that on these files:

    • R/risk-womens-health.R
    • R/risk-smoking.R
    • R/demo-poverty.R
    • R/demo-population.R
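
As an illustration of the first point, here is a sketch of a long condition broken after each logical operator. The function name, arguments, and the placeholder string are hypothetical; this is not the actual condition from lines 59-63 of demo-education.R:

```r
# Hypothetical check, written only to show the line-breaking style.
needs_more_args <- function(education, race, sex) {
  # One operand per line, with the operator left at the end of the line,
  # keeps each line short and makes the grouping visible.
  education == "placeholder education level" &&
    (is.null(race) ||
       is.null(sex))
}

needs_more_args("placeholder education level", race = NULL, sex = "both sexes")
```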

Is there code duplication in the package that should be reduced?

  1. When dealing with specific text, like the various levels of smoking, try to
    gather these in one place.
    Currently, these various levels of smoking are
    defined in both risk-smoking.R and handle-smoking.R. This is difficult to
    maintain. One way to deal with that is to create an environment that is loaded
    when the package is loaded. Then, you can just grab the entire objects from
    this environment, without the need for the user to be bothered by these objects.
    Check here for some
    more explanation.
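
A minimal sketch of the environment approach described above. The environment name `.scp_constants`, the helper names, and the level strings are placeholders for illustration, not cancerprof's actual objects or values:

```r
# Hypothetical internal environment that holds shared constants.
.scp_constants <- new.env(parent = emptyenv())

# In a package, this assignment would typically live in one file
# (for example R/zzz.R) or run from .onLoad(), so the values are
# defined exactly once.
assign(
  "smoking_levels",
  c("placeholder smoking level A", "placeholder smoking level B"),
  envir = .scp_constants
)

# Both the user-facing function and its handler read the same object.
get_smoking_levels <- function() {
  get("smoking_levels", envir = .scp_constants)
}

is_valid_smoking <- function(smoking) {
  smoking %in% get_smoking_levels()
}

is_valid_smoking("placeholder smoking level A")
```

With this layout, adding or renaming a level means editing a single vector instead of keeping two files in sync.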

Are there user interface improvements that could be made?

  1. Use the default values of a function to ease function calling

    • if there is only one possible value for a certain argument, implement this
      as a default value (I understand that it might change in the future but
      it could be default without any problem), e.g., demo_language requires
      the user to write language = "language isolation" even if it's the only
      possible value
    • the functions that change their required arguments when one of the
      arguments is chosen are both difficult to use and difficult to maintain -
      perhaps you can just give a warning that some arguments are omitted?
    • e.g., in demo_education, instead of aborting when the user does not provide some
      arguments, one can give a warning and assume default values
    • especially in Risk smoking
      there are so many possible combinations that it might be difficult to use that
      function when programmatically fetching data - perhaps you could split this
      into smaller, more focused functions?
    • similarly, for some arguments combinations, another argument is required
      to be only of a certain value (e.g., incidence_cancer with cancer = ovary
      is possible only with sex = "females"), thus in such cases, if the user
      provides a wrong argument the function might just return the correct table and
      issue a warning
  2. The area and areatype arguments are not checked anywhere, so when the user
    supplies a wrong value, they get an uninformative error:

Error in `process_resp()`:
! Invalid input, please check documentation for valid arguments.
Run `rlang::last_trace()` to see where the error occurred.
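
One way such a check could look, as a sketch in base R. The helper name `check_areatype` is hypothetical, and the set of valid values is an assumption based on the areas named in the package description (states, counties, health service areas); the real list should come from the package documentation. Inside the package, `cli::cli_abort()` would be the natural replacement for `stop()`:

```r
# Assumed valid values; verify against the package documentation.
valid_areatypes <- c("state", "county", "hsa")

# Hypothetical helper that fails early with a message naming the argument
# and listing the accepted values.
check_areatype <- function(areatype) {
  if (!is.character(areatype) || length(areatype) != 1 ||
        !tolower(areatype) %in% valid_areatypes) {
    stop(
      "`areatype` must be one of: ",
      paste0('"', valid_areatypes, '"', collapse = ", "),
      call. = FALSE
    )
  }
  tolower(areatype)
}

check_areatype("State")
```

Running the check before the request is built means the user sees which argument is wrong without needing `rlang::last_trace()`.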

Are there performance improvements that could be made?

No issues here.
Performance here depends on the speed of the user's internet connection.

Is the documentation clear and sufficient?

Installation

  1. I got an error when trying to install the package locally, from the cloned GitHub repo:
    gdal-config not found - I can't find any mention that any software like
    gdal is required by cancerprof
  2. trying to install from the net, using pak, as the instructions say, I get the error:
pak::pak("getwilds/cancerprof")
Error:                                                           
! error in pak subprocess
Caused by error: 
! Could not solve package dependencies:
* getwilds/cancerprof: ! pkgdepends resolution error for getwilds/cancerprof.
Caused by error: 
! Bad GitHub credentials, make sure that your GitHub token is valid.
Caused by error in `stop(http_error(resp))`:
! Unauthorized (HTTP 401).
Type .Last.error to see the more details.

(I got this resolved afterwards)

Vignettes and function documentation

  1. The vignettes are basically a list of examples for each of the functions.
    These examples should be (and in most cases are) in the function documentation. Instead,
    a vignette should guide a user through a specific use case scenario.
    Could the authors create a set of scientific questions that the package helps to
    answer? Especially batch retrieval of data is important.

  2. I could not find an explanation of "FIPS"

  3. In "Demo mobility" it says: "The function defaults to "all races",
    "both sexes", "ages 1+"" - what does it mean? That the data from the source
    is available only for these categories?
    Usually, when I see "defaults", I would assume that these are the values
    assumed if there is no specific value for an argument. Here, it seems like
    there can't be any other value.

  4. There is a mistake in risk_women_health documentation - wrong names of the
    columns of returned data frame.

  5. The authors could explain the different categories of arguments more, e.g.,
    in risk_women_health, the women_health argument has three possible values
    ("pap smear in past 3 years, no hysterectomy, ages 21-65", "mammogram in past 2 years, ages 50-74", "mammogram in past 2 years, ages 40+") and the columns of the returned data frame are not informative
    enough to understand what type of data one actually gets

  6. Is there a typo in risk_alcohol documentation, under alcohol argument?
    (and does this argument value need to be so long?)

Does the documentation use the principle of multiple points of entry i.e. takes into account the fact that any piece of documentation may be the first encounter the user has with the package and/or the tool/data it wraps?

  1. The webpage is well built, so that one can get to the home page easily
  2. However, there is no common help-page for the package (i.e.,
    help(package = "cancerprof") gives
    URL '/help/library/cancerprof/html/00Index.html' not found)

Were functions and arguments named to work together to form a common, logical programming API that is easy to read, and autocomplete?

Yes - good choice of naming. The demo_ functions sounded at first
like "demonstration" to me, but it's logical to abbreviate "demography" as "demo". :)

@realbp
Author

realbp commented May 28, 2024

@jromanowska Thank you so much for your hard work at reviewing cancerprof! I will get to work on implementing improvements based on your feedback.

@ropensci-review-bot
Collaborator

📆 @ginberg you have 2 days left before the due date for your review (2024-06-12).

@ldecicco-USGS

@ropensci-review-bot submit review #637 (comment) time 16

@ropensci-review-bot
Collaborator

Logged review for jromanowska (hours: 16)

@ldecicco-USGS

Thank you @jromanowska for the thorough review!

@ldecicco-USGS

Just a heads up that I'll be on leave from June 17-July 8 and may or may not be able to get to this issue during that time.

@jromanowska

I will be on holiday for all of July. I can probably read and comment here, but I won't be able to test anything.

@ginberg

ginberg commented Jun 12, 2024

Well done with creating this package!
See below for my comments.

Package Review

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
    The README is very brief. Please add some explanation about what type of data can be obtained. Also can you mention the target audience?
  • Installation instructions: for the development version of package and any non-standard dependencies in README
    I had no problems installing the package
  • Vignette(s): demonstrating major functionality that runs successfully locally
    It's great that there is a vignette about each data topic. What I miss is an overview of State Cancer Profiles and how these vignettes are connected. I think an overview vignette that contains a description of State Cancer Profiles with links to the other vignettes would be useful. In this vignette you could also mention which type of user would benefit from each vignette.
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
    Each exported function has examples, but most of them are wrapped in \dontrun{}, so they don't actually run. It would be useful if at least some of them could be run locally.
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 6

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • Does the code comply with general principles in the Mozilla reviewing guide?
    If a function parameter can only take one value (e.g. 'crowding' in demo_crowding), it does not need to be a parameter at all. Alternatively, set that value as the default. Try to make the function as simple as possible for the user.
  • Is there code duplication in the package that should be reduced?
    There is some code duplication, especially in message texts, for example the text in handle_alcohol. It would be better to store such text in a variable so that it is easier to update.
  • Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient?
    The README and vignettes could be improved (see above).
  • Other
    The package relies heavily on a working internet connection; if it's not available, the tests fail:
ℹ Testing cancerprof
✔ | F W  S  OK | Context
✖ | 3        1 | demo-crowding                                      
────────────────────────────────────────────────────────────────────
Error (test-demo-crowding.R:8:3): Output data type is correct
<httr2_failed/rlang_error/error/condition>
Error: Could not resolve host: statecancerprofiles.cancer.gov
Backtrace:
    ▆
 1. ├─cancerprof::demo_crowding(...) at test-demo-crowding.R:8:2
 2. │ └─... %>% req_perform() at cancerprof/R/demo-crowding.R:57:2
 3. └─httr2::req_perform(.)
 4.   └─base::tryCatch(...)
 5.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 6.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7.         └─value[[3L]](cond)

It would be good to fail gracefully with an informative message if a resource is not available (and not throw an error). See here for some ideas. You will probably need to fix this anyway if you want the package to be on CRAN.
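
As a minimal sketch of what failing gracefully could look like: the helper below is hypothetical (`perform_gracefully` is not part of cancerprof, and the message wording is an assumption). It only relies on the `httr2_failed` condition class shown in the backtrace above, and turns a connection failure into a message plus a `NULL` return value instead of an error.

```r
# Hypothetical helper, not part of cancerprof: perform a request and
# return NULL with an informative message when the host is unreachable.
# `perform` defaults to httr2::req_perform and can be swapped out in tests.
perform_gracefully <- function(req, perform = httr2::req_perform) {
  tryCatch(
    perform(req),
    # Connection problems surface as conditions of class "httr2_failed",
    # as seen in the test output above.
    httr2_failed = function(cnd) {
      message(
        "State Cancer Profiles could not be reached: ",
        conditionMessage(cnd)
      )
      NULL
    }
  )
}

# Simulate an unreachable host without touching the network.
offline <- function(req) {
  stop(structure(
    class = c("httr2_failed", "error", "condition"),
    list(message = "Could not resolve host", call = NULL)
  ))
}

result <- perform_gracefully(NULL, perform = offline)
is.null(result)
```

Callers would then need to check for `NULL` before parsing the response. On the testing side, one option worth considering is skipping network-dependent tests when offline, for example with testthat's `skip_if_offline()`.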

@ldecicco-USGS

@ropensci-review-bot submit review #637 (comment) time 6

@ropensci-review-bot
Collaborator

Logged review for ginberg (hours: 6)

@ldecicco-USGS

Awesome, thanks @ginberg !

@realbp go ahead and start responding to the reviews. I'll try to jump in when I can, but will be a little slow for the next 3 weeks.

@ropensci-review-bot
Collaborator

@realbp: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html

@ldecicco-USGS

Hi @realbp , checking in to see if you've got any updates to respond to the reviewers.

@seankross

Hi @ldecicco-USGS, I supervised @realbp's work on this package as part of his internship at the Fred Hutch Data Science Lab, which has sadly ended. We loved working with Brian! Would it be possible for me to take the reins of this review? I plan to be the maintainer of this package once we eventually submit it to CRAN.

@ldecicco-USGS

No problem!

@ldecicco-USGS

@seankross checking in to see how things are going. From our end, we're waiting on a response to the reviews.
