-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rnaturalearth #22
Comments
@andysouth Thanks for this! Seeking reviewer(s) |
I could take a look |
Thanks both! On 26 November 2015 at 23:44, Robin [email protected] wrote:
|
Reviewers: @Robinlovelace, @lmullen |
Thanks all. I appreciate your efforts. On 30 November 2015 at 16:17, Scott Chamberlain [email protected]
|
First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @andysouth. It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly. Structural changesThis package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:
A compromise solution would be to put minimal data into Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the When you do this, you should feel free to give the data packages the CC0 license, but to give the @sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories. Tooling
DESCRIPTION
TestsAll the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have VignetteThe vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to Code styleThere are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book or the style chapter in his Advanced R book.
myfunc <- function(type = "a") {
if(type == "a") {
# ...
} else if (type =="b") {
} else {
stop("Error.")
}
} Should be changed to this: myfunc <- function(type = c("a", "b")) {
type <- match.arg(type)
# Logic checking type here.
}
Human language style
|
Thanks @lmullen for a very helpful review and for giving me clear On 14 December 2015 at 22:31, Lincoln Mullen [email protected]
|
ReviewA number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to simply providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly. ROpenSci, with its emphasis on data access, is an ideal community space to host this package. Useability of the functionsOverall I found the package intuitive and easy to use. The preface of The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via Minor comments/suggestions:
roads <- ne_download(scale = 10, type = "roads", category = "cultural")
railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")
Thoughts on reducing the package sizeIn-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.
ne_countries_10 <- function(){
file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
if(file.exists(file_path)){
x <- readRDS(file_path)
}else{
x <- ne_download()
saveRDS(x, file_path)
}
x
} Note that I'm not sure about specifying providing the same data with I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data. |
Thanks @Robinlovelace https://github.com/Robinlovelace for a considered On 17 December 2015 at 00:15, Robin [email protected] wrote:
|
Another option for reducing package size (though experimental and could delay CRAN submission). In another context, I'm developing a framework for using GitHub to distribute evolving data sources without putting them in a package; see early vignette, and beginnings of manuscript which include a little about where we're thinking of taking this. In short, the data don't go in the package, or even in the repository but live in "github releases" and are fetched as they're used. This decouples the development of the code of the package from the data source which avoids the issues of duplicating the data over and over on CRAN and makes for fast package downloads. |
@richfitz: datastorr looks really great. I'm looking forward to the release. |
hey @andysouth let us know if there's anything we can do to help... |
Thanks @sckott https://github.com/sckott. I've been meaning to get back On 11 February 2016 at 19:34, Scott Chamberlain [email protected]
|
thanks! sounds good |
Thanks for your patience, finally found spare time to finish addressing these useful recomendations from the reviewers. It fails travis checks currently due to the relationships between the 3 packages, but passes checks locally. responses to @lmullenThis package should be at least two packages, and probably three.done. converted to 3 packages rnaturalearth, rnaturalearthdata & rnaturalearthhires. The first 2 are aimed at CRAN and the 3rd the rOpenSci drat repo) @sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.I'm ready for that stage now if you are. The README.md should be converted to README.Rmd via devtools::use_readme_rmd().done The package should use Appveyor's CI for Windows via devtools::use_appveyor().done Specify a version number for rgdal, httr, and knitr.I don't know what version number is needed for these packages. Feel like I would just be making one up. Where might I look ? All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().done The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.done Code styleIn the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).done Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").I recognise this, but unless it's essential would prefer to keep my sometimes idiosyncratic use of spaces which I use to enhance my readability (and out of habit too). Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.)done Most important, all code which has been commented out needs to be deleted.done (mostly) The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.I have kept some commented code as these are the areas where it is most likely I will need to adapt old code in repsonse to potential future data changes, and least likely that users need to access. Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like thismyfunc <- function(type = "a") { } else { myfunc <- function(type = c("a", "b")) { Logic checking type here.} Done, except where I wanted to give the user more guidance if they submitted an incorrect option. get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?no longer exported In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)done in vignettes This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()done responses to @RobinlovelaceThe main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here.started. More could definitely be added, and I will but would be good to get out first. I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")Thanks for pointing this out. I've added a table to the start of the man page for ne_download() which outlines the main maps available at different scales To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets.I think that the improved documentation in ne_download() makes splitting it into the 3 functions unnecessary. sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.I'm open to considering making sp a dependency, how do others feel ? I wanted to keep this as light on dependencies as possible. If a user was using fortify with ggplot2 they wouldn't need sp, right ? scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.done It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.Whilst you are correct, category is needed to construct the download url. If I were to dispense with it I would need to create a list of all the maps and which category they are in. I wanted to keep the link to the Natural Earth data as simple as possible so that if it changes I don't have too much work to do. I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...ne_download(scale=50, type="antarctic_ice_shelves_polys", category="physical") Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.I added a link to seealso for ne_download() additionsI've added support for raster downloads and tiny_countries points. Thanks @richfitz for the data storage suggestion. I chose the path more well travelled for now, but am interested in that approach for the future. |
@andysouth Glad to see the next version of this package. @karthik and @sckott: I can review this package in more detail now (not anticipating any problems) but perhaps it would be best to have the high resolution data in drat so I can check that too. @andysouth: The version numbers for rgdal etc should be whatever the current version is, since that is what you are developing against. You can get the current versions from |
hmm, I don't know the process for adding repos to our drat, asked over there ropensci/drat#10 |
@lmullen so we just need |
Yes, I think that should work, right @andysouth? The other two will be on On Tue, Mar 22, 2016 at 2:38 PM, Scott Chamberlain <[email protected]
Lincoln Mullen, http://lincolnmullen.com |
Working on it now |
@andysouth looking back through comments, was I supposed to do something? this does work now |
@sckott @andysouth: Sorry I've been slow. Got hung up with the end of the semester. I'll review the revised package soon. |
@andysouth: Nice work on this package. I think it is very close to being done and accepted into rOpenSci, then on its way to CRAN. Here are a few minor fixes or questions I found. Separation of packagesThe separation of packages seems to work fine. I was a little surprised that the package installs the data packages using devtools instead of using the rOpenSci repository. What is the reason for that choice? I'd prefer that the rOpenSci repository was used if possible. The DESCRIPTION will need this line added:
I'd still like to see package version numbers for the dependencies in the DESCRIPTION. R CMD checkI get the following note, which should be easily corrected:
In addition, R CMD check fails for me when the rnaturalearthhires package is not available. This is because it can't build the vignette. The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add I get a warning message when running the code in the examples in
I get this warning message which does not seem relevant to the command run.
The reason is that the default argument for
Testing / continuous integrationI noticed that the Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package Not a requirement, but you might consider adding code coverage with covr. Stylistic considerationsI'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth." The package is inconsistent in how it handles spaces around That's about it. Nice work! |
Many thanks @lmullen for the thorough 2nd review. I'll sort these issues over the next couple of weeks. |
@sckott I've addressed all issues except one. There's a remaining problem with Travis rgdal install that I've fought with and failed, be good to get some input as it seems you may have had similar with rnoaa. Appveyor is now building successfully. responses to 2nd rOpenSci review from @lmullenSeparation of packagesUsing rOpenSci repository for loading data packages instead of devtools.Done for rnaturalearthhires, but rnaturalearthdata is not yet on rOpenSCi or CRAN, I have put in the code ready to change once rnaturalearthdata is on rOpenSci.
I'd still like to see package version numbers for the dependencies in the DESCRIPTION.Done R CMD check* checking R code for possible problems ... NOTEne_download: no visible global function definition for ‘download.file’ne_download: no visible global function definition for ‘unzip’done The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data.done I get a warning message when running the code in the examples in ne_download()In if (category == "raster") { :the condition has length > 1 and only the first element will be useddone Testing / continuous integrationTravis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.Now build success on appveyor For Travis I've tried various fixes but still can't get rgdal to install despite the binary installation seemingly working for travis on similar packages e.g. https://github.com/Hackout2/repijson/blob/master/.travis.yml I'd appreciate any input on this. Not a requirement, but you might consider adding code coverage with covr.added a github issue to look at covr for future release Stylistic considerationsI'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."done All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.done |
asking about rgdal, i've been using old |
jim pointed me to this issue on travis-ci repo travis-ci/travis-ci#5852 So Edzer uses this setup https://github.com/edzer/gstat/blob/master/.travis.yml to install gdal/geos from source worth trying |
@andysouth I looked at the Travis logs and your comments in your Travis config file. Two ideas: First, Travis does install the Ubuntu Second, I couldn't exactly tell which combination of options you used at any given time. I just tried it on my local version of Ubuntu (16.04) and all that I needed to install was |
@andysouth I think you might also want to use |
Thanks @lmullen @sckott I've tried both of your suggestions and various other permutations except for the c setup . I don't fully understand how travis works. There are 2 setups (1-using binary and 2-installing from source ) that apparently should work and seem to work for other packages (1-https://github.com/Hackout2/repijson/blob/master/.travis.yml, 2-https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3) but don't work for mine. I'm reluctant to try the 3rd potential c solution. Currently I've got it set seemingly the same as in quickmapr (2) but it is still failing with The following packages have unmet dependencies: The comment on the end of here travis-ci/travis-ci#5215 prompted me to try removing the version dependency on rgdal in DESCRIPTION, however that seemed not to solve it either. Am I missing anything obvious ? |
I've forked to my account to see if I can sort it out, will report back if successful |
@sckott thankyou thankyou |
Hi @sckott does it seem that Edzers solution didn't work either ? Because it succeeds on appveyor does this suggest it's a specific Travis problem ? Should this stop proceeding to rOpenSci & CRAN ? I'm keen to proceed if possible. |
@andysouth Yeah, let's move on, couldn't sort it out yet, but not crucial as it seems to pass elsewhere just not on travis. Approved (see new label on issue) |
|
|
you should get an invitation to a team within ropenscilabs |
rnaturalearth provides :
https://github.com/AndySouth/rnaturalearth
www.naturalearthdata.com
Anyone who wants to use Natural Earth vector data to make a map or perform other geographical analyses.
More limited subsets of Natural Earth data are available in at least the following CRAN packages rworldmap, rworldxtra, choroplethr, maps, oce, tmap. rnaturalearth is different in that it provides a comprehensive reproducible solution for access to Natural Earth vector map data either pre-downloaded or by facilitating download. By separating data access from visualisation this package provides a resource that can be used by other visualisation packages.
devtools::check()
produce any errors or warnings? If so paste them below.One NOTE
installed size is 26.0Mb
sub-directories of 1Mb or more:
data 25.8Mb
The package is currently too big because of the fine scale (10m) data. Hadley suggested :
"I think you need to keep the data under five meg. I'd suggest that you put the fine level data a separate package - you can then make that available via a drat repo (perhaps the ROpenSci one?), or just via github."
I don't know anything about drat repos. I'm happy to take advice and modify the package(s) as recommended.
The text was updated successfully, but these errors were encountered: