Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JOSS Review: Improve documentation on geo-processing steps and its effects on the original geometries #53

Closed
6 tasks done
Jo-Schie opened this issue Jul 23, 2022 · 12 comments

Comments

@Jo-Schie
Copy link

Jo-Schie commented Jul 23, 2022

Hi @jeffreyhanson .

I think it could be helpful for users to extend a little the documentation on the internal geo-processing steps and what they actually do to the data (geometries). This should be part of the vignette, but also of the paper (which so far very much focuses very much on describing the need, but not the internals of the package).

From my first test, there are a few things that should be discussed:

  • Overview of the applied geo-processing steps in wdpa_clean. -> maybe this could be also a graphical representation.
  • which geo-processing steps from wdpa_clean affect the original geometries?
  • How do they affect the geometries and what are consequences for the use (e.g. question of scale)?
  • How do default settings affect geometries (discuss that default is country scale analysis)?
  • How can we fine-tune settings e.g. to work on different scales and what are the effects on the geometries?
  • How are overlapping polygon boundaries resolved and what are the consequences for analysis -> e.g. if there are two overlapping areas with different IUCN categories, how is the boundary resolved and what are the consequences for calculating area statistics.

If possible, I would recommend that you use interactive maps in your vignette instead of the map that you plotted (which does not allow to really see anything). Below you can find some sample code and some screenshots that might allow users to understand more intuitively what is done with the data. You could use and maybe extent this a little bit.

# ----- reproduce quickstart tutorial from https://github.com/prioritizr/wdpar -----
# load packages
library(wdpar)
library(dplyr)
library(ggmap)

# download protected area data for Malta
mlt_raw_pa_data <- wdpa_fetch("Malta",
                              wait = TRUE,
                              download_dir = rappdirs::user_data_dir("wdpar"))

# clean Malta data
mlt_pa_data <- wdpa_clean(mlt_raw_pa_data)

# reproject data to longitude/latitude for plotting
mlt_pa_data <- st_transform(mlt_pa_data, 4326)

# download basemap imagery
bg <- get_stamenmap(unname(st_bbox(mlt_pa_data)),
                    zoom = 8,
                    maptype = "watercolor",
                    force = TRUE)

# make map
ggmap(bg) +
  geom_sf(aes(fill = IUCN_CAT), data = mlt_pa_data, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")

# ----- interactive map to compare raw to processed data -----
library(mapview)

mapView(mlt_pa_data) + mapView(mlt_raw_pa_data, col.regions = "red")

# ----- compare higher precision processing -----
# problem: Data with the default setting gets quite distorted on local level
# (see mapview before)

# clean Malta data with higher geomtry precision
mlt_pa_data_highprecision <- wdpa_clean(mlt_raw_pa_data,
                                        geometry_precision = 10000)


mapView(mlt_pa_data) +
  mapView(mlt_raw_pa_data, col.regions = "red") +
  mapView(mlt_pa_data_highprecision, col.regions = "green")


# clean Malta data with higher geomtry precision and keep overlaps
mlt_pa_data_highprecision_overlap <- wdpa_clean(mlt_raw_pa_data,
                                                geometry_precision = 10000,
                                                erase_overlaps = FALSE)


mapView(mlt_pa_data) +
  mapView(mlt_raw_pa_data, col.regions = "red") +
  mapView(mlt_pa_data_highprecision, col.regions = "green")+
  mapView(mlt_pa_data_highprecision_overlap, col.regions = "purple")

Original Data
image

Default Settings

image

Higher Geometry precision
image

HIgher geometry precision and overlaps
image

Link to review

@jeffreyhanson
Copy link
Collaborator

Thank you very much for these suggestions @Jo-Schie! I'll have a think about this and begin incorperating your feedback later this week.

@jeffreyhanson
Copy link
Collaborator

Sorry, I've been swamped and am only getting to this today.

@jeffreyhanson
Copy link
Collaborator

Thanks again for providing all these suggestions @Jo-Schie! I've created a PR with all the changes to incorporate your feedback (see #57). To help make it clear exactly how I've addressed each of your points, I've responded to each of them below (separately) and copied in the updated text from the PR. Since JOSS policies favor shorter publications [1], I've tried to address several of the comments in the same sentence or paragraph. I think I've responded to all your points, but please let me know if I have missed anything?

  1. Overview of the applied geo-processing steps in wdpa_clean. -> maybe this could be also a graphical representation.

    I have deliberately written the paper to avoid describing the internals of the package in a high level of detail. This is because the WDPA and WDOECM will likely be updated at some point in the future which, in turn, would require changes to the package. If I updated the manuscript to describe the data cleaning procedures -- as they are currently implemented -- in a high level of detail and the package had to be updated in the future, this would cause the published paper to be incorrect. Indeed, I fear that a discrepancy between the published paper and the package documentation would be very confusing for users. Although I appreciate your feedback, I don't think this suggestion would improve the manuscript.

  2. which geo-processing steps from wdpa_clean affect the original geometries?

    I have updated the manuscript to make it clear which steps involve geoprocessing. It now includes the following sentences.

    "These procedures include excluding areas that have yet to be fully implemented, areas that are no longer designated, and UNESCO Biosphere Reserves [@r1]. They also include geoprocessing procedures, such as repairing invalid geometries in spatial boundaries, buffering areas represented by point localities [@r3], and removing spatial overlaps [@r2]."

  3. How do they affect the geometries and what are consequences for the use (e.g. question of scale)?

    I have updated the manuscript to describe the consequences of using different levels of precision at different scales. Specifically, it now contains the following sentences.

    "For example, the precision of spatial data processing procedures can be increased so that they are suitable for local scale analyses. This is especially important because the default precision may remove smooth edges at fine scales."

  4. How do default settings affect geometries (discuss that default is country scale analysis)?

    I have updated the manuscript to describe how the default settings affect the geometries, and note how the default settings are for national-scale analyses. It now includes the following sentences.

    "Although the default settings for the data cleaning procedures are well-suited for national scale reporting of protected area coverage, they can be customized for other applications. For example, the precision of spatial data processing procedures can be increased so that they are suitable for local scale analyses. This is especially important because the default precision may remove smooth edges at fine scales."

  5. How can we fine-tune settings e.g. to work on different scales and what are the effects on the geometries?

    I have updated the manuscript to mention that settings can be fine tuned to work on different scales (please see text mentioned in previous comment). Because JOSS policies state that the manuscript should not describe API functionality [2], I have not updated the manuscript to mention the specific function or parameter that should be used. Instead, I have provided this information in the vignette, with the following sentences.

    "Please note that, by default, spatial data processing is performed at a scale suitable for national scale analyses. If you require data suitable for fine scale analysis, please a higher level of precision when cleaning the data (e.g., using the augment: geometry_precision = 10000)."

  6. How are overlapping polygon boundaries resolved and what are the consequences for analysis -> e.g. if there are two overlapping areas with different IUCN categories, how is the boundary resolved and what are the consequences for calculating area statistics.

    I have updated the manuscript to provide this information. It now contains the following sentences.

    "Specifically, overlapping geometries are erased such that areas associated with more effective management categories or have historical precedence are retained."

  7. I would recommend that you use interactive maps in your vignette instead of the map that you plotted

    Thank you for this suggestion! Although I appreciate that this would help with data visualization, I do not believe this would improve the overall user experience. This is because I would need to add the mapview (or leaflet) R package as a dependency which, in turn, would -- because the mapview R package has many, many dependencies (see https://cran.r-project.org/package=mapview) -- result in excessive installation times and a needlessly large installation footprint for users that wish to try out the package. I strongly believe that it is my responsibility to ensure that the software I write does not needlessly pollute my users' computers.

[1] JOSS policies state that the manuscript should be a "short paper" and less than 1000 words (https://joss.readthedocs.io/en/latest/submitting.html)

[2] JOSS policies state the "the paper should not include software documentation such as API (Application Programming Interface) functionality, as this should be outlined in the software documentation" (https://joss.readthedocs.io/en/latest/review_criteria.html).

@Jo-Schie
Copy link
Author

Jo-Schie commented Aug 9, 2022

Hi @jeffreyhanson : Thanks for the edits.

Some quick remarks:
as to6)
you sate that "Specifically, overlapping geometries are erased such that areas associated with more effective management categories or have historical precedence are retained."

-> What if there is an overlap of "old area/soft protection" with "new area/strong protection"? Is there some decision tree? If so it would be useful to illustrate it or link to the relevant literature that suggests the decision hierarchy at this specific point in the manuscript.

As to 1).
I think a short description of the processing steps should be somewhere. I can follow your argument that you do not want to put it into the paper because of eventual changes to the data and the associated processing steps... but then I would put it in the package vignette and reference from the paper to it. By this you can keep it updated. Otherwise it is not clear to users what really happens to the data in the background, unless they are willing and able to understand your R-code inside the package. The transparency though is a necessary precondition to use the package for scientific analysis.

As to 7)
From my understanding you do not need mapview as a dependency in your package if you create the vignette with it. the Vignette in the end is the rendered html version of your rmarkdown document... are you sure you would need mapview as a package dependency if you only use it in the vignette?

I think an alternative could be that you include some graphics in your documentation (as in my comment above) that show how geometries are affected by different settings in the package. Nevertheless I think an interactive map is much better for the user to explore what happens then some graphics.

I am thinking of this as a user who is interested in precise area information and I myself used your package with the default settings for quite some time without really checking what happened to the areas (because R is also not very convenient for that). The package vignette might be a great place to make users aware of that since it is the most relevant document that users of your package reference to.

@jeffreyhanson
Copy link
Collaborator

jeffreyhanson commented Aug 10, 2022

@Jo-Schie - thanks for getting back to me so quickly, I really appreciate it!

I'll respond to each of your follow up comments below (seperately).

  • What if there is an overlap of "old area/soft protection" with "new area/strong protection"? Is there some decision tree? If so it would be useful to illustrate it or link to the relevant literature that suggests the decision hierarchy at this specific point in the manuscript.

    Yeah, the overlaps are handled such that areas with "weaker protection" are erased, and in case of ties -- when there are two/multiple overlapping protected areas with the same protection level -- the younger areas are erased. I remember that Runge et al. (2015) handled overlaps in a similar manner, wherein overlapping parts of protected areas with a weaker protection were erased. I've re-written the sentence to make this clearer and cited Runge et al. (2015). It now reads as follows.

    "Specifically, overlapping geometries are erased such that areas associated with more effective management categories are retained [r22] and -- in cases where geometries with the same management category overlap -- areas associated with historical precedence are retained."

  • I think a short description of the processing steps should be somewhere. I can follow your argument that you do not want to put it into the paper because of eventual changes to the data and the associated processing steps... but then I would put it in the package vignette and reference from the paper to it. By this you can keep it updated. Otherwise it is not clear to users what really happens to the data in the background, unless they are willing and able to understand your R-code inside the package. The transparency though is a necessary precondition to use the package for scientific analysis.

    I completely agree that it is important to describe the data cleaning procedures in detail. I concur, it is indeed a necessary precondition to use the package for scientific analysis. Although the procedures are not fully described in the vignette, the documentation for the wdpa_clean() function describes the procedures in a high level of detail (see https://prioritizr.github.io/wdpar/reference/wdpa_clean.html).

    Specifically, the data cleaning procedures (currently) comprise 20 individual steps. For each of these steps, the documentation describes (where relevant) the underlying function(s) (e.g., step 6 uses the sf::st_wrap_dateline() function); parameters that alter the behavior of the procedures (e.g., step 3 involves removing UNESCO Biosphere Reserves, and the documentation says that the exclude_unesco parameter can be used to avoid this); relevant columns (in attribute table) of the data (e.g., step 9 buffers points by their reported area, and the documentation says which column in the data contains the reported area values); and relevant citations (e.g., step 3 involves removing UNESCO Biosphere Reserves, and Coetzer et al. 2014 is cited to provide readers with information on why this is important).

    Since the data cleaning procedures are fully described in the function documentation, I do not think repeating this information in vignette would improve the package documentation. Indeed, I worry that providing such a large amount of dense information in the vignette would be confusing, distracting, and intimidating for new users. This is because the vignette aims to provide a brief introduction -- as indicated by the "Getting started" header in the package website -- rather than a comprehensive manual. How does that sound?

  • From my understanding you do not need mapview as a dependency in your package if you create the vignette with it. the Vignette in the end is the rendered html version of your rmarkdown document... are you sure you would need mapview as a package dependency if you only use it in the vignette?

    Yeah, I'm fairly sure. Specifically, the mapview package would need to be listed as an optional dependency in the Suggests field of the DESCRIPTION file. This is based on the official "Writing R Extensions" document -- describing best practices and criteria for CRAN publication -- which states that "The 'Suggests' field uses the same syntax as 'Depends' and lists packages that are not necessarily needed. This includes packages used only in examples, tests or vignettes (see Writing package vignettes), and packages loaded in the body of functions". Even though the dependency is optional, users would still need to download the mapview package -- and all of its many, many dependencies -- to run the vignette locally. Although it would be technically possible to add interactive maps and not list the mapview package as a dependency, this would violate CRAN policies and mean that I can no longer make the package available on CRAN---which I believe is not a desirable outcome. As such, I maintain that adding interactive maps to the package documentation would needlessly pollute users' computers and degrade their overall experience. Is that reasonable?

  • I think an alternative could be that you include some graphics in your documentation (as in my comment above) that show how geometries are affected by different settings in the package. Nevertheless I think an interactive map is much better for the user to explore what happens then some graphics. I am thinking of this as a user who is interested in precise area information and I myself used your package with the default settings for quite some time without really checking what happened to the areas (because R is also not very convenient for that). The package vignette might be a great place to make users aware of that since it is the most relevant document that users of your package reference to.

    Thanks, I really like that idea! I've updated the vignette to include a new section that examines the consequences of using the default parameters for a local scale analysis (titled "Recommended practices for local scale analyses", please see attached html file). The new section provides recommendations for local scale analyses (i.e., suggesting which parameter should be customized), shows how the default parameters can result in protected area boundaries that are not suitable for such analyses, and shows how customizing the parameters can address this issue. As you suggest, the vignette uses graphics -- specifically, static maps -- to help readers see clearly why it is important to customize the spatial precision of the data cleaning procedures for local scale analyses. What do you think?

References

Coetzer KL, Witkowski ET, & Erasmus BF (2014) Reviewing Biosphere Reserves globally: Effective conservation action or bureaucratic label? Biological Reviews, 89: 82--104.

Runge CA, Watson JEM, Butchart HM, Hanson JO, Possingham HP & Fuller RA (2015) Protected areas and global conservation of migratory birds. Science, 350: 1255--1258.

Updated vigntte html file is available here: wdpar.zip

@jeffreyhanson
Copy link
Collaborator

@Jo-Schie, I just wanted to follow up and ask if the PR address your concerns on this issue?

@Jo-Schie
Copy link
Author

Hi @jeffreyhanson . I did not really realize that you'd already worked on this. I put it on my todo list and will come back asap.

@jeffreyhanson
Copy link
Collaborator

No worries - thanks! Please let me know if anything's not clear and I'll get back to you as soon as I can.

@jeffreyhanson
Copy link
Collaborator

@Jo-Schie, I just wanted to follow up and ask if you'd a chance to go through the PR?

@jeffreyhanson
Copy link
Collaborator

@Jo-Schie, it's been over a month since I originally created the PR to address your feedback, is there anything I can do to help make it easier for you to review it?

@Jo-Schie
Copy link
Author

hi @jeffreyhanson. Documentation looks very good to me and I fully understand that you do not want to add the mapview package if it creates a dependency for the package. Also it makes sense to have the processing steps in the function documentation and I think you do explain it in a great level of detail. Sorry for the delays. Happy to close the issue now.

@jeffreyhanson
Copy link
Collaborator

Awesome - no worries - thank you very much!

jeffreyhanson added a commit that referenced this issue Sep 28, 2022
- Update paper for JOSS submission (fix #53, fix #54, fix #62).
- Update vignette with new section on local scale analyses (fix #53).
- Update `wdpa_fetch()` function to use the webdriver package for obtaining data  (replacing Rselenium package as a dependency) (fix #63).
- Update `st_repair_geometry()` to be more robust.
- Fix failing tests for `st_repair_geometry()` function.
- Update documentation for `wdpa_clean()` function.
- Fix broken URL in vignette.
- Fix CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants