Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cell-abstraction mode #11

Open
mdsumner opened this issue Jul 14, 2017 · 6 comments
Open

cell-abstraction mode #11

mdsumner opened this issue Jul 14, 2017 · 6 comments
Assignees

Comments

@mdsumner
Copy link
Collaborator

mdsumner commented Jul 14, 2017

I'd love to see a "cell-abstraction" version of fasterize that returned the cell number of object/s, as a classified data frame i.e. cell, polygon-ID.

This is awesome because

  • overlapping objects don't have to be "resolved", they are recorded (i.e. one cell, many features)
  • the raster cells aren't returned in full necessarily, but only where they intersect a feature
  • tidy verbs for downstream raster extraction and per-cell / per-feature / per-raster-slice summary for many varied workflows ...

Here's an example of the current behaviour, multiple overlapping polygons are "resolved" by applying "sum" to the many-to-one relation:

library(fasterize)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
hole <- rbind(c(-150,-20), c(-100,-10), c(-110,20), c(-150,-20))
p1 <- list(p1, hole)
p2 <- list(rbind(c(-10,0), c(140,60), c(160,0), c(140,-55), c(-10,0)))
p3 <- list(rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0)))
library(sf)
pols <- st_sf(value = rep(1,3),
              geometry = st_sfc(lapply(list(p1, p2, p3), st_polygon)))
r0 <- raster(pols, res = 1)

## here is a literal rasterization
r <- fasterize(pols, r0, field = "value", fun="sum")

Here is the same (mostly) rasterization in "cell-abstraction" mode.

  1. bend fasterize to abstraction, instantiate the entire grid and decompose
pols$polygonID <- seq_len(nrow(pols))
## here is a cell-abstraction version, fast, because fasterize

cells0 <- dplyr::bind_rows(lapply(seq_len(nrow(pols)), 
       function(i) tibble::tibble(cell = which(!is.na(raster::values(fasterize(pols[i, , drop = FALSE ], r0, field = "polygonID", fun = "first")))), 
                                  polygonID = i)))
  1. bend raster's powerful but un-tidy cell* functions to use, this is done "sparsely" with cell-logic
## here is a cell-abstraction version, slow but slightly neater code with less wasted pixel creation
## and no need to filter out the missing values
cells1 <- dplyr::bind_rows(lapply(raster::cellFromPolygon(r0, as(pols, "Spatial")), 
                                 function(x) tibble::tibble(cell = as.integer(x))), .id = "polygon_rowid")

A comparison of the two results above. This kind of decomposition might help explore exactly what and where for #6

library(dplyr)
## resolution of elements affects the result, also f/rasterize are slightly different 
cells0 %>% group_by(polygonID) %>% summarize(finite_element_area = prod(raster::res(r)) * n())
# # A tibble: 3 x 2
# polygonID finite_element_area
# <int>               <dbl>
#   1         1               10135
# 2         2                9765
# 3         3                8700
cells1 %>% group_by(polygon_rowid) %>% summarize(finite_element_area = prod(raster::res(r)) * n())

# # A tibble: 3 x 2
# polygon_rowid finite_element_area
# <chr>               <dbl>
#   1             1               10115
# 2             2                9785
# 3             3                8700

st_area(pols)
#[1] 10125  9775  8700
  1. Finally the real goal, an in-progress generall "cellnumbers" function that would use fasterize's future "cell-abstraction", rather than as now converting back to Spatial.
## here is a tidier but still slow version (that would benefit massively from fasterize support!)

library(tabularaster)  ## devtools::install_github("hypertidy/tabularaster")
cn <- cellnumbers(r0, pols) 
cn %>% group_by(object_) %>% summarize(n())
# # A tibble: 3 x 2
# object_ `n()`
# <int> <int>
#   1       1 10115
# 2       2  9785
# 3       3  8700
# Warning message:
#   projections not the same 
# x: NA
# query: NA 

I could do the lapply trick as in 2. above for tabularaster, but I think this is immensely powerful and at any rate will be required for really massive grids, when we really don't need all those empty pixels. This kind of decomposition to data frames for rasters is a good way to flexibly implement sparse rasters, and rasters distributed over higher dimension, sparse or otherwise.

I'd like to explore the C++ for this, but for the time being this is a placeholder in case it's of interest and / or easy-ish.

Thanks!

@mdsumner
Copy link
Collaborator Author

Actually I've seen a way to do this by apply fasterize iteratively to each feature, which saves some of the cell/value instantiation (i.e. when features are small compared to the raster extent).

There's a need for cell-remapping between local global extents that's been on my mind for a while, and I think that would transcend this request as it stands - and might be pretty straightforward. This is perhaps a "wrong question" request, but they do help to get to the right question some times.

@mdsumner
Copy link
Collaborator Author

Analogous would be to return a set of poly, rownumber, xstarts and xends - it would be tiny, with no need to ever fill any raster data and allow very complex custom functions. I'll do this soonish, now I understand how it works!

@noamross
Copy link
Collaborator

So I understand, your intention here is another separate function, that would return a data frame? Within C++ it would use the Edge/Edgelist types, and the parts of the C++ rasterize function itself, but rather than running through the matrix it would return this data frame of poly, row, xstart, xend?

So, on one hand I want to minimize code duplication and we should see if we should modularize parts of the rasterize C++ function to minimize that so that you're not repeating much besides the inside of the loop.

However also, this seems like a specialty/developer-focused output, with the goal of producing something like the outputs in tabularaster. It goes a bit against my goals of keeping this package simple by not supporting anything but basic sf/raster/starts input and output types. Maybe the better way to go would be to export the C++ functions and classes via Rcpp::interfaces so that another package can use them to produce different outputs like this.

@mdsumner
Copy link
Collaborator Author

mdsumner commented Mar 24, 2018

Sounds good, it's pretty minor duplication and I can see my way to doing it now. I'll keep an eye on your API exposure and let you know how I go.

@mdsumner
Copy link
Collaborator Author

Just for the record here, I was pretty naive about what's involved, but I'm learning heaps ;)

@mdsumner
Copy link
Collaborator Author

I have done this here: https://github.com/hypertidy/controlledburn

will see about folding it back in, exposing the API, etc.

@mdsumner mdsumner self-assigned this Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants