Recycling rules #13
Base R is all over the place, so I don't think we can get much inspiration from there:
# cbind() and rbind() silently drop zero-length arguments
cbind(x = 1, y = numeric(0))
#> x
#> [1,] 1
rbind(x = 1, y = numeric(0))
#> [,1]
#> x 1
# data.frame() errors
data.frame(x = 1, y = numeric(0))
#> Error in data.frame(x = 1, y = numeric(0)): arguments imply differing number of rows: 1, 0
# infix operators recycle to length 0
1 + numeric(0)
#> numeric(0)
TRUE & logical()
#> logical(0)
# paste recycles to longest
paste(character(), c("x", "y"), sep = ".")
#> [1] ".x" ".y"
It seems to me like it would be safest to make recycling a length-0 vector an error. |
The advantage of silently recycling to length zero is code like this:
xs <- list(integer(0), 1L, 1:3)
lapply(xs, function(x) tibble::tibble(x = x, y = 1))
Which would need to become:
xs <- list(integer(0), 1L, 1:3)
# rlang::rep_along(along, x) repeats `x` to the length of `along`
lapply(xs, function(x) tibble::tibble(x = x, y = rlang::rep_along(x, 1))) |
Places where this arises:
|
Motivation for recycling to length zero: length-one columns (in the presence of columns of other lengths) often contain constant values which aren't important to keep if the non-length-one data is discarded. How about providing recycling helpers?
library(tidyverse)
tibble(!!!recycle_pure(x = 1, y = character()))
#> # A tibble: 0 x 2
#> # ... with 2 variables: x <dbl>, y <chr>
tibble(!!!recycle_one_or_longest(x = 1, y = character()))
#> # A tibble: 1 x 2
#> x y
#> <dbl> <chr>
#> 1 1 <NA>
tibble(!!!recycle_safe(x = 1, y = character()))
# Error
|
I think recycling to length zero can be seen as one aspect of a typed form of "nil punning" for vectors in R. Punning is a Lisp idiom that reduces the need to deal with edge cases by returning an infectious sentinel object whenever a proper return value can't be computed. The sentinel object propagates upward throughout the computation tree, until some caller substitutes a proper value or throws an error. With nil punning the type of the sentinel is always nil, but with R vectors the type may vary and coercion semantics apply.
Here is an example that takes advantage of the infectiousness of empty vectors. Say we are taking an input vector of variable length, which might be empty, and we'd like the output to be 1 element longer, with various operations on the variable part of the output vector:
function(x) {
n <- length(x) + 1
out <- vector("list", n)
out[[1]] <- "first"
# Pun: `seq2()` returns an empty vector if it can't compute a normal value
index <- rlang::seq2(2, n)
# Pun: `[<-` is a noop rather than an error if `index` is empty.
# In other words the RHS (which in this case is also empty but
# could be a scalar) is recycled to length 0.
out[index] <- lapply(x, `/`, 100)
# Pun: `order()` returns an empty integer vector if `x` is empty
order <- order(x)
# Pun: `order + 1` is empty if `order` is empty.
# This is recycling to length 0.
order <- order + 1
# Pun: Can safely reorder vector because c(1, empty) is a noop (except for the type)
out <- out[c(1, order)]
out
} |
This should probably be a component of a bigger discussion about vectorisation: how to do it, and how to document it. |
This comment holds what I think are the correct rules: tidyverse/tibble#435 (comment) (The additional comments seem to imply that there is agreement here, but I wanted to write them down) This exactly matches the broadcasting rules in rray, which were derived from xtensor/NumPy. Essentially when comparing two objects the recycling rules are:
I am a big fan of the fact that zero-length objects are not special cased here, and have the following implications:
|
The current thinking about tidyverse rules is that the common size of As argued above, One clear example of helpful empty punning is Can we come up with valid uses of empty punning for data manipulation? One interesting recycling case in vctrs is
x[my_index] <- 1
If
x[my_index] <- 1:3
Why would we expect
# In case of empty punning, RHS is empty and there is no need for `n -> 0` recycling:
idx <- seq2(start, from)
x[idx] <- y[idx]
Similarly, I don't see what could be practical purposes of
1:3 + int()
#> integer(0)
purrr::map2(1:5, int(), ~ list(x, y))
#> list()
Overall I'd tend to agree that the broadcasting rules are the most obviously intuitive, and that they might help to catch data manipulation errors early. I'm wary that we might prohibit valid composition idioms that we have not considered here, but I couldn't come up with any clear idiom or pattern. Chances are that such idioms, if they exist, would be obscure and hard to read. |
We currently have:
glue::glue("{1:3}, {1} and {integer()} are recyclable")
tibble::tibble(1:3, 1, integer())
#> Tibble columns must have consistent lengths, only values of length one are recycled:
#> * Length 0: Column `integer()`
#> * Length 3: Column `1:3`
Created on 2019-06-17 by the reprex package (v0.3.0)
I'd argue both behaviors are surprising in different ways.
What's the safer behavior? Throwing an error is more conservative, and could be relaxed to recycling towards zero later if really necessary. It's also a simpler rule compared to adding 0 as a special case. For subset assignment, the length of the vectors must match, or the RHS must have length one. |
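To make the subset-assignment rule above concrete, here is a minimal base R sketch. `assign_strict()` is a hypothetical helper written for this thread, not a function from vctrs or tibble; it only assumes that "length of the vectors" means the number of elements selected by the index.

```r
# Hypothetical helper (not part of any package): subset assignment where the
# RHS must have length one or exactly match the number of selected elements.
assign_strict <- function(x, i, value) {
  n <- length(x[i])  # number of elements selected by the index
  if (length(value) == 1L) {
    value <- rep(value, n)  # the only recycling allowed: 1 -> n (n may be 0)
  } else if (length(value) != n) {
    stop(
      sprintf("Can't recycle `value` (length %d) to length %d.", length(value), n),
      call. = FALSE
    )
  }
  x[i] <- value
  x
}

x <- c(10, 20, 30, 40)
assign_strict(x, 2:3, 0)          # length-one RHS is recycled
assign_strict(x, 2:3, c(1, 2))    # lengths match
assign_strict(x, integer(), 0)    # empty index: `x` is returned unchanged
try(assign_strict(x, 2:3, 1:3))   # error: 3 values for 2 slots
try(assign_strict(x, integer(), 1:3))  # error: no `n -> 0` recycling
```

For comparison, base R's `x[2:3] <- 1:3` only warns and `x[integer()] <- 1:3` is a silent no-op, so the last two calls are where the stricter rule differs.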
FTR Kirill's post and mine are in agreement. |
To be clear, you are all arguing that there should be no common size for |
Right, instead of there being two rules (0 size swallows all sizes, 1 is recycled to longest), a single rule might be better (1 is recycled to any other size), unless we find good patterns for the zero-eats-all rule. I think at least the glue package is depending on full recycling to 0, cc @jimhester. |
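The difference between the two candidate rule sets can be sketched in a few lines of base R. Both function names below are hypothetical and only illustrate the size computation; they assume at least one input and are not the vctrs implementation.

```r
# Single rule: only length-one inputs are recycled, to any other size.
# Hypothetical sketch; assumes at least one input.
common_size_strict <- function(...) {
  sizes <- lengths(list(...))
  other <- unique(sizes[sizes != 1L])  # sizes that cannot be recycled
  if (length(other) > 1L) {
    stop("Incompatible sizes: ", paste(sort(other), collapse = ", "), call. = FALSE)
  }
  if (length(other) == 1L) other else 1L
}

# Two rules: the rule above, plus "size 0 swallows all sizes".
common_size_zero_eats_all <- function(...) {
  sizes <- lengths(list(...))
  if (any(sizes == 0L)) {
    return(0L)
  }
  common_size_strict(...)
}

common_size_strict(1:3, 1)                 # 3
common_size_strict(1, integer())           # 0: length one recycles to zero
try(common_size_strict(1:3, integer()))    # error: 0 and 3 are incompatible
common_size_zero_eats_all(1:3, integer())  # 0: the zero-length input wins
```

The two versions only disagree when a zero-length input meets an input of length two or more, which is exactly the case under discussion.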
Makes sense to me |
FYI we have started using the new rules in vctrs 0.2.0. Length-zero vectors can only be combined with length-zero and length-one vectors. |
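As an illustration of what that rule means for the values themselves, here is a base R sketch that recycles a set of inputs to their common size. `recycle_strict()` is a hypothetical name chosen for this example; it mimics the rule as described in this comment and is not the vctrs code.

```r
# Hypothetical sketch of the rule described above: length-one inputs are
# recycled to the size of the other inputs (including size zero); any other
# mismatch is an error.
recycle_strict <- function(...) {
  args <- list(...)
  sizes <- lengths(args)
  target <- unique(sizes[sizes != 1L])
  if (length(target) > 1L) {
    stop("Can't recycle inputs of sizes ", paste(sizes, collapse = ", "), call. = FALSE)
  }
  if (length(target) == 0L) {
    return(args)  # every input has length one: nothing to do
  }
  lapply(args, function(x) if (length(x) == 1L) rep(x, target) else x)
}

str(recycle_strict(x = 1:3, y = "a"))            # `y` becomes three copies of "a"
str(recycle_strict(x = integer(), y = "a"))      # `y` becomes character(0)
try(recycle_strict(x = 1:3, y = integer()))      # error: sizes 3 and 0
```

Under this sketch a length-zero vector combines with length-zero and length-one inputs only, which matches the behaviour described for vctrs 0.2.0 in this comment.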
Didn't @DavisVaughan state the vctrs recycling rules really crisply in some thread, maybe in a nice-looking table? If so, that would be nice to copy/paste here. |
i.e. only recycle vectors of length 1 to length of longest.
Describe the rules for recycling vectors of length 0. @jimhester did we discuss those rules for glue?