Recycling rules #13

hadley · 2018-07-24T13:31:26Z

I.e. only recycle vectors of length 1 to the length of the longest.

Describe the rules for recycling vectors of length 0. @jimhester did we discuss those rules for glue?

hadley · 2018-07-24T15:42:31Z

Base R is all over the place, so I don't think we can get much inspiration from there:

# cbind() and rbind() silently drop zero-length inputs
cbind(x = 1, y = numeric(0))
#>      x
#> [1,] 1
rbind(x = 1, y = numeric(0))
#>   [,1]
#> x    1

# data.frame() errors
data.frame(x = 1, y = numeric(0))
#> Error in data.frame(x = 1, y = numeric(0)): arguments imply differing number of rows: 1, 0

# infix operators recycle to length 0
1 + numeric(0)
#> numeric(0)
TRUE & logical()
#> logical(0)

# paste recycles to longest
paste(character(), c("x", "y"), sep = ".")
#> [1] ".x" ".y"

It seems to me like it would be safest to make recycling a length-0 vector an error.

hadley · 2018-07-24T15:48:21Z

The advantage of silently recycling to length zero is code like this:

xs <- list(integer(0), 1L, 1:3)
lapply(xs, function(x) tibble::tibble(x = x, y = 1))

Which would need to become:

xs <- list(integer(0), 1L, 1:3)
lapply(xs, function(x) tibble::tibble(x = x, y = rlang::rep_along(x, 1)))

hadley · 2018-07-25T14:37:54Z

Places where this arises:

tibble::tibble()
glue::glue()
purrr::map2(), purrr::pmap()

krlmlr · 2018-08-02T04:50:10Z

Motivation for recycling to length zero: length-one columns (in the presence of columns of other length) often contain constant values which aren't important to keep if the non-length-one data is discarded.

How about providing recycling helpers?

library(tidyverse)
tibble(!!!recycle_pure(x = 1, y = character()))
#> # A tibble: 0 x 2
#> # ... with 2 variables: x <dbl>, y <chr>

tibble(!!!recycle_one_or_longest(x = 1, y = character()))
#> # A tibble: 1 x 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 <NA>

tibble(!!!recycle_safe(x = 1, y = character()))
# Error

Created on 2018-08-02 by the reprex package (v0.2.0).

lionel- · 2018-08-24T13:31:45Z

I think recycling to length zero can be seen as one aspect of a typed form of "nil punning" for vectors in R. Punning is a lisp idiom that reduces the need to deal with edge cases by returning an infectious sentinel object whenever a proper return value can't be computed. The sentinel object propagates upward throughout the computation tree, until some caller substitutes a proper value or throws an error. With nil punning the type of the sentinel is always nil but with R vectors the type may vary and coercion semantics apply.

Here is an example that takes advantage of the infectiousness of empty vectors. Say we are taking an input vector of variable length, which might be empty, and we'd like the output to be 1 element longer, with various operations on the variable part of the output vector:

function(x) {
  n <- length(x) + 1
  out <- vector("list", n)

  out[[1]] <- "first"

  # Pun: `seq2()` returns an empty vector if it can't compute a normal value
  index <- rlang::seq2(2, n)

  # Pun: `[<-` is a noop rather than an error if `index` is empty.
  # In other words the RHS (which in this case is also empty but
  # could be a scalar) is recycled to length 0.
  out[index] <- lapply(x, `/`, 100)

  # Pun: order() returns `integer(0)` if `x` is empty
  order <- order(x)

  # Pun: `order + 1` is empty if `order` is empty.
  # This is recycling to length 0.
  order <- order + 1

  # Pun: Can safely reorder vector because c(1, empty) is a noop (except for the type)
  out <- out[c(1, order)]

  out
}

hadley · 2018-08-27T12:44:59Z

This should probably be a component of a bigger discussion about vectorisation: how to do it, and how to document it.

DavisVaughan · 2019-06-11T19:48:59Z

This comment holds what I think are the correct rules:

tidyverse/tibble#435 (comment)

(The additional comments seem to imply that there is agreement here, but I wanted to write them down)

This exactly matches the broadcasting rules in rray, which were derived from xtensor/NumPy. Essentially when comparing two objects the recycling rules are:

If the lengths of both are equivalent, do nothing
If the length of one object is 1, recycle that to the length of the other object
Otherwise, error

I am a big fan of the fact that zero-length objects are not special cased here, and have the following implications:

Common length of 2 and 2 is 2
Common length of 2 and 1 is 2
Common length of 2 and 0 is an error
Common length of 0 and 1 is 0
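
As a rough illustration of the three rules and the four implications listed above, here is a minimal base R sketch. `common_size()` is a hypothetical helper written for this thread, not a function from rray or vctrs, and it only looks at two lengths at a time.

```r
# Hypothetical helper (not a package API): common size of two lengths
# under the rule that 1 is the only special case.
common_size <- function(n, m) {
  if (n == m) return(n)    # equal lengths: nothing to do
  if (n == 1L) return(m)   # length 1 recycles to the other length, 0 included
  if (m == 1L) return(n)
  stop("Can't recycle length ", n, " and length ", m, " to a common size.")
}

common_size(2, 2)       # 2
common_size(2, 1)       # 2
common_size(0, 1)       # 0
try(common_size(2, 0))  # error: neither length is 1
```

Note that zero needs no branch of its own: the `0 and 1` case falls out of the length-1 rule, and `2 and 0` fails for the same reason `2 and 3` would.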

lionel- · 2019-06-17T11:00:56Z

The current thinking about tidyverse rules is that the common size of n and 0 is 0, not an error. The xtensor broadcasting rules are interesting. However making 0 a normal case rather than a special case doesn't seem sufficient justification for changing the rules. These rules need to be justified in terms of the consequences for users and programmers. There are two goals to balance: maximising the composability of vectorised functions, and producing intuitive behaviour for users.

As argued above, n -> 0 recycling might be viewed as "empty punning", similar to nil punning in lisp. We use an empty vector to represent an absence of correct value with type information, and turn the current computation into a no-op instead of throwing an error. The goal of punning is to reduce the number of edge cases that a programmer has to deal with. It can lead to surprising behaviour, but lisp shows that if done well it is a net gain for the practice of programming. However, for vector manipulations in R, empty punning might be more surprising than helpful.

One clear example of helpful empty punning is rlang::seq2(), which returns an empty vector when no increasing sequence can be computed. This composes well with looping over vector components because it reduces an indexing operation to a no-op when the inputs are outside the boundary conditions. This behaviour is both useful and intuitive. However, while this is a case of empty punning, this is not an instance of recycling.

Can we come up with valid uses of empty punning for data manipulation? One interesting recycling case in vctrs is vec_slice<- (and [<- in base R) where the RHS is recycled with the index. 1 -> 0 definitely helps because it makes sense to be size-agnostic when using vectors of length 1:

x[my_index] <- 1

If my_index is empty, we ignore the RHS. If it isn't, we recycle it to full length. This use case is consistent with Kirill's and Hadley's examples above where the motivation of 1 -> 0 recycling is combining vectors of arbitrary lengths with optional constants. What about n -> 0 recycling? It doesn't seem to make sense here:

x[my_index] <- 1:3

Why would we expect my_index to be either size 0 or exactly size 3? This seems like a strange combination of assumptions to make. And we don't need to make such an assumption to use empty punning effectively:

# In case of empty punning, RHS is empty and there is no need for `n -> 0` recycling:
idx <- seq2(from, to)
x[idx] <- y[idx]

Similarly, I don't see what could be practical purposes of n -> 0 recycling in these cases:

1:3 + int()
#> integer(0)

purrr::map2(1:5, int(), ~ list(.x, .y))
#> list()

Overall I'd tend to agree that the broadcasting rules are the most obviously intuitive, and that they might help to catch data manipulation errors early. I'm wary that we might prohibit valid composition idioms that we have not considered here, but I couldn't come up with any clear idiom or pattern. Chances are that such idioms, if they exist, would be obscure and hard to read.

krlmlr · 2019-06-17T12:40:10Z

We currently have:

glue::glue("{1:3}, {1} and {integer()} are recyclable")
tibble::tibble(1:3, 1, integer())
#> Tibble columns must have consistent lengths, only values of length one are recycled:
#> * Length 0: Column `integer()`
#> * Length 3: Column `1:3`

Created on 2019-06-17 by the reprex package (v0.3.0).

I'd argue both behaviors are surprising in different ways.

The missing message in glue() might just go unnoticed
Users might have expectations about behavior of zero-length vectors

What's the safer behavior?

Throwing an error is more conservative, and could be relaxed to recycling towards zero later if really necessary. It's also a simpler rule compared to adding 0 as a special case.

For subset assignment, the length of the vectors must match, or the RHS must have length one. x[integer()] <- 1:3 should throw an error.
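
To make the subset-assignment rule concrete, here is a small base R sketch. `assign_strict()` is a hypothetical wrapper, not an existing function, and it assumes `i` is a vector of positive integer positions so that `length(i)` is the number of elements being assigned.

```r
# Hypothetical helper (not a package API): subset assignment where the RHS
# must match the index length or have length one.
# Assumes `i` holds positive integer positions.
assign_strict <- function(x, i, value) {
  n <- length(i)
  if (length(value) != n && length(value) != 1L) {
    stop("Can't recycle `value` of length ", length(value), " to length ", n, ".")
  }
  x[i] <- value
  x
}

x <- c(10, 20, 30)
assign_strict(x, integer(), 1)         # x unchanged: length-1 RHS recycles to 0
assign_strict(x, 2:3, 0)               # 10 0 0
try(assign_strict(x, integer(), 1:3))  # error under this rule
```

The check is a single condition, which is the sense in which this rule is simpler than treating 0 as an extra special case.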

DavisVaughan · 2019-06-17T13:51:30Z

I completely agree with @krlmlr here

(Update: I now see that @lionel-'s example is in agreement with @krlmlr's comment.)

lionel- · 2019-06-17T14:08:42Z

FTR Kirill's post and mine are in agreement.

hadley · 2019-06-18T13:20:25Z

To be clear, you are all arguing that there should be no common size for integer(2) and integer(0), right? (And that the common size of integer(1) and integer(0) is zero, because vectors of length 1 can be recycled to any length)

lionel- · 2019-06-18T13:33:07Z

Right, instead of there being two rules (0 size swallows all sizes, 1 is recycled to longest), a single rule might be better (1 is recycled to any other size), unless we find good patterns for the zero-eats-all rule.
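
For comparison, the two rule sets can be sketched side by side in base R. Both function names are made up for illustration and neither is a package API; each takes an integer vector of sizes.

```r
# Illustrative sketches only; neither function exists in a package.

# Two rules: any size 0 swallows everything, otherwise 1 recycles to the longest.
size_zero_eats_all <- function(sizes) {
  if (any(sizes == 0L)) return(0L)
  n <- unique(sizes[sizes != 1L])
  if (length(n) == 0L) return(1L)
  if (length(n) > 1L) stop("Incompatible sizes: ", paste(n, collapse = ", "))
  n
}

# One rule: 1 recycles to any other size, 0 included.
size_one_recycles <- function(sizes) {
  n <- unique(sizes[sizes != 1L])
  if (length(n) == 0L) return(1L)
  if (length(n) > 1L) stop("Incompatible sizes: ", paste(n, collapse = ", "))
  n
}

size_zero_eats_all(c(1L, 0L))      # 0
size_one_recycles(c(1L, 0L))       # 0
size_zero_eats_all(c(3L, 0L))      # 0
try(size_one_recycles(c(3L, 0L)))  # error: 3 and 0 have no common size
```

The two agree everywhere except when a size 0 meets a size greater than 1, which is exactly the case under discussion.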

I think at least the glue package is depending on full recycling to 0, cc @jimhester.

jimhester · 2019-06-19T13:30:23Z

Makes sense to me

lionel- · 2019-07-05T08:05:03Z

FYI we have started using the new rules in vctrs 0.2.0. Length-zero vectors can only be combined with length-zero and length-one vectors.

jennybc · 2019-07-05T16:22:42Z

Didn't @DavisVaughan (?) state the vctrs recycling rules really crisply in some thread, maybe in a nice-looking table? That would be nice to copy/paste here, if true.

DavisVaughan · 2019-07-06T12:21:15Z

This is the image used in ?vec_recycle now. Notable to keep in mind is that m = 0 is valid here, and the rules still apply. 1 is the only special case.

hadley mentioned this issue Jul 24, 2018

Consider recycling rules when column lengths are mix of 0 and 1 tidyverse/tibble#435

Closed

hadley mentioned this issue Oct 5, 2018

Should set_names() recycle length-1 names vectors? r-lib/rlang#325

Closed

hadley mentioned this issue Feb 19, 2019

NULL vs zero-length vectors #24

Open

hadley added the interface 🎁 External interface of functions label Feb 19, 2019

hadley mentioned this issue May 27, 2019

case_when should return an error with zero length RHS values tidyverse/dplyr#4170

Closed

DavisVaughan mentioned this issue Jun 14, 2019

Use 0-size tidyverse recycling rules consistently r-lib/vctrs#416

Merged

lionel- mentioned this issue Oct 15, 2019

pmap returns output length zero if at least one element is length 0 tidyverse/purrr#695

Closed

nathaneastwood mentioned this issue Mar 29, 2021

Empty grouping levels in the "groups" attribute tidyverse/dplyr#5830

Closed
