Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0-length recycling with shape_broadcast() #299

Closed
DavisVaughan opened this issue Apr 24, 2019 · 1 comment · Fixed by #416
Closed

0-length recycling with shape_broadcast() #299

DavisVaughan opened this issue Apr 24, 2019 · 1 comment · Fixed by #416

Comments

@DavisVaughan
Copy link
Member

This issue was originally mentioned in Slack.

Currently, you can shape_broadcast() from having 1 column down to having 0 columns. I am unsure if we want this behavior.

library(vctrs)
#> Registered S3 method overwritten by 'vctrs':
#>   method                   from 
#>   as.character.rlang_error rlang

x <- matrix(1)
x
#>      [,1]
#> [1,]    1

y <- matrix(numeric(), nrow = 1, ncol = 0)
y
#>     
#> [1,]

# we lose the columns?
# we went "down" in dimension
vctrs:::shape_broadcast(x, y)
#>     
#> [1,]

# vec slice can "recycle" down to 0 rows
vctrs::vec_slice(x, 0)
#>      [,1]
@DavisVaughan
Copy link
Member Author

DavisVaughan commented May 1, 2019

I thought about this a bit more, and compared with both numpy and xtensor. I think numpy does the correct thing. It consistently applies broadcasting rules in such a way that having 0 as a dimension is not a special case. I have fully laid out my thoughts here xtensor-stack/xtensor#1562 if you are curious

The tl;dr is that vctrs does not quite do the right thing yet, if we want to consistently follow broadcasting rules for the shape.


The crux of the problem can be represented in two questions (below, treat the shape in the vctrs sense, all of the dimensions except the first):

What should the common dimensions of the two shapes (1, 1) and (0, 1) be? vctrs treats the common dimension as (0, 1) because a 0 is present, and numpy would also say the common dimension is (0, 1), but it does so by applying broadcasting rules consistently (a 1 was present, so the other dimension is used, which happens to be 0).

What should the common dimensions of the two shapes (2, 1) and (0, 1) be? vctrs treats the common dimension as (0, 1) because a 0 is present, BUT numpy would error because this breaks broadcasting rules. The first dimensions are 2 and 0, which can't be broadcast because they aren't the same and neither are 1.


Here is the vctrs behavior:

library(vctrs)

x <- array(1, c(1, 1, 1))
y <- array(1, c(1, 0, 1))

# shapes of (1, 1) and (0, 1) -> common shape (0, 1)
# this looks right
vctrs:::shape_common(x, y)
#> [1] 0 1
vctrs:::shape_broadcast(x, y)
#> , , 1
#> 
#>     
#> [1,]

z <- array(1, c(1, 2, 1))

# shapes of (2, 1) and (0, 1) -> error
# this is not what happens
# instead we get shape of (0, 1)
vctrs:::shape_common(z, y)
#> [1] 0 1
vctrs:::shape_broadcast(z, y)
#> , , 1
#> 
#>     
#> [1,]

Created on 2019-05-01 by the reprex package (v0.2.1.9000)


If you want to fix this issue, I think it requires changes to shape_broadcast() by removing the dim_to == 0 line https://github.com/r-lib/vctrs/blob/master/R/shape.R#L43

It would also require a tweak to shape_common(). It currently uses vec_size2() which special cases the time when either size is 0. Either this special case could be removed, or, since that is also used in vec_size_common() and you might not want these rules for size, a separate version could be written for shape_common()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant