Right now these eventually just call `summarise(n = n())` or `mutate(n = n())` at some point, which can be very slow with many groups. We already have `vec_count()`, which should be much faster than `count()` with many groups. For `add_count()`, we could either add some kind of vctrs primitive that works like a windowed count, or just build on top of `vec_count()`'s result plus an additional call to `vec_match()`.
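As a rough illustration of the second option, here is a minimal sketch on a toy data frame. The names `df`, `keys`, `counts`, `out`, and `loc` are made up for the example, and this is not how dplyr currently implements either function; it only shows how the `vec_count()` result could be mapped back onto the rows with `vec_match()`.

```r
library(vctrs)

# Toy data, purely illustrative
df <- data.frame(
  g = c("a", "b", "a", "c", "a"),
  h = c(1, 2, 1, 3, 1)
)
keys <- df[c("g", "h")]

# One row per unique key combination, ordered by first appearance
counts <- vec_count(keys, sort = "location")

# count()-like result: the unique keys plus their counts
out <- counts$key
out$n <- counts$count

# add_count()-like result: look up each row's key among the unique keys,
# then pull the matching count back onto the original rows
loc <- vec_match(keys, counts$key)
df$n <- counts$count[loc]
```

This still hashes the keys twice (once in `vec_count()`, once in `vec_match()`), which is where a dedicated windowed-count primitive might do better.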
We'd have to think through how weighted counts would work. Maybe `vec_count()` needs support for a weight argument (a double vector).
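To make the weighted case concrete, here is a sketch of what such an argument would presumably have to compute, written with existing vctrs functions. `vec_count()` has no weight argument today; the toy data and the names `id`, `n`, and `weighted` are assumptions for illustration only.

```r
library(vctrs)

# Toy data, purely illustrative
df <- data.frame(
  g  = c("a", "b", "a", "c", "a"),
  wt = c(0.5, 2, 1.5, 1, 1)
)
keys <- df["g"]

# vec_group_id() numbers the groups 1..n in order of first appearance
id <- vec_group_id(keys)

# Sum the weights within each group; a hypothetical weight argument to
# vec_count() would return these sums instead of integer counts
n <- vapply(split(df$wt, id), sum, numeric(1), USE.NAMES = FALSE)

# vec_unique() also keeps first-appearance order, so rows line up with `n`
weighted <- vec_unique(keys)
weighted$n <- n
```

Note that the result is a double vector rather than an integer one, so the return type of the count column would depend on whether weights are supplied.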
The motivation is something like this, and `flights` isn't even that big. There are roughly 55k groups here.
```r
library(dplyr)
library(nycflights13)

bench::mark(
  count(flights, dep_time, dep_delay),
  vctrs::vec_count(flights[c("dep_time", "dep_delay")]),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                                                 min  median itr/s…¹
#>   <bch:expr>                                            <bch:tm> <bch:t>   <dbl>
#> 1 count(flights, dep_time, dep_delay)                    419.6ms 441.4ms    2.27
#> 2 vctrs::vec_count(flights[c("dep_time", "dep_delay")])   17.3ms  21.5ms   42.7
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>, and abbreviated
#> #   variable name ¹`itr/sec`
```
We also need to handle the fact that `...` and `wt` are data-masking, probably with `add_computed_columns()`, like `distinct()` does.