The data manipulations performed by ghcnd_splitvars() can be very slow. On my machine, processing a single station id takes almost 2 seconds:
library(rnoaa)
station <- ghcnd_stations()
station <- ghcnd(station$id[1])
system.time(ghcnd_splitvars(station))
#>    user  system elapsed
#>   1.941   0.004   1.948
The culprit is the long chain of dplyr manipulations and tidyr::gather() calls, which are somewhat redundant. I experimented a little with data.table::melt() and got dramatically better performance:
library(rnoaa) # Using eliocamp/rnoaa@dt-ghcnd_splitvars
station <- ghcnd_stations()
station <- ghcnd(station$id[1])
bench::mark(new = ghcnd_splitvars(station),
            old = rnoaa:::ghcnd_splitvars2(station))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new         58.13ms  60.75ms    14.7      4.64MB     1.84
#> 2 old           1.93s    1.93s     0.519   12.16MB     2.08
Created on 2020-06-04 by the reprex package (v0.3.0)
So, from about 2 s to roughly 60 ms per station!
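To illustrate the idea, here is a minimal sketch of reshaping a wide, GHCND-style table in a single data.table::melt() call. It is a toy example and not necessarily what the dt-ghcnd_splitvars branch does: the table `wide` and its values are made up, it has only 3 day columns instead of 31, and treating -9999 as the missing-value marker is an assumption about the input.

```r
library(data.table)

# Hypothetical stand-in for one station's wide table: one row per
# element and month, one VALUE/QFLAG column pair per day (3 days here).
wide <- data.table(
  id      = c("STATION1", "STATION1"),
  year    = c(2020L, 2020L),
  month   = c(1L, 1L),
  element = c("TMAX", "PRCP"),
  VALUE1  = c(250L, 0L),
  VALUE2  = c(261L, 12L),
  VALUE3  = c(-9999L, 5L),
  QFLAG1  = c("", ""),
  QFLAG2  = c("", ""),
  QFLAG3  = c("", "")
)

# Melt both column families at once: each pattern becomes one value
# column, and `day` records the position of the column within its family.
long <- melt(
  wide,
  id.vars       = c("id", "year", "month", "element"),
  measure.vars  = patterns("^VALUE", "^QFLAG"),
  variable.name = "day",
  value.name    = c("value", "qflag")
)

# `day` comes back as a factor with levels "1", "2", "3"; its integer
# codes are the day numbers.
long[, day := as.integer(day)]

# Assumed missing-value marker, recoded to NA.
long[value == -9999L, value := NA_integer_]

print(long)
```

Melting every column family in one pass avoids a separate gather-and-join step per family, which is the kind of redundancy described above.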
If you like, I can create a PR with the change (minus ghcnd_splitvars2, of course).