ghcnd_splitvars can be faster #352

Closed
eliocamp opened this issue Jun 4, 2020 · 2 comments · Fixed by #355

eliocamp commented Jun 4, 2020

The data manipulations performed by ghcnd_splitvars() can be very slow. On my machine, processing a single station id takes almost 2 seconds:

```r
library(rnoaa)

station <- ghcnd_stations()
station <- ghcnd(station$id[1])

system.time(ghcnd_splitvars(station))
#>    user  system elapsed 
#>   1.941   0.004   1.948
```

The culprits are the dplyr manipulations and tidyr::gather() calls, some of which are redundant. I experimented a little with data.table::melt() and got dramatically better performance:

```r
library(rnoaa) # Using eliocamp/rnoaa@dt-ghcnd_splitvars

station <- ghcnd_stations()
station <- ghcnd(station$id[1])

bench::mark(new = ghcnd_splitvars(station),
            old = rnoaa:::ghcnd_splitvars2(station))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new         58.13ms  60.75ms    14.7      4.64MB     1.84
#> 2 old           1.93s    1.93s     0.519   12.16MB     2.08
```

Created on 2020-06-04 by the reprex package (v0.3.0)

So, from about 2s to about 60ms!
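
To illustrate the general idea, here is a toy sketch of reshaping with a single data.table::melt() call. It is not the code in the branch. The table below is made up: the VALUE1, VALUE2, ... column names only loosely mimic the wide layout of ghcnd() output, and it uses three days per month instead of 31.

```r
library(data.table)

# Hypothetical wide table: one row per station/month/element,
# one VALUE* column per day of the month (only 3 days here).
wide <- data.table(
  id      = "STATION1",
  year    = 2020L,
  month   = c(1L, 1L),
  element = c("TMAX", "TMIN"),
  VALUE1  = c(250L, 100L),
  VALUE2  = c(260L, 110L),
  VALUE3  = c(NA, 120L)
)

# A single melt() call moves every VALUE* column into long form
# and drops the missing observations.
long <- melt(
  wide,
  id.vars       = c("id", "year", "month", "element"),
  measure.vars  = grep("^VALUE", names(wide), value = TRUE),
  variable.name = "day",
  value.name    = "value",
  na.rm         = TRUE
)

# Recover the day of month from the old column name and build a date.
long[, day := as.integer(sub("VALUE", "", day))]
long[, date := as.Date(sprintf("%d-%02d-%02d", year, month, day))]

# One table per element, similar in spirit to what ghcnd_splitvars() returns.
by_element <- split(long, by = "element")
```

The point is that the reshape happens in one pass over the VALUE* columns rather than through several chained gather() steps. The real function also has to carry the flag columns along, which this sketch leaves out.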

If you like, I can create a PR with the change (minus ghcnd_splitvars2, of course).

eliocamp changed the title from "ghcnd_splitvars is slow" to "ghcnd_splitvars can be faster" on Jun 4, 2020
sckott added this to the v0.9.7 milestone on Jun 8, 2020
sckott commented Jun 8, 2020

Thanks @eliocamp - a speedup would be nice. A PR sounds good.

eliocamp commented Jun 8, 2020

I opened the PR. I checked as best I could that the output is the same, and the only tests that fail are unrelated to this function.
