ghcnd_search() stopped pulling data after June 2018 #269

Closed
kgmccann opened this issue Aug 13, 2018 · 13 comments

@kgmccann

Not sure if this is isolated to the county I specifically need, but the following code stops returning rows after 6/30/2018 (not even dates with NAs). I checked here just to be sure that the station was still updating, and it is; but even if it were offline, I would expect the function to return rows up to my date_max. Either way, this code is not behaving the way it has in the past.

library(rnoaa)
library(dplyr)
broward1 <- ghcnd_search(stationid = "US1FLBW0007", date_min = "2017-06-01", date_max = "2018-08-10", var = "prcp")
broward1$prcp %>% select(date, prcp) %>% arrange(desc(date))
System Info

 setting  value
 version  R version 3.5.0 (2018-04-23)
 system   x86_64, mingw32
 ui       RStudio (1.1.453)
 language (EN)
 collate  English_United States.1252
 tz       America/New_York
 date     2018-08-13
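
A quick way to confirm the symptom is to compare the latest date in the returned table with the requested date_max. The sketch below uses only base R and a small hand-made data frame standing in for broward1$prcp; the helper name reaches_date_max() is hypothetical and not part of rnoaa.

```r
# Hypothetical helper (not part of rnoaa): TRUE when the returned rows
# extend at least to the requested date_max.
reaches_date_max <- function(df, date_max) {
  nrow(df) > 0 && max(df$date, na.rm = TRUE) >= as.Date(date_max)
}

# Toy stand-in for broward1$prcp, ending on 2018-06-30 as in the report.
toy <- data.frame(
  date = as.Date(c("2018-06-28", "2018-06-29", "2018-06-30")),
  prcp = c(NA_integer_, NA_integer_, NA_integer_)
)

reaches_date_max(toy, "2018-08-10")  # rows stop before the requested end
reaches_date_max(toy, "2018-06-30")  # rows cover this shorter range
```

With real data, the same check could be applied to broward1$prcp right after the ghcnd_search() call.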

@sckott
Contributor

sckott commented Aug 13, 2018

thanks @kgmccann ! i'll have a look

@sckott
Contributor

sckott commented Aug 13, 2018

p.s. when you share session info, can you share the output of sessionInfo() after rnoaa is loaded, so i can see which versions of rnoaa and its dependencies are installed?

@sckott
Contributor

sckott commented Aug 13, 2018

is this what you expect:

broward1$prcp %>% select(date,prcp) %>% arrange(desc(date))
#> # A tibble: 436 x 2
#>    date        prcp
#>    <date>     <int>
#>  1 2018-08-10    53
#>  2 2018-08-09     0
#>  3 2018-08-08     0
#>  4 2018-08-07    NA
#>  5 2018-08-06    NA
#>  6 2018-08-05    NA
#>  7 2018-08-04    NA
#>  8 2018-08-03    NA
#>  9 2018-08-02    NA
#> 10 2018-08-01     0

does seem like data is returned up to your max date.

@kgmccann
Author

kgmccann commented Aug 14, 2018

Thanks, that is what i was looking for. maybe it's my version?

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.5    rnoaa_0.7.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17     xml2_1.2.0       bindr_0.1.1      magrittr_1.5     rappdirs_0.3.1   tidyselect_0.2.4 munsell_0.4.3   
 [8] colorspace_1.3-2 R6_2.2.2         rlang_0.2.0      hoardr_0.2.0     stringr_1.3.1    httr_1.3.1       plyr_1.8.4      
[15] dplyr_0.7.5      tools_3.5.0      grid_3.5.0       gtable_0.2.0     digest_0.6.15    yaml_2.1.19      lazyeval_0.2.1  
[22] assertthat_0.2.0 tibble_1.4.2     bindrcpp_0.2.2   gridExtra_2.3    tidyr_0.8.1      purrr_0.2.4      ggplot2_3.0.0   
[29] glue_1.2.0       stringi_1.1.7    compiler_3.5.0   pillar_1.2.3     scales_0.5.0     XML_3.98-1.11    lubridate_1.7.4 
[36] jsonlite_1.5     pkgconfig_2.0.1 

@sckott
Contributor

sckott commented Aug 14, 2018

Can you try installing from GitHub with remotes::install_github("ropensci/rnoaa") and then try again?

@kgmccann
Author

Still stopping at 06-30.
The only thing I am not showing is how I am setting my token. It's like this:

api_token <- 'TOKENtokenToKENtokenToken'
options("noaakey" = api_token)

And the rest is:

> library(rnoaa)
> library(dplyr)
> broward1 <- ghcnd_search(stationid = "US1FLBW0007", date_min = "2017-06-01", date_max = "2018-08-10", var = "prcp")
> broward1$prcp %>% select(date, prcp) %>% arrange(desc(date))
# A tibble: 395 x 2
   date        prcp
   <date>     <int>
 1 2018-06-30    NA
 2 2018-06-29    NA
 3 2018-06-28    NA
 4 2018-06-27    NA
 5 2018-06-26    10
 6 2018-06-25     0
 7 2018-06-24    33
 8 2018-06-23    10
 9 2018-06-22   206
10 2018-06-21     0
# ... with 385 more rows
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2   dplyr_0.7.6      rnoaa_0.7.1.9326

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     pillar_1.3.0     compiler_3.5.0   plyr_1.8.4       bindr_0.1.1      remotes_1.1.1   
 [7] tools_3.5.0      digest_0.6.15    jsonlite_1.5     lubridate_1.7.4  tibble_1.4.2     gtable_0.2.0    
[13] lattice_0.20-35  pkgconfig_2.0.1  rlang_0.2.1      Matrix_1.2-14    cli_1.0.0        rstudioapi_0.7  
[19] yaml_2.1.19      gridExtra_2.3    xml2_1.2.0       httr_1.3.1       stringr_1.3.1    rappdirs_0.3.1  
[25] grid_3.5.0       tidyselect_0.2.4 glue_1.3.0       R6_2.2.2         fansi_0.2.3      XML_3.98-1.15   
[31] hoardr_0.2.0     tidyr_0.8.1      ggplot2_3.0.0    purrr_0.2.5      magrittr_1.5     scales_1.0.0    
[37] assertthat_0.2.0 colorspace_1.3-2 utf8_1.1.4       stringi_1.1.7    lazyeval_0.2.1   munsell_0.5.0   
[43] crayon_1.3.4    
> 

@sckott
Contributor

sckott commented Aug 14, 2018

can you run ghcnd_clear_cache() and then try your code again, let me know what happens.

@kgmccann
Author

It worked! thanks so much.

I have a follow-up question:
I am planning on setting this up to run as an automated procedure to periodically update a weather table. Would you recommend using the clear cache function every time?
Thanks!

@sckott
Contributor

sckott commented Aug 14, 2018

At this point yes.

But i will see if i can cache the files including the date so that you then shouldn't have to clear the cache.
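
For an automated job, the advice above amounts to clearing the cache before each fetch. Below is a minimal sketch, assuming the rnoaa functions named in this thread (ghcnd_clear_cache() and ghcnd_search()). The wrapper update_weather() and its fetch/clear arguments are hypothetical, added only so the logic can be exercised with stand-in functions and without network access.

```r
# Hypothetical wrapper: clear the station-file cache, then fetch.
# The defaults point at the rnoaa functions discussed in this thread;
# they are only evaluated if no replacement is supplied.
update_weather <- function(stationid, date_min, date_max, var = "prcp",
                           fetch = rnoaa::ghcnd_search,
                           clear = rnoaa::ghcnd_clear_cache) {
  clear()  # drop cached files so the next call downloads a fresh copy
  fetch(stationid = stationid, date_min = date_min,
        date_max = date_max, var = var)
}

# Offline demonstration with stand-ins that record the call order.
calls <- character()
fake_clear <- function() {
  calls <<- c(calls, "clear")
}
fake_fetch <- function(stationid, date_min, date_max, var) {
  calls <<- c(calls, "fetch")
  list(prcp = data.frame(date = as.Date(date_max), prcp = 0L))
}

res <- update_weather("US1FLBW0007", "2017-06-01", "2018-08-10",
                      fetch = fake_fetch, clear = fake_clear)
```

In a scheduled script the call would simply be update_weather("US1FLBW0007", "2017-06-01", Sys.Date()), at the cost of re-downloading the station file on every run.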

sckott added this to the v0.8 milestone Aug 14, 2018
@sckott
Contributor

sckott commented Aug 15, 2018

So I think this was the problem:

You requested data for station US1FLBW0007 at some date X and it was cached on your machine. Then you made a later request (your code at the top of this issue), which used the cached file; enough time had passed that at least some of the dates you requested weren't in that file.

The issue is that the same file is downloaded for a station (e.g., US1FLBW0007) whether you restrict to dates or not. There's just no way to download the GHCND data from the FTP server by date; you get the whole thing or nothing at all 😄

So if we cache files for every combination of stationid, min date and max date, that could lead to lots of files cached on your machine that are really all the same thing, unnecessarily taking up disk space.

So here's the approach we'll take: for ghcnd(), collect the file path and last modified date and print them to the console so users know where the file is and when it was last updated. For ghcnd_search() (which wraps ghcnd(), so it will also print the file path and last modified date), also print the min and max dates in the file. I also added a refresh parameter so you can force a re-download of the file based on that info. You can also programmatically get the file path and last modified date like:

x <- ghcnd("US1FLBW0007")
attr(x, "source")
attr(x, "file_modified")

reinstall with remotes::install_github("ropensci/rnoaa")

interested to hear your thoughts.
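
One way the file_modified attribute and the refresh parameter described above might be combined is to re-download only when the cached file is no newer than the end of the requested range. This is a sketch, not rnoaa's own logic: needs_refresh() and fetch_prcp() are hypothetical names, and it assumes that attr(x, "file_modified") can be converted with as.Date() and that ghcnd_search() accepts refresh. Only the pure date check runs here; fetch_prcp() is defined but not called because it needs network access.

```r
# Hypothetical check: a file saved on or before date_max cannot hold
# complete data through date_max, so it should be downloaded again.
needs_refresh <- function(file_modified, date_max) {
  as.Date(file_modified) <= as.Date(date_max)
}

# Hypothetical wrapper (assumes rnoaa is installed; not run here).
fetch_prcp <- function(stationid, date_min, date_max) {
  x <- rnoaa::ghcnd(stationid)  # reads the cached file if one exists
  stale <- needs_refresh(attr(x, "file_modified"), date_max)
  rnoaa::ghcnd_search(stationid, date_min = date_min, date_max = date_max,
                      var = "prcp", refresh = stale)
}

# Offline check of the date logic with made-up modification times.
old_file <- as.POSIXct("2018-07-01 12:00:00", tz = "UTC")
new_file <- as.POSIXct("2018-08-13 12:00:00", tz = "UTC")
needs_refresh(old_file, "2018-08-10")  # cached before the range ends
needs_refresh(new_file, "2018-08-10")  # cached after the range ends
```

The trade-off is one extra read of the cached file per call in exchange for skipping the download whenever the cache already covers the requested dates.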

sckott added a commit that referenced this issue Aug 15, 2018
add refresh parameter to force a download of the file
add messages to print to let user know file path and last modified date for ghcnd
and message of min and max dates for ghcnd_search
update docs for both
@scoyoc

scoyoc commented Aug 21, 2018

Thanks for this fix. I have been struggling with this same issue for a couple months now. My work computer would only download data through the 1st of the year, but I could download all the data through the current date on my machine at home. Now that I've read through this, it makes sense that clearing the cache fixes the issue.

Thanks again!
MVS

@sckott
Contributor

sckott commented Aug 21, 2018

@scoyoc great, glad this helps

@sckott
Contributor

sckott commented Oct 25, 2018

AFAICT seems sorted

sckott closed this as completed Oct 25, 2018