Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue #327, adjust data request date and fix column name #328

Merged
merged 8 commits into from
Nov 6, 2019

Conversation

potterzot
Copy link
Contributor

Description

Two issues arose in the latest CRAN submission:

  1. The tests called GEFS data from the current date based on Sys.time() with a forecast time of '0000'. As a result if the tests were run before '1800', the data file wouldn't be available yet and some tests would fail. This PR fixes that by changing the GEFS request to query a forecast made two days ago, which should always be available regardless of what time the tests are run.
  2. Previous fixes had made large improvements to the indexing of the ens and time dimensions, with the result that the request returned a data.frame with time as a column, whereas previously the specific requested variable was returned (either time1 or time2). The test still tested for variables under those names instead of under the time name. Not sure how this passed the last round of tests but all tests now pass.

Related Issue

Fix #327

@sckott
Copy link
Contributor

sckott commented Oct 30, 2019

thanks @potterzot however, a few things:

  • this does not address the files left on disk, the ones named like occookieQjhsO6 - definitely has to be addressed
  • one of the gefs tests doesn't pass for me, line 155, running gefs_dimension_values(dim = "time2", var = temp), it does not error for me. is it possible operating system dependent?

@potterzot
Copy link
Contributor Author

Thanks @sckott. The file connections were not being closed properly during error checks, leaving behind temporary files. I think all of that is fixed now, and I've added a test to check the list of files in the temporary directory before testing against the list afterward and fail if they are not equal. It works on my linux system, but I'm not sure about other systems. I think it should work since I base the temp directory on the output of tempdir().

For the other test, I am confused about the time, time1, and time2 variables, and I'm wondering if they actually change on the GEFS side of things, but I've reworked the tests to fetch the correct dimension name and test against it with a dimension that is never included. I think it should work on any system.

@sckott
Copy link
Contributor

sckott commented Nov 1, 2019

thanks - i'll have a look

@sckott sckott added this to the v0.9.4 milestone Nov 1, 2019
@sckott
Copy link
Contributor

sckott commented Nov 1, 2019

thanks much @potterzot

Files are now cleaned up on function exit, thats great, thanks. however, still seems like CRAN team may not be happy where files are being written. they say

Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR

on my macos, files are being written to /private/tmp/. my tempdir() gives /var/folders/fc/n7g_vrvn0sx_st0p8lxb3ts40000gn/T//RtmpvxX8rj, so at least on macos gefs isn't using the appropriate tempdir. I assume ncdf4 controls where the files are going? is there any way to change the base directory? if you can control the path where files are written, you could then expose that as a parameter - and then users can use whatever path they want, persistent or not, and you can in examples and tests use a temp directory that cran will be happy with

@potterzot
Copy link
Contributor Author

Hmm, I'm not sure what is going on here. ncdf4 is itself on CRAN, so it seems weird that it would do this. But it was updated on 10-23-2019, so maybe it just hasn't been caught yet? And previously the gefs functions in rnoaa didn't trigger this problem. I'll contact the package maintainer. The function to generate the connection (nc_open()) doesn't have an option to specify the temporary directory unfortunately.

@sckott
Copy link
Contributor

sckott commented Nov 2, 2019

Okay, yeah they're not consistent probably in checking.

We're probably okay as long as the files are not left anywhere on disk. if you don't get a response by monday or tuesday we should move on as we need to get it in by the end of next week

@potterzot
Copy link
Contributor Author

I received an email from the ncdf4 package author](http://cirrus.ucsd.edu/~pierce/ncdf/). That package does not specify any creation of a temporary file and does not provide a flag to pass on to the underlying netcdf library. Moreover, the netcdf header file "netcdf.h" doesn't seem to allow that as a parameter to it's C function nc_open(). I sent an email to the netcdf email list at unidata, and will see what I get back from it. In the meantime though, it seems there's no way for us to prevent the creation of the temp file in the system temporary directory.

Hopefully this is acceptable by CRAN, since it's not an R package that is doing the file creation and if we're careful to close file connections so that the temporary files get deleted afterward. If not, we may have to remove gefs.R for the time being...

@potterzot
Copy link
Contributor Author

Just to follow up for future/posterity, I received this email regarding how to specify the tempfile location:

That is correct. You need to create a .dodsrc file
and insert HTTP.COOKIEJAR= into it.
=Dennis Heimbigner
Unidata

Since we'd have to create the .dodsrc file in the users directory, I think we're faced with either creating a non-standard config file to tell netcdf where to store the temporary file, or accepting the standard location of the temporary file based on system temporary files. Neither is great but both cases create files outside of the standard R tempdir(). I don't see a way around this at this point. If you think it's worth doing, we could note in the documentation that this happens and that the user can create their own .dodsrc file to control the location of the temporary file, something like:

library(rnoaa)

# Create the config file to tell netcdf where to store temporary files
cat(paste0("HTTP.COOKIEJAR=", tempdir()), file="~/.dodsrc")

# Make data request
d <- gefs(...)

# Remove the config file if desired
file.remove("~/.dodsrc")

@sckott
Copy link
Contributor

sckott commented Nov 6, 2019

thanks @potterzot for looking into the netcdf file location details!

Let's think about in future versions if we can do something about the temp files, but for now let's merge this, and submit to cran, and if they're still not happy then we can explore letting users change the location of the ncdf temp files, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

two gefs issues: broken test and leaving files in users /tmp directory
2 participants