
add to man page for coops_search #213

Closed

tphilippi opened this issue Apr 20, 2017 · 6 comments

@tphilippi
Contributor

Could you add to the coops.html page a note that for most if not all data products, the NOAA coops API only vends up to 31 days of data at a time?

test1 <- coops_search(begin_date=20100101, 
                      end_date=20100131, 
                      station_name=9410230,
                      product="water_temperature", 
                      datum = NULL, 
                      units = "metric", 
                      time_zone = "gmt",
                      application = "NPS-rnoaa")

works, but:

test2 <- coops_search(begin_date=20100101, 
                      end_date=20101231, 
                      station_name=9410230,
                      product="water_temperature", 
                      datum = NULL, 
                      units = "metric", 
                      time_zone = "gmt",
                      application = "NPS-rnoaa")  

fails with a misleading (but technically correct) error message:

Error: No data was found

Also, you might add a trap for this circa line 138 (not sure if this is in your exact style):

# Reject requests longer than the 31-day limit before hitting the API.
bd <- as.Date(as.character(begin_date), format = "%Y%m%d")
ed <- as.Date(as.character(end_date), format = "%Y%m%d")
if ((ed - bd) > 31) {
  stop(paste("Duration must be 31 days or less\nbegin_date=", begin_date,
             " end_date=", end_date, " duration=", (ed - bd), "\n", sep = ""),
       call. = FALSE)
}

At least for now I suggest adding a note to the documentation, and possibly the error trapping. Adding the ability to handle end_date - start_date > 31 days is not simple: generating a list of 31-day requests from start_date and end_date is easy, but the returned data for each chunk can be clean, completely missing, or show at least one other pathology. If I get to the point where I'm confident I understand all of the possibilities, I may submit a pull request with proposed code.
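
For what it's worth, here is a minimal sketch of the easy part, splitting a date range into <= 31-day requests (chunk_dates is a made-up name; handling the per-chunk pathologies is the hard part left out here):

# Sketch only: split a yyyymmdd begin/end pair into <= 31-day windows.
# Consecutive windows abut: each starts the day after the previous ends.
chunk_dates <- function(begin_date, end_date, chunk_size = 31) {
  bd <- as.Date(as.character(begin_date), format = "%Y%m%d")
  ed <- as.Date(as.character(end_date), format = "%Y%m%d")
  starts <- seq(bd, ed, by = chunk_size)
  ends   <- pmin(starts + (chunk_size - 1), ed)
  data.frame(begin_date = format(starts, "%Y%m%d"),
             end_date   = format(ends, "%Y%m%d"),
             stringsAsFactors = FALSE)
}

chunk_dates(20100101, 20101231)  # 12 rows of <= 31-day windows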

@sckott
Contributor

sckott commented Apr 20, 2017

what is the coops.html page?

yes, can definitely document this. do you know where the NOAA docs state this?

that block of code looks pretty good. pull request?

@tphilippi
Contributor Author

coops.html is the man page generated from rnoaa/man/coops.Rd.
I used that name because the man page for coops_search is not coops_search.html but rather coops.html.

The NOAA API docs mention this in a table near the bottom (https://tidesandcurrents.noaa.gov/api/#maximum)
(that looks to be the same as the page you link: http://co-ops.nos.noaa.gov/api/#maximum).

If you submit a URL with a long duration, datagetter returns an error message:

Range Limit Exceeded: The size limit for data retrieval for this product is 31 days

For example:
https://tidesandcurrents.noaa.gov/api/datagetter?product=water_temperature&application=NPS&begin_date=20170101&end_date=20170330&station=9410230&time_zone=GMT&units=metric&interval=6&format=csv
This has been true for predicted and observed tidal heights for the last two versions of their API; now I get it for temperature data, too.
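
You can see it from R, too (a sketch; with format=csv the error text comes back as the response body):

# Sketch: request > 31 days of water_temperature (the URL above);
# the body that comes back is the error text rather than CSV data.
u <- paste0("https://tidesandcurrents.noaa.gov/api/datagetter?",
            "product=water_temperature&application=NPS&",
            "begin_date=20170101&end_date=20170330&station=9410230&",
            "time_zone=GMT&units=metric&interval=6&format=csv")
readLines(u, warn = FALSE)
# expect something like: "Range Limit Exceeded: The size limit for
# data retrieval for this product is 31 days"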

Note that the limit differs by product, but interval=h in the API has the potential to change it for many of the products. If your ... argument only passes parameters to curl (and not interval= to the API), the duration limit is 1 year for hourly_height and high_low, 10 years for daily_mean and monthly_mean, and I don't think it applies to datums.
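
To illustrate (a sketch only; coops_max_days is a made-up name, and the values restate the limits just described):

# Hypothetical lookup of product-specific duration limits, in days,
# per the table at https://tidesandcurrents.noaa.gov/api/#maximum
coops_max_days <- c(
  water_temperature = 31,   # like most 6-minute products
  hourly_height     = 365,  # 1 year
  high_low          = 365,  # 1 year
  daily_mean        = 3650, # 10 years
  monthly_mean      = 3650  # 10 years
)
unname(coops_max_days["hourly_height"])  # 365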

If I have time this weekend I'll fork the repository and extend my trapping code fragment to test durations against the product-specific maxima. If I can't get to it this weekend, I should be able to the following weekend. I won't feel insulted if you do the patch instead of waiting for me, but I also don't think it is an emergency, especially if you can add a brief note to coops.Rd in the meantime.

In the broader picture, I haven't decided whether it would be better to put my code for splitting long requests into 31-day chunks inside your functions, or to leave it as an external wrapper function that chunks the dates, calls your functions, then assembles the results. In terms of maintaining your codebase in the face of changing APIs, I suspect an external wrapper might be better. What are your thoughts? Would you be interested in a function by_chunk(begin_date, end_date, chunk_size, rnoaa_fun, ...), sketched below, contributed to your package, or should I just put it in my own repository of utility functions?
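
Roughly (a sketch only, building on the chunk_dates() fragment above; real code would need to handle the per-chunk pathologies, so this just drops chunks that error out):

# Sketch of the proposed wrapper: chunk the dates, call the rnoaa
# function once per chunk, and keep whichever chunks return data.
by_chunk <- function(begin_date, end_date, chunk_size, rnoaa_fun, ...) {
  chunks <- chunk_dates(begin_date, end_date, chunk_size)
  pieces <- lapply(seq_len(nrow(chunks)), function(i) {
    tryCatch(
      rnoaa_fun(begin_date = chunks$begin_date[i],
                end_date   = chunks$end_date[i], ...),
      error = function(e) NULL  # e.g. "No data was found" for a gap
    )
  })
  pieces[!vapply(pieces, is.null, logical(1))]  # list of per-chunk results
}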

sckott modified the milestones: v0.6.8, v0.7 on Apr 20, 2017
@sckott
Contributor

sckott commented Apr 20, 2017

ah, okay, the man page for coops.

thanks for the link

If I have time this weekend ...

I'll add the note to the man page, and let you do the code change - the next milestone with issues is here: https://github.com/ropensci/rnoaa/milestone/10 - I want to submit to CRAN relatively soon, within a week or so, but the deadline can slip later.

In the broader picture ...

I'd vote for putting it in here. If you want that feature, I'm guessing others do as well. I agree it's best not to break package APIs, but we could add an additional function or a parameter inside the current function. I think a second function is probably best so we don't break the behavior of the current one.

@tphilippi
Contributor Author

Scott--
My question is whether my "chunk_it" code should go inside coops_search() and other functions, or be a single stand-alone wrapper function by_chunk(begin_date, end_date, chunk_size, rnoaa_fun, ...) that could work with more than just coops_search(). Putting it inside each function has the advantage of transparency to users: it just works, no matter the duration requested. It has the disadvantage of potentially needing to be tweaked in multiple places if and when the NOAA APIs change their limits. I haven't looked at your code in taxize and elsewhere that chunks long requests, but it's roughly the same idea (just more complex, because durations are not as simple as blocks of rows).
In terms of something useful that minimizes the potential for adversely affecting your code, unless you tell me otherwise I'll probably write the wrapper function by_chunk(), along the lines of the usage below.
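
That is, hypothetically (reusing the year-long request that failed above, with by_chunk() and chunk_dates() as sketched earlier):

# Hypothetical usage: the failing year-long request, issued as
# twelve <= 31-day calls through the proposed wrapper.
pieces <- by_chunk(begin_date = 20100101, end_date = 20101231,
                   chunk_size = 31, rnoaa_fun = coops_search,
                   station_name = 9410230, product = "water_temperature",
                   datum = NULL, units = "metric", time_zone = "gmt",
                   application = "NPS-rnoaa")
length(pieces)  # up to 12 per-chunk results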

@sckott
Contributor

sckott commented Apr 21, 2017

I agree, best as a separate function.

@sckott
Contributor

sckott commented Apr 27, 2017

i think your PR sorted this out, closing. reopen if not

sckott closed this as completed on Apr 27, 2017