Multiple stationids in ncdc_stations #138

Closed
tchakravarty opened this issue Feb 12, 2016 · 3 comments
@tchakravarty

I am unable to figure out how to specify a list of weather stations in the ncdc_stations function. The function has a relevant-sounding argument, stationid, but it asks to pass a chain of of [sic] station ids in a comma-separated vector. I don't understand what data structure is expected here.

Here is what I have in my code:

# load readr (read_table, read_fwf) and rnoaa (ncdc_stations)
library(readr)
library(rnoaa)

# get country data
df_ghcnd_countries = 
  read_table("http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-countries.txt", 
             col_names = c("Abbreviation", "Country"))

# get station data
df_ghcnd_stations = 
  read_fwf("http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt", 
           col_positions = fwf_positions(
             start = c(1, 13, 22, 32, 39, 42, 73, 77, 81),
             end = c(11, 20, 30, 37, 40, 71, 75, 79, 85),
             col_names = c("ID", 
                           "Latitude",
                           "Longitude", 
                           "Elevation",
                           "State",
                           "Name",
                           "GSN Flag",
                           "HCN/CRN Flag",
                           "WMO ID")
           )
  )

# get the list of stations for Germany; using a positive lookahead regex
german_stations = df_ghcnd_stations$ID[grepl(paste0("^", df_ghcnd_countries$Abbreviation[
  df_ghcnd_countries$Country == "Germany"
], "(?=[0-9])"), df_ghcnd_stations$ID, perl = TRUE)]

# get the weather station data
station_list = ncdc_stations(
  startdate = "2014-01-01",
  enddate = "2015-11-01",
  stationid = paste0(paste0("GHCND:", german_stations), collapse = ";")
) 

This returns data for just one station (the first).

I have also tried paste0(paste0("GHCND:", german_stations), collapse = ",") and debugging the function, but since I do not know the URL formation requirements for NOAA, I am unable to proceed further. Any help here would be appreciated.

Thanks.

@sckott
Contributor

sckott commented Feb 12, 2016

Thanks for your question, @tchakravarty. I'll have a look soon.

@sckott sckott self-assigned this Feb 12, 2016
@sckott
Contributor

sckott commented Feb 12, 2016

  • rnoaa already has a function that retrieves what your df_ghcnd_countries variable holds: see ghcnd_countries()
  • rnoaa already has a function that retrieves what your df_ghcnd_stations variable holds: see ghcnd_stations()
  • I mistakenly allowed more than one station to be passed to ncdc_stations(). I will push a change soon to fix that, so that it stops with an error if stationid has length > 1. You can instead use e.g. lapply() or a for loop to handle many station ids

Try this at the end of your script above instead of the call to ncdc_stations()

dplyr::bind_rows(lapply(paste0("GHCND:", german_stations), function(z) {
  # one request per station id; keep only the data frame from each response
  ncdc_stations(
    startdate = "2014-01-01",
    enddate = "2015-11-01",
    stationid = z
  )$data
}))
#> Source: local data frame [22 x 9]
#> 
#>    elevation    mindate    maxdate latitude                  name datacoverage                id
#>        (int)      (chr)      (chr)    (dbl)                 (chr)        (dbl)             (chr)
#> 1         62 1891-01-01 1991-12-31  51.9506          MUENSTER, GM       0.9802 GHCND:GM000001153
#> 2          4 1890-01-01 2016-02-09  53.0464            BREMEN, GM       0.9928 GHCND:GM000001474
#> 3        144 1907-05-01 2003-05-19  49.7517             TRIER, GM       0.8674 GHCND:GM000002277
#> 4        285 1901-01-01 2015-11-30  49.4253    KAISERSLAUTERN, GM       0.8557 GHCND:GM000002288
#> 5        112 1876-01-01 2008-10-31  49.0392         KARLSRUHE, GM       0.9931 GHCND:GM000002698
#> 6        401 1900-01-01 2016-02-09  48.7703         STUTTGART, GM       0.9663 GHCND:GM000002716
#> 7         59 1890-01-01 2015-11-30  53.6442          SCHWERIN, GM       0.9815 GHCND:GM000003038
#> 8         93 1900-01-01 2015-03-22  51.5144             HALLE, GM       0.9928 GHCND:GM000003218
#> 9        246 1917-01-01 1999-12-31  51.1167 DRESDEN WAHNSDORF, GM       1.0000 GHCND:GM000003244
#> 10        51 1876-01-01 2014-06-02  52.4639     BERLIN DAHLEM, GM       0.9748 GHCND:GM000003319
#> ..       ...        ...        ...      ...                   ...          ...               ...
#> Variables not shown: elevationUnit (chr), longitude (dbl)
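
The for loop alternative mentioned above could look roughly like the sketch below, which also records station ids whose request fails instead of aborting the whole run. This is only an illustration: fetch_station() is a hypothetical offline stand-in for ncdc_stations(), and collect_stations() is a made-up helper name, not part of rnoaa.

```r
# Hypothetical stand-in for ncdc_stations(): returns a list with a $data
# data frame, and errors on ids that lack the "GHCND:" prefix.
fetch_station <- function(stationid) {
  if (!grepl("^GHCND:", stationid)) stop("unexpected station id: ", stationid)
  list(data = data.frame(id = stationid, stringsAsFactors = FALSE))
}

# Query one station id at a time; collect the $data frames and remember
# which ids failed rather than stopping at the first error.
collect_stations <- function(ids, fetch = fetch_station) {
  results <- list()
  failed <- character(0)
  for (id in ids) {
    res <- tryCatch(fetch(id)$data, error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, id)
    } else {
      results[[length(results) + 1]] <- res
    }
  }
  list(data = do.call(rbind, results), failed = failed)
}

out <- collect_stations(c("GHCND:GM000001153", "bad-id", "GHCND:GM000001474"))
```

With rnoaa loaded, the same loop could presumably be pointed at the real API by passing something like fetch = function(id) ncdc_stations(stationid = id, startdate = "2014-01-01", enddate = "2015-11-01").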

Note, however, that there is a problem with some of the other ncdc_*() functions: they are supposed to accept multiple stationid parameters, but currently they don't (ncdc_stations() itself only accepts one at a time). I've contacted NOAA to see if they can fix this.
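
To make "multiple stationid parameters" concrete: the sketch below assumes this means repeating stationid in the query string, rather than joining ids with ";" or ",". The build_query() helper is hypothetical and the base URL is an assumption for illustration only. As noted above, the API was not honoring more than one stationid at the time, so this shows the intended request shape, not a working call.

```r
# Hypothetical helper: build a query string that repeats `stationid`
# once per id, then appends any other named parameters.
build_query <- function(base_url, stationids, ...) {
  other <- list(...)
  parts <- c(
    paste0("stationid=", stationids),
    if (length(other)) paste0(names(other), "=", unlist(other))
  )
  paste0(base_url, "?", paste(parts, collapse = "&"))
}

# Base URL is an assumption, used here only for illustration.
url <- build_query(
  "https://www.ncdc.noaa.gov/cdo-web/api/v2/data",
  c("GHCND:GM000001153", "GHCND:GM000001474"),
  startdate = "2014-01-01"
)
```

A real request would also need URL encoding and an API token, which this sketch leaves out.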

sckott added a commit that referenced this issue Feb 12, 2016
also fixes to ncdc_* fxns that allow more than one stationid to work
though they don't work right now as the NCDC API is not working for > 1 stationid param; emailed them
bumped dev version
tidied some code too while in there
@sckott sckott added this to the v0.6 milestone Feb 12, 2016
@sckott
Contributor

sckott commented Apr 11, 2016

closing, see #150 for following up with NOAA about the multiple stationids issue

@sckott sckott closed this as completed Apr 11, 2016
@sckott sckott modified the milestones: v0.5.6, v0.6 Apr 11, 2016