unknown column: precipitation ISD #168

Closed
rjbehnke opened this issue Sep 9, 2016 · 25 comments

@rjbehnke

rjbehnke commented Sep 9, 2016

Hi,

When I use the rnoaa package to get ISD data, I often get the warning message "unknown column 'precipitation' ". Is there a way to fix this? I am using this package to download the ISD data set for North American stations. I downloaded the isd station history, and am going through each station at a time.

Thank you,
Ruben Behnke

@sckott
Contributor

sckott commented Sep 9, 2016

Thanks for your message. Please paste in your sessionInfo() when you have rnoaa loaded

@sckott
Contributor

sckott commented Sep 9, 2016

And any example usage of isd() when you get that warning

@rjbehnke
Author

rjbehnke commented Sep 9, 2016

Hi,

Here you go. Thank you!

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.1.0 dplyr_0.5.0   plyr_1.8.4    rerddap_0.3.4 rnoaa_0.6.0  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      xml2_1.0.0       magrittr_1.5     rappdirs_0.3.1   munsell_0.4.3    colorspace_1.2-6 R6_2.1.3         httr_1.2.1       tools_3.3.1      grid_3.3.1      
[11] data.table_1.9.6 gtable_0.2.0     DBI_0.5          assertthat_0.1   digest_0.6.10    tibble_1.2       gridExtra_2.2.1  ggplot2_2.1.0    tidyr_0.6.0      curl_1.2        
[21] ncdf4_1.15       mime_0.5         stringi_1.1.1    scales_0.4.0     XML_3.98-1.4     jsonlite_1.0     lubridate_1.5.6  chron_2.3-47    

Example CODE: (Note that the warning messages here come when the download failed, but I have seen it for successful downloads, as well).

[1] 1965
Error : download failed for
   ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1966/690070-93217-1966.gz
In addition: Warning messages:
1: Unknown column 'precipitation' 
2: Unknown column 'precipitation' 
3: Unknown column 'precipitation' 

Ruben

@rjbehnke
Author

rjbehnke commented Sep 9, 2016

for (yr in 1901:2016) {
  try(
    assign(paste("data",yr,sep=""), 
           isd(isd_history$USAF[stn], 
               isd_history$WBAN[stn], year = yr, path = "I:\\ISD", 
               overwrite = TRUE,cleanup = TRUE)$data)
  )
  print(yr)
}

@sckott
Contributor

sckott commented Sep 10, 2016

thanks, that warning comes from tibble. The output data.frame is a special kind of data.frame, of class tbl_df

it's just a warning, but I've just added suppressWarnings to the parsing code so that shouldn't show up anymore. Reinstall with devtools::install_github("ropensci/rnoaa") and try again

Can you share the code I need to reproduce the error above?

Error : download failed for
   ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1966/690070-93217-1966.gz

that works for me, not sure why it doesn't for you, perhaps a path problem
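
For readers unfamiliar with the fix mentioned above, here is a minimal base-R sketch of what wrapping a call in suppressWarnings() does. `noisy_parse()` is a hypothetical stand-in written for this illustration; it is not rnoaa's actual parsing code.

```r
# Hypothetical stand-in for a parsing step that emits a harmless warning.
noisy_parse <- function(x) {
  warning("Unknown column 'precipitation'")
  x * 2
}

# Without suppression the warning is raised; capture its message to show it fires.
caught <- tryCatch(noisy_parse(1:3), warning = function(w) conditionMessage(w))

# With suppressWarnings() the value is returned and the warning is discarded.
quiet <- suppressWarnings(noisy_parse(1:3))
```

Note that suppressWarnings() hides every warning raised inside the call, so it is best wrapped around as small a piece of code as possible.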

sckott added a commit that referenced this issue Sep 10, 2016
@rjbehnke
Author

Here is the zipped R code I'm using. I get 'download failed' errors a lot. Maybe my code is just bad. I don't know. I'm not a super experienced programmer.

Get_ISD.zip

@rjbehnke
Author

I was also wondering if I can just use rnoaa to parse downloaded ISD .gz files. Is there a way to do this? I really appreciate your help.

Ruben

@sckott
Contributor

sckott commented Sep 12, 2016

Thanks I'll take a look at your code and get back to you here

> I was also wondering if I can just use rnoaa to parse downloaded ISD .gz files. Is there a way to do this? I really appreciate your help.

Not at the moment, but I can expose a function to do that, see #169

@rjbehnke
Author

It seems like I can already parse the data just by pointing the path to the directory where the files are located, but a specific function to do this would be great. I am currently downloading all the files.

@sckott sckott added this to the v0.7 milestone Sep 12, 2016
@sckott
Contributor

sckott commented Sep 12, 2016

@rjbehnke your script uses a file, isd-history.csv, that I don't have access to.

@rjbehnke
Author

Here's the isd-history file (contains only North American stations). The isd_read() function works great!

isd-history2.zip

@rjbehnke
Author

One other thing I can think of is the option to include/not include bad data in the output. There are a lot of different flags in the ISD data, and missing data is represented by different values for each variable, so I don't know how much automation you want to include in a function. But, for people who just want some nice output, perhaps some automation is ok. I have written QC routines for hourly data from ISD and other networks, but I am refining these routines (they need it before I feel comfortable making them available).

@sckott
Contributor

sckott commented Sep 12, 2016

thanks for the file.

> There are a lot of different flags in the ISD data, and missing data is represented by different values for each variable, so I don't know how much automation you want to include in a function. But, for people who just want some nice output, perhaps some automation is ok.

Correct. The data is pretty messy. Do you have code already to clean them up?
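
As an illustration of the kind of automation being discussed, here is a small sketch that maps per-variable "missing" codes to NA. The column names, the specific codes, and the assumption that the columns are already numeric are all illustrative choices made for this example; check them against the ISD format documentation and the actual isd() output before relying on them.

```r
# Assumed sentinel ("missing") codes per column; illustrative only.
missing_codes <- list(
  temperature    = 9999,
  wind_direction = 999,
  air_pressure   = 99999
)

# Replace each column's sentinel value with NA, leaving other columns untouched.
sentinels_to_na <- function(df, codes) {
  for (col in intersect(names(codes), names(df))) {
    df[[col]][df[[col]] %in% codes[[col]]] <- NA
  }
  df
}

# Made-up observations, not real ISD records.
obs <- data.frame(
  temperature    = c(215, 9999, 180),
  wind_direction = c(270, 90, 999),
  air_pressure   = c(10132, 99999, 10098)
)
clean <- sentinels_to_na(obs, missing_codes)
```

Keeping the codes in one named list makes it easy to review or extend them per variable without touching the replacement logic.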

@rjbehnke
Author

Scott,

I do have code, but it is not ready for 'production'. I am starting to refine it (which it desperately needs), but since I am also trying to graduate, it's not a fast process. You are welcome to take a look at it, and work with me in making it better (much better) if you want. Just let me know, and I can send you my code (whether or not it is understandable might be a different story:). There are some major things I want to change.

This code was used for QC of all kinds of sources of data, ranging from ISD to RAWS to many local/regional mesonets. So, it is generalized, and meant for hourly, not daily, data. It is also focused on humidity (specifically, dew point), but it does do general checks on RH and temperature. I would like to write an R package that users who collect their own data or download data from sources that do not do their own QC can use to perform QC. This is a BIG, challenging project, though. I will say that right now, I am likely removing more good data than I care to admit. But, for my work, I'm more concerned about the influence of even a couple bad data values.

Ruben



@sckott
Contributor

sckott commented Sep 13, 2016

one thing to note is that I recently changed (in commit 201ad62) the output of isd() to a tibble (data.frame) instead of a data.frame nested in a list

sckott added a commit that referenced this issue Sep 13, 2016
@sckott
Contributor

sckott commented Sep 13, 2016

try it again after reinstalling devtools::install_github("ropensci/rnoaa")

here's a simpler version of your script, just focusing on making sure the file downloading/etc is working correctly. I think you shouldn't hit download fails anymore, though you might

library(dplyr)
library(rnoaa)

isd_history <- read.csv('~/Downloads/isd-history2.csv')
isd_history$CTRY <- as.character(isd_history$CTRY); isd_history$STATION.NAME <- as.character(isd_history$STATION.NAME)
isd_history <- subset(isd_history, isd_history$CTRY == 'US' | isd_history$CTRY == 'CA' | isd_history$CTRY == 'MX')
isd_history <- subset(isd_history, STATION.NAME != 'MOORED BUOY')

low <- which(isd_history$WBAN < 1000)
med <- which(isd_history$WBAN >= 1000 & isd_history$WBAN <= 9999)
isd_history$WBAN[low] <- paste('00',isd_history$WBAN[low],sep='')
isd_history$WBAN[med] <- paste('0',isd_history$WBAN[med],sep='')
isd_history$ID <- paste(isd_history$USAF,'-',isd_history$WBAN,sep='')

for (stn in 1:10) {
  cat(stn, "\n")
  begin <- as.numeric(substr(isd_history$BEGIN[stn],1,4))
  end <- as.numeric(substr(isd_history$END[stn],1,4))

  for (yr in begin:end) {
    cat("  working on:", yr, "\n")
    res <- tryCatch(
      isd(isd_history$USAF[stn], isd_history$WBAN[stn], year = yr),
      error = function(e) e
    )
    if (inherits(res, "error")) {
      cat("failed on ", isd_history$USAF[stn], isd_history$WBAN[stn], yr, "\n")
    }
  }
}
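
The script above pads WBAN with two explicit cases ('00' and '0'). With that approach, values below 100 would end up shorter than five characters. A width-based format is one alternative that covers every case in a single step. The values below are made up for illustration, and the sketch assumes WBAN was read in as an integer column.

```r
# Example WBAN values as they might look after read.csv() drops leading zeros.
wban <- c(42L, 123L, 3160L, 93217L, 99999L)

# "%05d" left-pads with zeros to a fixed width of five characters.
wban_padded <- sprintf("%05d", wban)

# Station ID in the USAF-WBAN form used in the file names above.
usaf <- c("690070", "690020", "621370", "690070", "690110")
id <- paste(usaf, wban_padded, sep = "-")
```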

@sckott
Contributor

sckott commented Sep 13, 2016

Went through the first 6 or so rows of that history file, and it turns out there are some files that just don't exist on the NOAA FTP servers. For each of the stations below, some years worked fine but others failed; I looked on the FTP servers, and the ones that failed just didn't have a file. So do use tryCatch() in your for loop and skip a year if its file is not found. I'll add something to the docs about files not existing. Here are the ones that failed:

## 621370-99999

failed on  621370 99999 2006 
failed on  621370 99999 2007 
failed on  621370 99999 2008 
failed on  621370 99999 2009 
failed on  621370 99999 2010 
failed on  621370 99999 2011 
failed on  621370 99999 2012 
failed on  621370 99999 2013 

## 690020-93218

failed on  690020 93218 1972 
failed on  690020 93218 1973 
failed on  690020 93218 1974 
failed on  690020 93218 1975 
failed on  690020 93218 1976 
failed on  690020 93218 1977 
failed on  690020 93218 1978 
failed on  690020 93218 1979 
failed on  690020 93218 1980 
failed on  690020 93218 1981 
failed on  690020 93218 1982 
failed on  690020 93218 1983 
failed on  690020 93218 1984 
failed on  690020 93218 1985 
failed on  690020 93218 1986 
failed on  690020 93218 1987 
failed on  690020 93218 1988 

## 690070-93217

failed on  690070 93217 1971 
failed on  690070 93217 1972 
failed on  690070 93217 1973 
failed on  690070 93217 1974 
failed on  690070 93217 1975 
failed on  690070 93217 1976 
failed on  690070 93217 1977 
failed on  690070 93217 1978 
failed on  690070 93217 1979 
failed on  690070 93217 1980 
failed on  690070 93217 1981 
failed on  690070 93217 1982 
failed on  690070 93217 1983 
failed on  690070 93217 1984 
failed on  690070 93217 1985 
failed on  690070 93217 1986 
failed on  690070 93217 1987 
failed on  690070 93217 1988 
failed on  690070 93217 1989 
failed on  690070 93217 1990 

## 690110-99999

failed on  690110 99999 1947 
failed on  690110 99999 1948 
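
A minimal sketch of the skip-on-failure pattern described above, extended to record which years were missing. `fetch_year()` and the availability list are hypothetical stand-ins so the example runs without network access; in a real script the call inside tryCatch() would be isd().

```r
# Hypothetical stand-in for isd(): errors for years with no file on the server.
fetch_year <- function(year, available) {
  if (!year %in% available) stop("download failed for ", year)
  data.frame(year = year, n_obs = 1L)
}

available_years <- c(1969, 1970, 1991)  # made-up availability
results <- list()
failed <- integer(0)

for (yr in 1969:1972) {
  res <- tryCatch(fetch_year(yr, available_years), error = function(e) e)
  if (inherits(res, "error")) {
    failed <- c(failed, yr)   # remember the gap and move on
    next
  }
  results[[as.character(yr)]] <- res
}

# Stack the years that did download into one data.frame.
combined <- do.call(rbind, results)
```

Keeping the failed years in a vector makes it possible to report the gaps per station afterwards instead of only printing them as they occur.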

@rjbehnke
Author

Thanks Scott. It just seemed strange that there were so many years missing from the middle of a station's time series. I guess it's just the way ISD is.



@sckott
Contributor

sckott commented Sep 14, 2016

Right, I guess that's the way it is

@rjbehnke
Author

rjbehnke commented Oct 1, 2016

Scott,

The isd_read function works very well, but some errors arise when trying to read the csv files written out after using the isd_read function. I assume these are probably associated with errors in the NCDC files. Here is a list of them. I would suggest that functionality be included with the isd_read function to look for these errors and either correct them or remove the rows they occur on (I have not seen any valid data on rows these errors occur on).

  1. The columns 'total_chars', 'usaf_station', 'wban_station', 'date', and 'time' occasionally have bad values (or no data whatsoever) that look like "+0230" or "-0700", etc.

  2. Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names (ex. "697774-99999")

  3. Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed (ex. "467425-99999")

  4. In addition: Warning message:
    In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
    EOF within quoted string

  5. Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format

Ruben Behnke
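
One way the suggested row check could look, as a rough base-R sketch. It assumes `date` is stored as an 8-digit YYYYMMDD string and `time` as a 4-digit HHMM string; the toy rows below only imitate the bad values described in the list and are not taken from a real file.

```r
# Toy rows imitating the problem: the third row has shifted values, the fourth
# has an empty date.
obs <- data.frame(
  date = c("20160101", "20160101", "+0230", ""),
  time = c("0000", "0100", "-0700", "0300"),
  stringsAsFactors = FALSE
)

# A well-formed row has an 8-digit date and a 4-digit time.
ok <- grepl("^[0-9]{8}$", obs$date) & grepl("^[0-9]{4}$", obs$time)

good <- obs[ok, ]
bad  <- obs[!ok, ]   # keep for inspection rather than silently discarding

# Passing an explicit format avoids the "not in a standard unambiguous format"
# error that occurs when the format has to be guessed.
good$datetime <- as.POSIXct(paste(good$date, good$time),
                            format = "%Y%m%d %H%M", tz = "UTC")
```

Setting the rejected rows aside in `bad` makes it possible to confirm that they really contain no valid data before dropping them.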



@sckott
Contributor

sckott commented Oct 1, 2016

thanks @rjbehnke for this info. really helpful. It would be even more helpful if you could tell me which dataset requests lead to those errors, so I can quickly get examples that I can play with to sort these errors out.

@rjbehnke
Author

rjbehnke commented Oct 1, 2016

Scott,

Here's a document with info on the errors. I attached the script I'm using. Please let me know if you need something else.

Ruben



@sckott
Contributor

sckott commented Oct 2, 2016

@rjbehnke didn't get the attachment. I think you have to use the github web interface maybe, or email it to me.

@sckott
Contributor

sckott commented Oct 6, 2016

see file in #169

@sckott sckott added this to the v0.6.8 milestone Jan 18, 2017
@sckott sckott removed this from the v0.7 milestone Jan 18, 2017
@sckott
Contributor

sckott commented Jan 18, 2017

closing for now, let me know if there's anything we didn't sort out @rjbehnke

@sckott sckott closed this as completed Jan 18, 2017
@sckott sckott modified the milestones: v0.6.8, v0.7 Apr 20, 2017