unknown column: precipitation ISD #168
Comments
Thanks for your message. Please paste in your |
And any example usage of |
Hi, Here you go. Thank you!
Example code (note that the warning messages here appear when the download failed, but I have also seen them for successful downloads):
Ruben |
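library(rnoaa)
# isd_history (station list) and stn (row index of the current station) are defined earlier in the script
for (yr in 1901:2016) {
  # try() lets the loop continue when a year fails to download
  try(
    assign(paste("data", yr, sep = ""),
           isd(isd_history$USAF[stn], isd_history$WBAN[stn], year = yr,
               path = "I:\\ISD", overwrite = TRUE, cleanup = TRUE)$data)
  )
  print(yr)
}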
thanks, that warning comes from …; it's just a warning, but I've just added …. Can you share the code I need to reproduce the error above?
Error : download failed for ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1966/690070-93217-1966.gz
That works for me; not sure why it doesn't for you, perhaps a path problem. |
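If the warning is expected and harmless, one option is to silence only that specific message instead of wrapping everything in suppressWarnings(), so that other warnings still show up. A minimal base-R sketch; `muffle_unknown_column()` and `noisy_download()` are hypothetical names, the latter standing in for an isd() call:

```r
# Muffle only warnings whose message mentions "unknown column";
# every other warning is passed through unchanged.
muffle_unknown_column <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("unknown column", conditionMessage(w), ignore.case = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Hypothetical stand-in for a download that emits the warning but still returns data
noisy_download <- function() {
  warning("unknown column: 'precipitation'")
  data.frame(usaf = "690070", wban = "93217")
}

res <- muffle_unknown_column(noisy_download())
```

Matching on the message text is fragile if the wording changes between package versions, so this is a convenience rather than a fix for the underlying cause.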
Here is the zipped R code I'm using. I get 'download failed' errors a lot. Maybe my code is just bad. I don't know. I'm not a super experienced programmer. |
I was also wondering if I can just use rnoaa to parse downloaded ISD .gz files. Is there a way to do this? I really appreciate your help. Ruben |
Thanks, I'll take a look at your code and get back to you here.
Not at the moment, but I can expose a function to do that, see #169 |
It seems like I can already parse the data just by pointing the path to the directory where the files are located, but a specific function to do this would be great. I am currently downloading all the files. |
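When working from a directory of already-downloaded files, the station and year can be recovered from the file names themselves, assuming they follow the USAF-WBAN-YEAR.gz layout of the FTP file quoted above (e.g. 690070-93217-1966.gz). A minimal sketch; `parse_isd_filenames()` is a hypothetical helper, and the `[0-9A-Z]{6}` pattern for USAF is an assumption that allows for IDs containing letters:

```r
# Parse ISD file names of the form USAF-WBAN-YEAR.gz into a data frame,
# silently dropping any file that does not match the pattern.
parse_isd_filenames <- function(paths) {
  fname <- basename(paths)
  pattern <- "^([0-9A-Z]{6})-([0-9]{5})-([0-9]{4})\\.gz$"
  ok <- grepl(pattern, fname)
  data.frame(
    path = paths[ok],
    usaf = sub(pattern, "\\1", fname[ok]),
    wban = sub(pattern, "\\2", fname[ok]),
    year = as.integer(sub(pattern, "\\3", fname[ok])),
    stringsAsFactors = FALSE
  )
}

# Typical use on a download directory (the path is only an example):
# files <- list.files("I:/ISD", pattern = "\\.gz$", full.names = TRUE)
# meta  <- parse_isd_filenames(files)
```

Keeping USAF and WBAN as character preserves their leading zeros.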
@rjbehnke you use a file |
Here's the isd-history file (contains only North American stations). The isd_read() function works great! |
One other thing I can think of is the option to include/not include bad data in the output. There are a lot of different flags in the ISD data, and missing data is represented by different values for each variable, so I don't know how much automation you want to include in a function. But, for people who just want some nice output, perhaps some automation is ok. I have written QC routines for hourly data from ISD and other networks, but I am refining these routines (they need it before I feel comfortable making them available). |
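As a sketch of the simplest kind of automation, the per-variable missing codes can be kept in a lookup table and replaced with NA. The column names and sentinel values below are illustrative assumptions, not rnoaa's actual output; as far as I know the ISD format document uses +9999 for missing air temperature and dew point and 999 for missing wind direction, but each field should be checked against that document:

```r
# Per-variable "missing" sentinels (illustrative; verify against the ISD format document)
missing_codes <- list(
  temperature    = 9999,
  dew_point      = 9999,
  wind_direction = 999
)

# Convert each listed column to numeric and replace its sentinel with NA
set_missing_na <- function(df, codes) {
  for (col in intersect(names(codes), names(df))) {
    x <- suppressWarnings(as.numeric(as.character(df[[col]])))
    x[x %in% codes[[col]]] <- NA
    df[[col]] <- x
  }
  df
}

obs <- data.frame(temperature    = c("+0123", "+9999"),
                  dew_point      = c("-0050", "+0010"),
                  wind_direction = c("999", "270"),
                  stringsAsFactors = FALSE)
clean <- set_missing_na(obs, missing_codes)
```

Quality flags could be handled the same way, with a second lookup of which flag values to keep, but that choice depends on how strict the user wants to be.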
thanks for the file.
Correct. The data is pretty messy. Do you have code already to clean them up? |
Scott, I do have code, but it is not ready for 'production'. I am starting to refine it (which it desperately needs), but since I am also trying to graduate, it's not a fast process. You are welcome to take a look at it, and work with me in making it better (much better) if you want. Just let me know, and I can send you my code (whether or not it is understandable might be a different story:). There are some major things I want to change. This code was used for QC of all kinds of sources of data, ranging from ISD to RAWS to many local/regional mesonets. So, it is generalized, and meant for hourly, not daily, data. It is also focused on humidity (specifically, dew point), but it does do general checks on RH and temperature. I would like to write an R package that users who collect their own data or download data from sources that do not do their own QC can use to perform QC. This is a BIG, challenging project, though. I will say that right now, I am likely removing more good data than I care to admit. But, for my work, I'm more concerned about the influence of even a couple bad data values. Ruben |
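One of the most basic humidity checks is physical consistency: dew point should not exceed air temperature. A minimal illustration of such a check, not Ruben's QC routine; `flag_dewpoint()` is a hypothetical name, values are assumed to be in °C, and the 0.5 °C tolerance for rounding is an arbitrary assumption:

```r
# Flag observations where dew point exceeds air temperature by more than `tol`.
# Rows with a missing value in either variable are not flagged.
flag_dewpoint <- function(temp, dewpt, tol = 0.5) {
  !is.na(temp) & !is.na(dewpt) & (dewpt - temp) > tol
}

temp  <- c(20.0, 15.2, NA, 10.0)
dewpt <- c(12.0, 16.5, 5.0, 10.3)
flag_dewpoint(temp, dewpt)
# FALSE  TRUE FALSE FALSE
```

Returning a logical flag rather than deleting rows leaves the decision about removing data to the user, which matters when, as noted above, aggressive QC can remove good data.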
one thing to note is that I recently changed (in 201ad62) the output of |
and now works bumped dev version, #168
try it again after reinstalling. Here's a simpler version of your script, just focusing on making sure the file downloading etc. is working correctly. I think you shouldn't hit download failures anymore, though you might
library(dplyr)
library(rnoaa)
# read the station list, keeping text columns as character rather than factors
isd_history <- read.csv('~/Downloads/isd-history2.csv', stringsAsFactors = FALSE)
# keep US, Canadian and Mexican stations, dropping moored buoys
isd_history <- subset(isd_history, CTRY %in% c('US', 'CA', 'MX'))
isd_history <- subset(isd_history, STATION.NAME != 'MOORED BUOY')
# zero-pad WBAN to 5 digits (e.g. 3017 -> "03017"); unlike pasting a fixed
# number of zeros, this also handles values below 100
isd_history$WBAN <- sprintf('%05d', as.integer(isd_history$WBAN))
isd_history$ID <- paste(isd_history$USAF, isd_history$WBAN, sep = '-')
# try the first 10 stations, every year between their BEGIN and END dates
for (stn in 1:10) {
  cat(stn, "\n")
  begin <- as.numeric(substr(isd_history$BEGIN[stn], 1, 4))
  end <- as.numeric(substr(isd_history$END[stn], 1, 4))
  for (yr in begin:end) {
    cat("  working on:", yr, "\n")
    # catch download errors so one missing file does not stop the loop
    res <- tryCatch(
      isd(isd_history$USAF[stn], isd_history$WBAN[stn], year = yr),
      error = function(e) e
    )
    if (inherits(res, "error")) {
      cat("failed on ", isd_history$USAF[stn], isd_history$WBAN[stn], yr, "\n")
    }
  }
}
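To check by hand whether a failed station-year actually has a file on the server, it can help to build the expected URL directly. A minimal sketch; `isd_ftp_url()` is a hypothetical helper that assumes the directory layout of the URL quoted earlier in this thread (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/YEAR/USAF-WBAN-YEAR.gz), which may have changed since:

```r
# Build the NOAA FTP URL for one station-year (vectorised over all arguments).
# Pass USAF as character so leading zeros survive; WBAN is zero-padded to 5 digits.
isd_ftp_url <- function(usaf, wban, year) {
  wban <- sprintf("%05d", as.integer(wban))
  year <- as.integer(year)
  sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s-%s-%d.gz",
          year, usaf, wban, year)
}

isd_ftp_url("690070", 93217, 1966)
# "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1966/690070-93217-1966.gz"
```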
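Went through the first 6 or so rows of that history file, and it turns out some files just don't exist on the NOAA FTP servers. For each of the stations below, some years worked fine but others failed, and when I looked on the FTP servers, the years that failed just didn't have a file. So do use tryCatch() and just skip if the file is not found in your for loop. I'll add something to the docs about files not existing. Here are the ones that failed:
621370-99999: failed on 621370 99999 2006
690020-93218: failed on 690020 93218 1972
690070-93217: failed on 690070 93217 1971
690110-99999: failed on 690110 99999 1947 |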
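The skip-on-failure pattern can also keep a record of which years were missing, which makes gaps in a station's record easy to report afterwards. A minimal sketch; `fetch_year()` is a hypothetical stand-in for a call such as isd(usaf, wban, year = yr) and is rigged to fail for two years:

```r
# Hypothetical stand-in for a download; fails for years with no file
fetch_year <- function(yr) {
  if (yr %in% c(1971, 1972)) stop("download failed")
  data.frame(year = yr, n_obs = 24L)
}

results  <- list()
failures <- integer(0)
for (yr in 1969:1974) {
  res <- tryCatch(fetch_year(yr), error = function(e) NULL)
  if (is.null(res)) {
    failures <- c(failures, yr)   # no file for this year: note it and move on
    next
  }
  results[[as.character(yr)]] <- res
}
all_years <- do.call(rbind, results)
```

Collecting results in a list and combining them once at the end also avoids creating one `dataYYYY` variable per year with assign().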
Thanks Scott. It just seemed strange that there were so many years missing from the middle of a station's time series. I guess it's just the way ISD is.
Right, I guess that's the way it is |
Scott,
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed (ex. "467425-99999")
Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format
Ruben Behnke
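The second message is what R raises when as.POSIXct()/as.POSIXlt() receive a string without a `format` and cannot match any of the default layouts. Whether that is the cause here is not confirmed, but a common remedy is to parse with an explicit format. A minimal sketch, assuming (for illustration only) that dates and times arrive as compact strings such as "19660101" and "0600"; `parse_isd_datetime()` is a hypothetical helper:

```r
# Parse compact date ("YYYYMMDD") and time ("HHMM") strings into POSIXct in UTC,
# giving the format explicitly instead of relying on R's format guessing.
parse_isd_datetime <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%Y%m%d %H%M", tz = "UTC")
}

parse_isd_datetime("19660101", "0600")

# By contrast, as.POSIXct("19660101 0600", tz = "UTC") with no format
# fails with "character string is not in a standard unambiguous format".
```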
thanks @rjbehnke for this info. really helpful. It would be even more helpful if you could tell me which dataset requests lead to those errors, so I can quickly get examples that I can play with to sort these errors out. |
Scott, Here's a document with info on the errors. I attached the script I'm using. Please let me know if you need something else. Ruben
@rjbehnke I didn't get the attachment. I think you may have to use the GitHub web interface, or email it to me. |
see file in #169 |
closing for now, let me know if there's anything we didn't sort out @rjbehnke |
Hi,
When I use the rnoaa package to get ISD data, I often get the warning message "unknown column 'precipitation'". Is there a way to fix this? I am using this package to download the ISD data set for North American stations. I downloaded the ISD station history and am going through the stations one at a time.
Thank you,
Ruben Behnke