read_lines deals with Carriage Return (?) different #1210

Closed
ldecicco-USGS opened this issue May 13, 2021 · 2 comments


@ldecicco-USGS

I'm seeing a difference in how readr 2.0 parses lines compared to 1.4:

obs_url <- "https://nwis.waterdata.usgs.gov/nwis/qwdata?multiple_site_no=04024430,04024000&multiple_parameter_cds=34247,30234,32104,34220&param_cd_operator=OR&list_of_search_criteria=multiple_site_no,multiple_parameter_cds&group_key=NONE&sitefile_output_format=html_table&column_name=agency_cd&column_name=site_no&column_name=station_nm&inventory_output=0&rdb_inventory_output=file&TZoutput=0&pm_cd_compare=Greater%20than&radio_parm_cds=previous_parm_cds&qw_attributes=0&format=rdb&rdb_qw_attributes=expanded&date_format=YYYY-MM-DD&rdb_compression=value&qw_sample_wide=0&begin_date=2010-11-03"

lines <- readLines(obs_url)
base_meta <- lines[grep("\\#", lines)]
length(base_meta)
[1] 123

packageVersion("readr")
[1] ‘1.4.0.9000’

lines2 <- readr::read_lines(obs_url)
meta_lines <- lines2[grep("\\#", lines2)]
length(meta_lines)
[1] 117

packageVersion("readr")
[1] ‘1.4.0’

lines_OG <- readr::read_lines(obs_url)
meta_lines <- lines_OG[grep("\\#", lines_OG)]
length(meta_lines)
[1] 123

Raw text:

# M  - presence verified but not quantified\r\n# Description of val_qual_tx:\n# b  - value extrapolated at low end\n# c  - see result comment\n# n  - below the reporting level but at or above the detection level\n# t  - below the detection level\n#\r\n

So the lines that get split incorrectly end in \n, whereas the other lines (which readr splits correctly) end in \r\n.
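For anyone who wants to reproduce this without hitting the NWIS server, here is a minimal offline sketch in base R. The header text is a shortened, made-up stand-in for the real response, and splitting on "\r\n" only is just my assumption about what a parser might do once it decides the file is CRLF; it is not a statement about readr's internals.

```r
# Shortened, made-up stand-in for the NWIS header: two lines end in "\r\n",
# two end in a bare "\n".
raw_text <- "# M  - presence verified\r\n# Description of val_qual_tx:\n# b  - value extrapolated at low end\n#\r\n"

# Write in binary mode so R does not translate the line endings.
tmp <- tempfile(fileext = ".rdb")
con <- file(tmp, open = "wb")
writeChar(raw_text, con, eos = NULL)
close(con)

# Assumption: a parser that treats "\r\n" as the only terminator.
# The "\n"-terminated lines stay glued to the line that follows them.
crlf_only <- strsplit(raw_text, "\r\n", fixed = TRUE)[[1]]
length(crlf_only)
#> [1] 2

# Accepting either "\r\n" or "\n" recovers all four lines.
either <- strsplit(raw_text, "\r?\n")[[1]]
length(either)
#> [1] 4

# Base readLines() accepts LF, CRLF, and CR terminators, so it also sees four.
base_lines <- readLines(tmp)
length(base_lines)
#> [1] 4
```

The same kind of merging would make a count of lines matching "\\#" come out lower than expected, since several "#" lines collapse into one element.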

@jimhester
Collaborator

Yeah, having a mix of newlines is definitely the issue. I'll see if there is something we can do to work around files like this.
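In the meantime, one possible user-side workaround is to normalize the line endings before splitting. This is only a sketch under assumptions: `read_lines_mixed()` is a hypothetical helper name (not part of readr or vroom), it uses base R only, and it expects a local file, so a URL would have to be downloaded first. Base `readLines()`, as in the original report, is the simpler option when it is acceptable.

```r
# Hypothetical helper (not part of readr): read a local file whose lines end
# in a mix of "\r\n" and "\n", returning one element per line.
read_lines_mixed <- function(path) {
  size <- file.info(path)$size
  # Read the whole file as a single string, byte for byte.
  txt <- readChar(path, nchars = size, useBytes = TRUE)
  # Normalize CRLF to LF, then split on LF.
  txt <- gsub("\r\n", "\n", txt, fixed = TRUE)
  strsplit(txt, "\n", fixed = TRUE)[[1]]
}

# Usage on a small made-up file with mixed line endings.
demo_path <- tempfile(fileext = ".rdb")
demo_con <- file(demo_path, open = "wb")
writeChar("# a\r\n# b\n# c\n#\r\n", demo_con, eos = NULL)
close(demo_con)

demo_lines <- read_lines_mixed(demo_path)
length(demo_lines[grep("\\#", demo_lines)])
```

Normalizing first keeps the splitting step trivial, at the cost of holding the whole file in memory as one string.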

@jimhester
Collaborator

This should be fixed by tidyverse/vroom@2b94f88

library("readr")

obs_url <- "https://nwis.waterdata.usgs.gov/nwis/qwdata?multiple_site_no=04024430,04024000&multiple_parameter_cds=34247,30234,32104,34220&param_cd_operator=OR&list_of_search_criteria=multiple_site_no,multiple_parameter_cds&group_key=NONE&sitefile_output_format=html_table&column_name=agency_cd&column_name=site_no&column_name=station_nm&inventory_output=0&rdb_inventory_output=file&TZoutput=0&pm_cd_compare=Greater%20than&radio_parm_cds=previous_parm_cds&qw_attributes=0&format=rdb&rdb_qw_attributes=expanded&date_format=YYYY-MM-DD&rdb_compression=value&qw_sample_wide=0&begin_date=2010-11-03"
lines2 <- readr::read_lines(obs_url)
meta_lines <- lines2[grep("\\#", lines2)]
length(meta_lines)
#> [1] 123

Created on 2021-05-17 by the reprex package (v2.0.0)
