when specifying n_max argument, the last line of the CSV file is not imported. #1321

hidekoji · 2021-11-01T21:37:11Z

readr: version 2.0.2

I have a CSV file (https://www.dropbox.com/s/uk79hnrqs9rm9hq/test1.csv?dl=1) that looks like this.

user_number, user_extract_way
1935, 2021/04/01 - 2021/04/05
user, service_id
aaaa, bbbb

If I set the n_max argument to 5 when reading it with readr (readr::read_delim in the reprex below), the last row is not imported and only 2 rows are returned.

readr::read_delim("https://www.dropbox.com/s/uk79hnrqs9rm9hq/test1.csv?dl=1", delim = ",", quote = "\"", col_names = T, na = c('','NA'), n_max = 5)
#> Rows: 2 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): user_number,  user_extract_way
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 2
#>   user_number ` user_extract_way`       
#>   <chr>       <chr>                     
#> 1 1935        " 2021/04/01 - 2021/04/05"
#> 2 user        " service_id"

Created on 2021-11-01 by the reprex package (v2.0.1)

All rows are imported if I don't specify the n_max argument (see the reprex below, which shows the 3rd row as expected).

readr::read_delim("https://www.dropbox.com/s/uk79hnrqs9rm9hq/test1.csv?dl=1", delim = ",", quote = "\"", col_names = T, na = c('','NA'))
#> Rows: 3 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): user_number,  user_extract_way
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 2
#>   user_number ` user_extract_way`       
#>   <chr>       <chr>                     
#> 1 1935        " 2021/04/01 - 2021/04/05"
#> 2 user        " service_id"             
#> 3 aaaa        " bbbb"

Created on 2021-11-01 by the reprex package (v2.0.1)
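A self-contained variant of the two calls above, for anyone who cannot reach the Dropbox link. This is only a sketch: it assumes the behaviour depends on reading through a connection from a file whose last line has no trailing newline, which the report itself does not confirm. The temporary file and the variable names are made up for illustration.

```r
library(readr)

# The four lines from the report, joined WITHOUT a trailing newline
# (assumption: the remote file ends this way).
csv_text <- paste(
  "user_number, user_extract_way",
  "1935, 2021/04/01 - 2021/04/05",
  "user, service_id",
  "aaaa, bbbb",
  sep = "\n"
)

# writeBin() writes the bytes as-is, so no newline is appended.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(csv_text), path)

# file(path) is a connection, standing in for the URL used in the report.
with_n_max <- read_delim(file(path), delim = ",", n_max = 5,
                         show_col_types = FALSE)
without_n_max <- read_delim(file(path), delim = ",",
                            show_col_types = FALSE)

# Compare the row counts of the two reads.
nrow(with_n_max)
nrow(without_n_max)
```

If the two row counts differ, the installed versions show the behaviour described in this issue.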

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.28   withr_2.4.2     magrittr_2.0.1  reprex_2.0.1   
#>  [5] evaluate_0.14   highr_0.9       stringi_1.7.5   rlang_0.4.12   
#>  [9] cli_3.1.0       rstudioapi_0.13 fs_1.5.0        rmarkdown_2.11 
#> [13] tools_4.1.0     stringr_1.4.0   glue_1.4.2      xfun_0.27      
#> [17] yaml_2.2.1      fastmap_1.1.0   compiler_4.1.0  htmltools_0.5.2
#> [21] knitr_1.36

Created on 2021-11-01 by the reprex package (v2.0.1)


jimhester · 2021-11-09T13:50:21Z

Thank you for opening the issue and for supplying a reproducible example, it is a big help!

This should be fixed in the next released version of vroom.

# vroom 1.5.7

* Jenny Bryan is now the official maintainer.
* Fix uninitialized bool detected by CRAN's UBSAN check (tidyverse/vroom#386)
* Fix buffer overflow when trying to parse an integer field that is over 64 characters long (tidyverse/readr#1326)
* Fix subset indexing when indexes span a file boundary multiple times (#383)

# vroom 1.5.6

* `vroom(col_select=)` now works if `col_names = FALSE` as intended (#381)
* `vroom(n_max=)` now correctly handles cases when reading from a connection and the file does _not_ end with a newline (tidyverse/readr#1321)
* `vroom()` no longer issues a spurious warning when the parsing needs to be restarted due to the presence of embedded newlines (tidyverse/readr#1313)
* Fix performance issue when materializing subsetted vectors (#378)
* `vroom_format()` now uses the same internal multi-threaded code as `vroom_write()`, improving its performance in most cases (#377)
* `vroom_fwf()` no longer omits the last line if it does _not_ end with a newline (tidyverse/readr#1293)
* Empty files or files with only a header line and no data no longer cause a crash if read with multiple files (tidyverse/readr#1297)
* Files with a header but no contents, or an empty file if `col_names = FALSE`, no longer cause a hang when `progress = TRUE` (tidyverse/readr#1297)
* Commented lines with comments at the end of lines no longer hang R (tidyverse/readr#1309)
* Comment lines containing unpaired quotes are no longer treated as unterminated quotations (tidyverse/readr#1307)
* Values with only a `Inf` or `NaN` prefix but additional data afterwards, like `Inform`, are no longer inappropriately guessed as doubles (tidyverse/readr#1319)
* Time types now support `%h` format to denote hour durations greater than 24, like readr (tidyverse/readr#1312)

# vroom 1.5.5

* `vroom()` now supports files with only carriage return newlines (`\r`). (#360, tidyverse/readr#1236)
* `vroom()` now parses single digit datetimes more consistently as readr has done (tidyverse/readr#1276)
* `vroom()` now parses `Inf` values as doubles (tidyverse/readr#1283)
* `vroom()` now parses `NaN` values as doubles (tidyverse/readr#1277)
* `VROOM_CONNECTION_SIZE` is now parsed as a double, which supports scientific notation (#364)
* `vroom()` now works around specifying a `\n` as the delimiter (#365, tidyverse/dplyr#5977)
* `vroom()` no longer crashes if given a `col_name` and `col_type` both less than the number of columns (tidyverse/readr#1271)
* `vroom()` no longer hangs if given an empty value for `locale(grouping_mark=)` (tidyverse/readr#1241)
* Fix performance regression when guessing with large numbers of rows (tidyverse/readr#1267)
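The changelog lists the `n_max` fix (tidyverse/readr#1321) under vroom 1.5.6, so one way to tell whether a given installation should include it is to compare the installed vroom version against that number. A minimal sketch, assuming vroom is installed; the variable name `has_fix` is only illustrative:

```r
# TRUE when the installed vroom is at least 1.5.6, the release whose
# changelog entry mentions tidyverse/readr#1321.
has_fix <- utils::packageVersion("vroom") >= "1.5.6"
has_fix
```

If this returns `FALSE`, updating vroom with `install.packages("vroom")` should pick up the fix without needing a new readr release.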

jimhester added the bug label Nov 9, 2021

jimhester closed this as completed in tidyverse/vroom@723b006 Nov 9, 2021
