IDE/ R session crashed with read_delim/mutate #5977

Closed
rg4555 opened this issue Aug 10, 2021 · 6 comments
Labels
reprex needs a minimal reproducible example

Comments

@rg4555

rg4555 commented Aug 10, 2021

Hello,

A few months ago I wrote this code to import a CSV file, and it worked fine at the time:

library(tidyverse)
###################################################################################### p 1000 t1 #################
# read each row of text individually so we can parse out the information manually
election0 <- 
  read_delim(
    "~/MASTER_1/stage/travail/R/donnees/2014/2014_t1+1000.txt", 
    "\n",
    col_names = FALSE,locale = locale(encoding = "ISO-8859-1")) %>%
  setNames("line_text") %>%
  mutate(
    # split by delimiter
    split_text  = strsplit(line_text, ";"),
    # assume the first 17 elements are common
    split_df    = map(split_text, ~.[1:17]),
    # and everything past this is repeating 11
    split_names = map(split_text, ~.[-c(1:17)]),
    columns     = map_dbl(split_text, length),
    # the number of repeating 11 name data elements
    n_names     = (columns - 17)/11) 

But when I run the same code on the same data as before, my IDE crashes (I think). I get a dialog box saying "R Session Aborted. R encountered a fatal error. The session was terminated."
[Image: screenshot of the dialog box]

Here is my session info:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4     readr_2.0.0    
[6] tidyr_1.1.3     tibble_3.1.3    ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       cellranger_1.1.0 pillar_1.6.2     compiler_4.1.0   dbplyr_2.1.1    
 [6] tools_4.1.0      lubridate_1.7.10 jsonlite_1.7.2   lifecycle_1.0.0  gtable_0.3.0    
[11] pkgconfig_2.0.3  rlang_0.4.11     reprex_2.0.1     cli_3.0.1        rstudioapi_0.13 
[16] DBI_1.1.1        haven_2.4.3      xml2_1.3.2       withr_2.4.2      httr_1.4.2      
[21] fs_1.5.0         generics_0.1.0   vctrs_0.3.8      hms_1.1.0        grid_4.1.0      
[26] tidyselect_1.1.1 glue_1.4.2       R6_2.5.0         fansi_0.5.0      readxl_1.3.1    
[31] tzdb_0.1.2       modelr_0.1.8     magrittr_2.0.1   backports_1.2.1  scales_1.1.1    
[36] ellipsis_0.3.2   rvest_1.0.1      assertthat_0.2.1 colorspace_2.0-2 utf8_1.2.2      
[41] stringi_1.7.3    munsell_0.5.0    broom_0.7.9      crayon_1.4.1    

And an extract from the log file:

09 Aug 2021 07:03:02 [rsession-zugat] ERROR system error 10053 (An established connection was aborted by the software in your host machine) [request-uri: /events/get_events]; OCCURRED AT void __cdecl rstudio::session::HttpConnectionImpl<class rstudio_boost::asio::ip::tcp>::sendResponse(const class rstudio::core::http::Response &) src/cpp/session/http/SessionWin32HttpConnectionListener.cpp:113; LOGGED FROM: void __cdecl rstudio::session::HttpConnectionImpl<class rstudio_boost::asio::ip::tcp>::sendResponse(const class rstudio::core::http::Response &) src/cpp/session/http/SessionWin32HttpConnectionListener.cpp:118
09 Aug 2021 07:18:18 [rsession-zugat] ERROR system error 5 (Access is denied); OCCURRED AT auto __cdecl rstudio::core::system::ChildProcess::terminate::<lambda_b34d56978c1a268cda78ea8a24bc0d35>::operator ()(void) const src/cpp/core/system/Win32ChildProcess.cpp:287; LOGGED FROM: void __cdecl rstudio::core::system::ProcessSupervisor::terminateAll(void) src/cpp/core/system/Process.cpp:363

This is the first time I have seen the second error since I started trying to run the code, so I don't think it is causing the problem, but I'm not sure.

Also, with the help of the RStudio Community, it seems that the read_delim function is what causes the issue, since we were able to import the file using read_lines.

Note: neither the data nor the code has changed since it last worked.

Here is a link to the thread: https://community.rstudio.com/t/ide-r-session-crashed-with-mutate/112330

@romainfrancois
Member

I'm not sure there is much we can do at the dplyr level. Can you try to un-pipe the example, so that at least you know where it happens?

Did you try outside of the IDE? Perhaps it is an IDE issue?

Unfortunately, without a usable reprex, there isn't much more we can do to help, e.g. we don't have "~/MASTER_1/stage/travail/R/donnees/2014/2014_t1+1000.txt"

@romainfrancois romainfrancois added the reprex needs a minimal reproducible example label Aug 16, 2021
@rg4555
Author

rg4555 commented Aug 16, 2021

The first thing I did was un-pipe it, and it worked fine up to the mutate() step.
But I don't think mutate() is the cause, since inspecting the data with view() crashes the session as well. It also worked when I used read_lines() instead.

I tried with RGui, but it crashed too. Besides, the message says that the R session crashed, not the IDE. I think that if the IDE itself had crashed, it wouldn't show any prompt.

Unfortunately, I can't provide a useful reprex, since the R session crashes and reprex() doesn't work with code that runs into a crash.

The data are available here on the French government site:
https://www.data.gouv.fr/fr/datasets/elections-municipales-2014-resultats-1er-tour/
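When the real file cannot be attached, one possible way to get closer to a reprex is to write a small synthetic file with the same layout and share the code that generates it. The following is only a sketch under assumptions: the layout (17 common fields followed by groups of 11 repeating fields, separated by `;`) is taken from the comments in the code above, while `make_line()`, `reprex_path`, and the field values are hypothetical. Whether a file this small triggers the crash is not known.

```r
# Hypothetical helper: builds one line with 17 common fields followed by
# `n_groups` groups of 11 repeating fields, separated by ";".
make_line <- function(n_groups) {
  common    <- paste0("common", 1:17)
  repeating <- paste0("rep", seq_len(11 * n_groups))
  paste(c(common, repeating), collapse = ";")
}

# Write three lines of different widths to a temporary file.
# The content is ASCII only; the real file is ISO-8859-1 encoded.
reprex_path <- tempfile(fileext = ".txt")
writeLines(c(make_line(1), make_line(3), make_line(2)), reprex_path)

# Sanity check with base R only, so nothing here depends on readr.
field_counts <- lengths(strsplit(readLines(reprex_path), ";"))
print(field_counts)
```

The path in the original `read_delim()` call could then be swapped for `reprex_path` to check whether the crash also occurs on the synthetic file.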

@DavisVaughan
Member

This seems like a potential readr/vroom bug (maybe ALTREP related). What happens if you try readr::read_delim(lazy = FALSE) when reading the file?

@rg4555
Author

rg4555 commented Aug 16, 2021

Still crashing...

@DavisVaughan
Member

I can reproduce this as a vroom bug, and I have opened an issue over there. It seems to be related to guess_max.

@jimhester
Contributor

The bottom line is that '\n' is not a valid delimiter to use in read_delim(). For CSV and other delimited files newlines are always record delimiters, never field delimiters.

This worked by accident in older versions of readr, but it was never the intended usage.

I have added a workaround to avoid this problem in vroom in the future.

However if your intent is to just read the lines into a vector I would suggest using readr::read_lines() rather than read_delim().
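As an illustration of that suggestion, here is a minimal sketch of the original pipeline rebuilt on top of `read_lines()`. It assumes the same layout as the code in the first comment (17 common fields, then groups of 11); the temporary file and the `make_line()` helper are hypothetical stand-ins for the real data, which is not reproduced here.

```r
library(readr)
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical stand-in for the real file: two lines with 17 common fields
# followed by one and two repeating groups of 11 fields, respectively.
make_line <- function(n_groups) {
  paste(c(paste0("c", 1:17), paste0("r", seq_len(11 * n_groups))), collapse = ";")
}
path <- tempfile(fileext = ".txt")
writeLines(c(make_line(1), make_line(2)), path)

# Read each line as one string; no field delimiter is involved at this stage.
lines <- read_lines(path, locale = locale(encoding = "ISO-8859-1"))

election0 <- tibble(line_text = lines) %>%
  mutate(
    # split by delimiter
    split_text  = strsplit(line_text, ";"),
    # assume the first 17 elements are common
    split_df    = map(split_text, ~ .[1:17]),
    # everything past this repeats in groups of 11
    split_names = map(split_text, ~ .[-c(1:17)]),
    columns     = map_dbl(split_text, length),
    # the number of repeating 11-element groups
    n_names     = (columns - 17) / 11
  )
```

Because `read_lines()` returns a character vector, wrapping it in `tibble(line_text = ...)` takes the place of the `setNames("line_text")` step, and the rest of the `mutate()` call is unchanged from the original.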

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue May 1, 2022
# vroom 1.5.7

* Jenny Bryan is now the official maintainer.

* Fix uninitialized bool detected by CRAN's UBSAN check
  (tidyverse/vroom#386)

* Fix buffer overflow when trying to parse an integer field that is
  over 64 characters long
  (tidyverse/readr#1326)

* Fix subset indexing when indexes span a file boundary multiple times
  (#383)

# vroom 1.5.6

* `vroom(col_select=)` now works if `col_names = FALSE` as intended (#381)

* `vroom(n_max=)` now correctly handles cases when reading from a
  connection and the file does _not_ end with a newline
  (tidyverse/readr#1321)

* `vroom()` no longer issues a spurious warning when the parsing needs
  to be restarted due to the presence of embedded newlines
  (tidyverse/readr#1313)

* `vroom_format()` now uses the same internal multi-threaded code as
  `vroom_write()`, improving its performance in most cases (#377)

* `vroom_fwf()` no longer omits the last line if it does _not_ end
  with a newline (tidyverse/readr#1293)

* Empty files or files with only a header line and no data no longer
  cause a crash if read with multiple files
  (tidyverse/readr#1297)

* Files with a header but no contents, or an empty file if `col_names =
  FALSE` no longer cause a hang when `progress = TRUE`
  (tidyverse/readr#1297)

* Commented lines with comments at the end of lines no longer hang R
  (tidyverse/readr#1309)

* Comment lines containing unpaired quotes are no longer treated as
  unterminated quotations
  (tidyverse/readr#1307)

* Values with only an `Inf` or `NaN` prefix but additional data
  afterwards, like `Inform`, are no longer inappropriately guessed as
  doubles (tidyverse/readr#1319)

* Time types now support `%h` format to denote hour durations greater
  than 24, like readr (tidyverse/readr#1312)

* Fix performance issue when materializing subsetted vectors (#378)


# vroom 1.5.5

* `vroom()` now supports files with only carriage return newlines
  (`\r`). (#360, tidyverse/readr#1236)

* `vroom()` now parses single digit datetimes more consistently as
  readr has done (tidyverse/readr#1276)

* `vroom()` now parses `Inf` values as doubles
  (tidyverse/readr#1283)

* `vroom()` now parses `NaN` values as doubles
  (tidyverse/readr#1277)

* `VROOM_CONNECTION_SIZE` is now parsed as a double, which supports
  scientific notation (#364)

* `vroom()` now works around specifying a `\n` as the delimiter (#365,
  tidyverse/dplyr#5977)

* `vroom()` no longer crashes if given a `col_name` and `col_type`
  both less than the number of columns
  (tidyverse/readr#1271)

* `vroom()` no longer hangs if given an empty value for
  `locale(grouping_mark=)`
  (tidyverse/readr#1241)

* Fix performance regression when guessing with large numbers of rows
  (tidyverse/readr#1267)