R session crashes depending on the size of guess_max #365

Closed
DavisVaughan opened this issue Aug 16, 2021 · 0 comments

Extracted from tidyverse/dplyr#5977

For some reason, the first call to vroom() doesn't crash, but the second one, with a larger guess_max, does. I'm not sure why this user is splitting on \n when the file uses ; as its delimiter, but I think they just wanted to read it in as a single column of text that they would post-process manually.

I assume the problem is not really guess_max itself. The default guess_max is 100, so only the second call uses rows 101-1000 for type guessing, and vroom is probably hitting something in that range that it can't handle.

library(vroom)

url <- "https://www.data.gouv.fr/fr/datasets/r/936f6d38-5969-46e5-8b9d-c7646d6390ec"
tf <- tempfile(fileext = ".txt")
download.file(url, tf)

# no crash - even on repeated calls
df <- vroom(
  file = tf, 
  delim = "\n", 
  col_names = FALSE, 
  locale = locale(encoding = "ISO-8859-1")
)

# crashes pretty reliably
df <- vroom(
  file = tf, 
  delim = "\n", 
  col_names = FALSE, 
  locale = locale(encoding = "ISO-8859-1"),
  guess_max = 1000L
)
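
One way to check the guess that the trigger sits somewhere in rows 101-1000 is to bisect on guess_max and find the smallest value that takes the session down. The sketch below is not from the original report. `first_failing()` and `crashes_with()` are hypothetical helper names, it assumes the callr package is installed so each attempt runs in a throwaway R process, and it assumes the crash is monotone in guess_max (every value at or above some threshold crashes).

```r
# Smallest n in [lo, hi] for which fails(n) is TRUE, assuming fails() is
# FALSE below some threshold and TRUE at or above it.
# Returns NA if fails(hi) is FALSE.
first_failing <- function(lo, hi, fails) {
  if (!fails(hi)) return(NA_integer_)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (fails(mid)) hi <- mid else lo <- mid + 1L
  }
  lo
}

# Hypothetical predicate: TRUE if reading `path` with this guess_max crashes
# (or errors in) a child R process. Assumes callr and vroom are installed.
crashes_with <- function(path, guess_max) {
  res <- try(
    callr::r(
      function(path, guess_max) {
        df <- vroom::vroom(
          file = path,
          delim = "\n",
          col_names = FALSE,
          locale = vroom::locale(encoding = "ISO-8859-1"),
          guess_max = guess_max
        )
        nrow(df)
      },
      args = list(path = path, guess_max = guess_max)
    ),
    silent = TRUE
  )
  inherits(res, "try-error")
}

# Usage, with `tf` being the downloaded file from the reprex above:
# first_failing(101L, 1000L, function(n) crashes_with(tf, n))
```

If the crash turns out not to be monotone in guess_max, the bisection result is not meaningful, and a linear scan over a handful of values would be the safer check.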